English Pages 550 [524] Year 2023
Smart Innovation, Systems and Technologies 361
Jyoti Choudrie Parikshit N. Mahalle Thinagaran Perumal Amit Joshi Editors
ICT for Intelligent Systems Proceedings of ICTIS 2023
Smart Innovation, Systems and Technologies Volume 361
Series Editors Robert J. Howlett, KES International Research, Shoreham-by-Sea, UK Lakhmi C. Jain, KES International, Shoreham-by-Sea, UK
The Smart Innovation, Systems and Technologies book series encompasses the topics of knowledge, intelligence, innovation and sustainability. The aim of the series is to make available a platform for the publication of books on all aspects of single and multi-disciplinary research on these themes in order to make the latest results available in a readily-accessible form. Volumes on interdisciplinary research combining two or more of these areas are particularly sought. The series covers systems and paradigms that employ knowledge and intelligence in a broad sense. Its scope is systems having embedded knowledge and intelligence, which may be applied to the solution of world problems in industry, the environment and the community. It also focusses on the knowledge-transfer methodologies and innovation strategies employed to make this happen effectively. The combination of intelligent systems tools and a broad range of applications introduces a need for a synergy of disciplines from science, technology, business and the humanities. The series will include conference proceedings, edited collections, monographs, handbooks, reference books, and other relevant types of book in areas of science and technology where smart systems and technologies can offer innovative solutions. High quality content is an essential feature for all book proposals accepted for the series. It is expected that editors of all accepted volumes will ensure that contributions are subjected to an appropriate level of reviewing process and adhere to KES quality principles. Indexed by SCOPUS, EI Compendex, INSPEC, WTI Frankfurt eG, zbMATH, Japanese Science and Technology Agency (JST), SCImago, DBLP. All books published in the series are submitted for consideration in Web of Science.
Editors
Jyoti Choudrie, Hertfordshire Business School, University of Hertfordshire, Hatfield, Hertfordshire, UK
Parikshit N. Mahalle, Department of Artificial Intelligence and Data Science, Vishwakarma Institute of Information Technology, Pune, Maharashtra, India
Thinagaran Perumal, University Putra Malaysia, Serdang, Selangor, Malaysia
Amit Joshi, Global Knowledge Research Foundation, Ahmedabad, Gujarat, India
ISSN 2190-3018 ISSN 2190-3026 (electronic) Smart Innovation, Systems and Technologies ISBN 978-981-99-4039-4 ISBN 978-981-99-3982-4 (eBook) https://doi.org/10.1007/978-981-99-3982-4 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
Preface ICTIS 2023
The Seventh International Conference on Information and Communication Technology for Intelligent Systems (ICTIS 2023) targets state-of-the-art as well as emerging topics pertaining to information and communication technologies (ICTs) and effective strategies for their implementation in engineering and intelligent applications. The conference is anticipated to attract a large number of high-quality submissions; stimulate cutting-edge research discussions among academic pioneering researchers, scientists, industrial engineers and students from all around the world; provide a forum for researchers to propose new technologies, share their experiences and discuss future solutions for the design of ICT infrastructure; provide a common platform for academic pioneering researchers, scientists, engineers and students to share their views and achievements; enrich technocrats and academicians by presenting their innovative and constructive ideas; and focus on innovative issues at the international level by bringing together experts from different countries. The conference was held during 27–28 April 2023, physically on 27 April 2023 at Hotel Pride Plaza, Bodakdev, Ahmedabad, and digitally on 28 April 2023 on the Zoom platform. It was organized and managed by Global Knowledge Research Foundation and GR Scholastic LLP in collaboration with Knowledge Chamber of Commerce And Industry. Research submissions in various advanced technology areas were received, and after a rigorous peer-review process with the help of programme committee members and external reviewers, 160 papers were accepted with an acceptance rate of 17%. All 160 papers of the conference are accommodated in three volumes; the papers in this book comprise authors from five countries. The success of this event was possible only with the help and support of our team and organizations.
With immense pleasure and honour, we would like to express our sincere thanks to the authors for their remarkable contributions, all the technical programme committee members for their time and expertise in reviewing the papers within a very tight schedule and the publisher Springer for their professional help. We are overwhelmed by our distinguished scholars and appreciate them for accepting our invitation to join us through the virtual platform, deliver keynote speeches and chair technical sessions analysing the research work presented by the researchers. Most importantly, we are also grateful to our local support team for their hard work for the conference.

Jyoti Choudrie, Hatfield, UK
Parikshit N. Mahalle, Pune, India
Thinagaran Perumal, Serdang, Malaysia
Amit Joshi, Ahmedabad, India
Contents
Intrusion Detection Model for IoT Networks Using Graph Convolution Networks (GCN)
H. S. Manjula, M. S. Roopa, J. S. Arunalatha, and K. R. Venugopal

Drowsiness Detection System
Dhiren P. Bhagat, Bhavyesh Prajapati, Krutarth Pawar, Darshan Parekh, and Param Gandhi

A Deep Learning Technique to Recommend Music Based on Facial and Speech Emotions
R. Pallavi Reddy, B. Abhinaya, and Athkuri Sahithi

Smart Chair Posture Detection and Correction Using IOT
H. S. Shreyas, G. Satwika, P. Manjunath, M. Shiva, and M. Ananda

The Opinions Imparted on Singular’s Face
K. Pramilarani, K. Ashok, Srinivas Pujala, Hemanth Karnakanti, and M. K. Vinaya

Abnormal Human Behavior Detection from a Video Sequence Using Deep Learning
Muskan Sahetai, Bansari Patel, Radhika Patel, Ritika Jani, and Dweepna Garg

Role of Deep Learning in a Secure Telemedicine System with a Case Study of Heart Disease Prediction
Darshan Singh, Siddhant Thapliyal, Mohammad Wazid, and D. P. Singh

Comparative Analysis of Chronic Kidney Disease Prediction Using Supervised Machine Learning Techniques
K. Poorani and M. Karuppasamy

Prediction of PCOS and PCOD in Women Using ML Algorithms
M. J. Lakshmi, D. S. Spandana, Harini Raj, G. Niharika, Ashwini Kodipalli, Shoaib Kamal, and Trupthi Rao
Privacy Preserving Early Disease Diagnosis in Human Nails Using Swarm Learning
Aasim Mohammed, P. S. Shrikanth Karthik, Razik Fatin Shariff, Tankala Sunaina, Arti Arya, and Pooja Agarwal

Skin Cancer Recognition Using CNN, VGG16 and VGG19
Yashwant S. Ingle and Nuzhat Shaikh

Diagnosis of Cardiovascular Disease Using Machine Learning Algorithms and Feature Selection Method for Class Imbalance Problem
Ritika Kumari, Jaspreeti Singh, and Anjana Gosain

Similarity Based Answer Evaluation in Academic Questions Using Natural Language Processing Techniques
S. Santhiya, S. Elavarasan, S. Gandhikumar, and M. P. Gowsik

Fake News Detection Using Machine Learning and Deep Learning Classifiers
C. Nandhakumar, C. Kowsika, R. Reshema, and L. Sandhiya

Survey on Pre-Owned Car Price Prediction Using Random Forest Algorithm
C. Selvarathi, G. Bhava Dharani, and R. Pavithra

Sentiment Analysis of Youtube Comment Section in Indian News Channels
Samank Gupta and S. Kirthica

Deep Learning Framework for Speaker Verification Under Multi Sensor, Multi Lingual and Multi Session Conditions
Pratham Sanshi, Likhith Reddy Kuruvalli, Satish Chikkamath, and R. S. Nirmala

DLLACC: Design of an Efficient Deep Learning Model for Identification of Lung Air Capacity in COPD Affected Patients
Sruthi Nair

Content Based Document Image Retrieval Using Computer Vision and AI Techniques
Harsh Bharat Shah and Jyoti Vishnu Joglekar

Monitor the Effectiveness of Cardiovascular Disease Illness Diagnostics Utilizing AI and Supervised Machine Learning Classifiers
Dushyantsinh B. Rathod, Yesha Patel, Archana Jethava, Namrata Gohel, Dhruvi Suthar, Dhaval Varia, Nirav Shah, and Janki Barot
Architecture Based Classification for Intrusion Detection System Using Artificial Intelligence and Machine Learning Classifiers
Archana Gondalia and Apurva Shah

A Novel Privacy-Centric Training Routine for Maintaining Accuracy in Traditional Machine Learning Systems
Hrishikesh K. Haritas, Chinmay K. Haritas, and Jagadish S. Kallimani

Outside the Closed World: On Using Machine Learning for Network Intrusion Detection
Sneha Padhiar and Ritesh Patel

Data Collection for a Machine Learning Model to Suggest Gujarati Recipes to Cardiac Patients Using Gujarati Food and Fruit with Nutritive Values
Nirav Mehta and Hetal Thaker

Plant and Weed Seedlings Classification Using Deep Learning Techniques
G. Bharathi, Sk. Farheen, Sk. Ashrita Parvin, U. Rajarajeswari, and Y. Nikhila

A Comprehensive Review on Various Artificial Intelligence Based Techniques and Approaches for Cyber Security
V. Kanchana Devi, S. Asha, E. Umamaheswari, and Nebojsa Bacanin

Applicability of Machine Learning for Personalized Medicine
Rupa Fadnavis and Manali Kshirsagar

I-LAA: An Education Chabot
P. Pavan Kumar, Vangaveti Likhita, and Uddaraju Sai Pranav

A Comparison of Machine Learning Approaches for Forecasting Heart Disease with PCA Dimensionality Reduction
Shilpa Sharma, Mandeep Kaur, and Savita Gupta

Comparative Study of a Computer Vision Technique for Locating Instances of Objects in Images Using YOLO Versions: A Review
Prajkta P. Khaire, Ramesh D. Shelke, Dilendra Hiran, and Mahendra Patil

Remotely Accessed Smart CCTV System Using Machine Learning
S. B. Pokle, Apurva Thote, Janhvi Dahake, Kanishka Pawde, and Maahi Kanojia

Enhancing Surveillance and Face Recognition with YOLO-Based Object Detection
Saraswati Patil, Dhammadeep Meshram, Mustafa Bohra, Mustansir Daulat, Akshita Manwatkar, and Ashutosh Gore
Heart Disease Prediction Using Supervised Learning
Saraswati Patil, Pavan Kumar Sanjay, Harsh Pardeshi, Niraj Patil, Omkar Pawar, and Prishita Jhamtani

A Review of Machine Learning Tools and Techniques for Anomaly Detection
Vishwanath D. Chavan and Pratibha S. Yalagi

Model for Effective Project Implementation for Undergraduate Students
Pratibha S. Yalagi, Vishwanath D. Chavan, and Dattatray P. Gandhamal

Navigating the Aisles: An Augmented Reality Solution for Gamified Indoor Grocery Store Navigation
Rakshita Danee, Janhavi Mhatre, Yash Raje, Simran Huddedar, and Vidya Pujari

Design of Sustainable Water Resource Management System for Agriculture Using IOT
Bhavna Rathore, Abhisekha Gautam, Ritesh Kumar, Mannepali Chakradhar, Saurabh Kumar Singh, and Rosepreet Kaur Bhogal

IoT Cloud Convergence Use Cases: Opportunities, Challenges—Comprehensive Survey
D. D. Sapkal, R. V. Patil, Parikshit N. Mahalle, and Satish G. Kamble

Analysis of Genomic Selection Methodology in Wheat Using Machine Learning and Deep Learning
Vaidehi Sinha and Sharmishta Desai

Exploring Machine Learning and Deep Learning Techniques for Potato Disease Detection
Nishant Kumar, Purshottam Kumar, Prince Sharma, and Rahul Katarya

Intelligent Process Automation for Indian Car Sales Forecasting Using Machine Learning Time Series Algorithms
Deep Shahane, Samiksha Pansare, Riya Ingale, Rutvik Narkar, and Amit Nerurkar

Generation of Historical Artwork Using GAN
A. Soumya, Karthik S. Rao, and Sumalatha Aradhya

Wheat, Rice and Corn Yield Prediction for Jammu District Using Machine Learning Techniques
Sakshi Gandotra, Rita Chhikara, and Anuradha Dhull
Detection of UDP SYN Flood DDoS Attack Using Random Forest Machine Learning Algorithm in a Simulated Software Defined Network
V. Mohan, B. K. Madhavi, and S. B. Kishor

Capability Based Access Control Mechanism in IoT: a Survey of State of the Art
Vishal Ambhore, Sandhya Shevatre, Rushikesh Ambhore, Ketki Kshirsagar, and Parikshit N. Mahalle
About the Editors
Prof. Jyoti Choudrie is Professor of Information Systems in Hertfordshire Business School, Management, Leadership and Organisation (MLO) department where she previously led the Systems Management Research Unit (SyMRU) and currently is a convenor for the Global Work, Economy and Digital Inclusion group. She is also Editor-in-Chief for Information, Technology and People journal (An Association of Business School 3 grade journal). In terms of research, Prof. Choudrie is known as the Broadband and Digital Inclusion expert in University of Hertfordshire, which was also the case in Brunel University. To ensure her research is widely disseminated, Prof. Choudrie co-edited a Routledge research monograph with Prof. C. Middleton: The Management of Broadband Technology Innovation and completed a research monograph published by Routledge Publishing and focused on social inclusion along with Dr. Sherah Kurnia and Dr. Panayiota Tsatsou titled: Social Inclusion and Usability of ICT-Enabled Services. She also works with Age (UK) Hertfordshire, Hertfordshire County Council and Southend YMCA where she is undertaking a Knowledge Transfer Partnership project investigating the role of Online Social Networks (OSN). Finally, she is focused on artificial intelligence (AI) applications in organizations and society alike, which accounts for her interests in OSN, machine and deep learning. She has been a keynote speaker for the International Congress of Information and Communication Technologies, Digital Britain conferences and supervises doctoral students drawn from around the globe. Presently, she is seeking 3–4 doctoral students who would want to research AI in society and organizations alike. Dr. Parikshit N. Mahalle is Senior Member IEEE and is Professor, Dean Research and Development and Head—Department of Artificial Intelligence and Data Science at Vishwakarma Institute of Information Technology, Pune, India. He completed his Ph.D. 
from Aalborg University, Denmark, and continued as Post-Doctoral Researcher at CMI, Copenhagen, Denmark. He has 23+ years of teaching and research experience. He is a member of the Board of Studies in Computer Engineering, Ex-Chairman Information Technology, Savitribai Phule Pune University and various universities and autonomous colleges across India. He has 12 patents, 200+
research publications (Google Scholar citations 2750 plus with H index 25, Scopus citations 1400 plus with H index 17, Web of Science citations 438 with H index 10) and authored/edited 50+ books with Springer, CRC Press, Cambridge University Press, etc. He is an editor-in-chief for the IGI Global International Journal of Rough Sets and Data Analysis and the Inter-science International Journal of Grid and Utility Computing, a member of the Editorial Review Board for the IGI Global International Journal of Ambient Computing and Intelligence, and a reviewer for various journals and conferences of repute. His research interests are machine learning, data science, algorithms, Internet of things, identity management and security. He is guiding eight Ph.D. students in the area of IoT and machine learning, and recently, five students have successfully defended their Ph.D. under his supervision from SPPU. He is also the recipient of the “Best Faculty Award” by Sinhgad Institutes and Cognizant Technologies Solutions. He has delivered 200 plus lectures at national and international levels.

Dr. Thinagaran Perumal received his B.Eng. in Computer and Communication System Engineering from Universiti Putra Malaysia in 2003. He completed his M.Sc. and Ph.D. in Smart Technologies and Robotics from the same university in 2006 and 2011, respectively. Currently, he is appointed as Senior Lecturer at the Department of Computer Science, Faculty of Computer Science and Information Technology, Universiti Putra Malaysia. He is also currently appointed as Head of CyberPhysical Systems in the university and has also been elected as Chair of IEEE Consumer Electronics Society Malaysia Chapter. Dr. Thinagaran Perumal is the recipient of the 2014 IEEE Early Career Award from IEEE Consumer Electronics Society for his pioneering contribution in the field of consumer electronics. His research interests are towards interoperability aspects of smart homes and Internet of things (IoT), wearable computing and cyber-physical systems.
His recent research activities include proactive architecture for IoT systems, development of cognitive IoT frameworks for smart homes and wearable devices for rehabilitation purposes. He is an active member of IEEE Consumer Electronics Society and its Future Directions Committee on Internet of things. He has been invited to give several keynote lectures and plenary talks on Internet of things in various institutions and organizations internationally.

Dr. Amit Joshi is currently Director of Global Knowledge Research Foundation, also an entrepreneur and researcher who has completed his graduation (B.Tech.) in Information Technology and M.Tech. in Computer Science and Engineering and completed his research in the areas of cloud computing and cryptography in medical imaging with a focus on analysis of the current government strategies and world forums needs in different sectors on security purposes. He has an experience of around ten years in academia and industry in prestigious organizations. He is an active member of ACM, IEEE, CSI, AMIE, IACSIT-Singapore, IDES, ACEEE, NPA and many other professional societies. Further, he is currently also the International Chair of InterYIT at International Federation of Information Processing (IFIP, Austria). He has presented and published more than 50 papers in National and International Journals/Conferences of IEEE and ACM. He has also edited more than 20
books which are published by Springer, ACM and other reputed publishers. He has also organized more than 40 national and international conferences and workshops through ACM, Springer, IEEE across five countries including India, UK, Thailand and Europe.
Intrusion Detection Model for IoT Networks Using Graph Convolution Networks (GCN)
H. S. Manjula, M. S. Roopa, J. S. Arunalatha, and K. R. Venugopal
Abstract The Internet of Things (IoT) is a prominent field that plays a crucial role in providing communication, sensing, and transmission services with the help of the Internet. To provide efficient services and functioning of IoT networks, the IETF (Internet Engineering Task Force) has defined RPL (Routing Protocol for Low-Power and Lossy Networks). RPL is a standard routing protocol developed to allow machines in the IoT network to sense, communicate and transfer data under constraints such as limited processing power, energy, and memory. Untrusted users access RPL-based IoT communication networks through the Internet; hence IoT nodes are exposed to routing attacks. Routing security threats are challenging to detect in IoT communication networks. A blackhole attack is a security threat in RPL-based IoT communication networks in which a compromised device intentionally misleads other nodes into sending their data to it, causing the data to be lost. Hence, an Intrusion Detection System (IDS) is needed to recognize routing attacks in IoT networks. In this work, we present a Graph Convolution Network (GCN) based IDS to identify blackhole attacks in IoT networks. It detects blackhole attacks by capturing and processing the behaviours of nodes in the network. Experimental results show that the implemented intrusion detection model effectively detects blackhole attacks in an IoT communication network topology. This method can strengthen the security of IoT communication networks and protect against malicious attacks that could compromise the data being exchanged between nodes.
H. S. Manjula (B) · J. S. Arunalatha · K. R. Venugopal Department of CSE, UVCE, Bangalore University, Bengaluru, India e-mail: [email protected] K. R. Venugopal e-mail: [email protected] M. S. Roopa Department of CSE, Dayananda Sagar College of Engineering, Bengaluru, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 J. Choudrie et al. (eds.), ICT for Intelligent Systems, Smart Innovation, Systems and Technologies 361, https://doi.org/10.1007/978-981-99-3982-4_1
1 Introduction

The term IoT refers to the network of buildings, machines, vehicles, and other objects connected to the Internet through sensors, software, and network connectivity. The machines in IoT networks can be as small as a smart thermostat or as complex as an autonomous vehicle. Without human intervention, they can exchange and communicate information via the Internet [1]. The IoT can change various industries, including healthcare, transportation, agriculture, and manufacturing, by enabling real-time data gathering, analysis, and decision making. However, the increasing reliance on IoT devices also raises concerns about security and privacy, as these devices may be vulnerable to cyber-attacks and data breaches.

RPL is a widely used routing protocol developed explicitly for low-power, low-bandwidth, and lossy networks to provide routing services for IoT networks. It is intended to operate in resource-constrained environments where nodes have limited memory, computing power, and battery life [2]. For nodes with limited resources in IoT networks, RPL reliably and effectively controls data routing and provides bidirectional data communication. RPL is based on a distance-vector routing algorithm that creates a Directed Acyclic Graph (DAG) to establish and maintain routes between devices. It supports multiple modes of operation, including mesh and star topologies, and can be configured to optimize for various performance metrics such as energy efficiency, reliability, and latency. RPL is a critical component of low-power wireless personal area networks, a standard protocol for IoT networks, and is widely used in IoT application areas such as smart buildings, farming, smart factories, and cities. Several types of attacks can be detected on RPL-based IoT networks, as illustrated in Fig. 1.

1. Blackhole attacks: In this attack, an intruder deliberately discards all incoming network packets, effectively creating a “blackhole” that prevents the delivery of data. It can disrupt the operation of the network and prevent devices from communicating with each other [3].
2. Wormhole attacks: A wormhole attack involves creating a tunnel or shortcut between two remote locations in the network, allowing an attacker to bypass security measures and intercept or manipulate data packets [4].
3. Sybil attacks: In this type of attack, an attacker creates numerous fake identities or Sybil nodes and uses them to manipulate the routing decisions of legitimate nodes in the network [5].
4. Routing table overflow attacks: An attacker can flood the routing table of a legitimate node with fake entries, causing it to crash or become unresponsive.
5. Denial-of-Service (DoS) attacks: A DoS attack is a security threat that involves overwhelming a particular node or the entire network with traffic, making it unavailable to legitimate users [6].
6. Selective forwarding attacks: In this attack, an intruder discards important data while possibly still forwarding unimportant data from its neighbours [7].
7. RPL spoofing attacks: An attacker can create fake RPL control messages and inject them into the network, causing legitimate nodes to make incorrect routing decisions [8].

Fig. 1 Various attacks on RPL-based IoT networks

These attacks can have serious consequences for the operation and security of RPL-based IoT networks, and it is crucial to implement appropriate measures to recognize and prevent them. The contributions of this work are as follows:

I. We have developed an intrusion detection framework based on GCN to detect blackhole attacks.
II. We applied the GCN model to RADAR (Routing Attacks Dataset for RPL) for detecting intrusions.
III. We evaluate the proposed intrusion detection system and compare the results.

Section 2 outlines existing intrusion detection methods, Sect. 3 presents the system model, and Sect. 4 describes the implementation. Section 5 presents the experimental findings, while Sect. 6 gives the conclusions of the proposed intrusion detection mechanism.
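To make the blackhole attack described in this section concrete, the following is a minimal sketch of how a single malicious forwarder affects packet delivery in a DAG-like routing tree. The topology, node numbers, and function names are hypothetical and chosen only for illustration; this is not the authors' simulation and it ignores all RPL control traffic.

```python
# Toy illustration of a blackhole node in a DODAG-like routing tree.
# Hypothetical topology: each node maps to its preferred parent; node 0 is the root.
PARENT = {1: 0, 2: 0, 3: 1, 4: 1, 5: 2, 6: 5}


def path_to_root(node, parent=PARENT):
    """Return the list of nodes a packet visits on its way to the root."""
    path = [node]
    while path[-1] != 0:
        path.append(parent[path[-1]])
    return path


def delivery_ratio(sources, blackholes=frozenset()):
    """Fraction of packets (one per source) that reach the root.

    A packet is lost if any intermediate forwarder on its path is a blackhole,
    because a blackhole silently discards everything it should forward.
    """
    delivered = 0
    for src in sources:
        forwarders = path_to_root(src)[1:-1]  # exclude the source and the root
        if not any(hop in blackholes for hop in forwarders):
            delivered += 1
    return delivered / len(sources)


sources = [3, 4, 5, 6]
print(delivery_ratio(sources))                  # no attacker: 1.0
print(delivery_ratio(sources, blackholes={1}))  # node 1 drops traffic from 3 and 4: 0.5
```

The drop in delivery ratio for the subtree behind the malicious node is the kind of behavioural signal an intrusion detection system can look for.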
2 Literature Survey In this segment, we review the various intrusion detection methods for identifying routing attacks in IoT networks.
Rabhi et al. [9] have presented machine learning intrusion detection techniques to identify three routing security threats in IoT networks. It detects attacks from network traffic data using machine learning classifiers and demonstrates the performance of multiple classifiers by calculating F-measure, recall, and precision evaluation metrics. However, the proposed work failed to address other routing attacks. Verma et al. [10] have implemented a network intrusion detection model to address routing security threats in RPL-based IoT communication networks. Various machine learning classifiers are operated on the dataset to detect attacks. Authors have applied ensemble learning to increase the accuracy. However, it did not emphasize implementing a network intrusion detection model on intelligent nodes. Virendra et al. [11] have implemented an IDS to address blackhole attacks. Experimental results depicts that the proposed trust-based methodology improves the network’s security and performance. The system discovers some future issues to investigate, such as the need for specific standardized intrusion detection methods and the demand for existing approaches to be enhanced. The network parameters like energy consumption, communication overhead, and computation time for evaluating the model were not considered. Parra et al. [12] have introduced an intrusion detection method based on deep learning to identify routing attacks. NBaIoT dataset was utilized to asses the deep learning technique. Experimental findings depicts that the developed framework performs well in detecting attacks. However this method requires more training time and computation time. Patel et al. [13] have introduced filter-based techniques to recognize blackhole attacks in RPL IoT networks. The proposed work performs well compared to existing intrusion detection techniques in terms of detection rate. Vikram et al. 
[14] have presented intrusion detection and prevention method to identify blackhole attacks from IoT network traffic data. This work performs well in detecting blackhole attacks in detection rate, packet delivery rate, and network delay. Syeda et al. [15] have suggested trust based methodlogy to identify routing attacks by considering the mobility of sensors in IoT networks. The proposed method achieved better accuracy by considering the mobility of sensor nodes as compared to existing techniques. Philokypros et al. [16] have presented machine learning based security mechanism to detect rank and blackhole threats in IoT applications. Google AutoML and Azure cloud based machine learning frameworks are used to conduct experiments. This work performs well in identifying rank and blackhole attacks in terms of accuracy, precision, and recall. Eric et al. [17] have introduced Heartbeat-Based detection technique to recognize routing attacks. This proposed technique was introduced to detect greyhole and blackhole attacks in IoT communication networks. However, in the future this presented intrusion detection model can be used to address other types of routing attacks. Philokypros et al. [18] have implemented an intrusion detection framework to detect blackhole and rank attacks from IoT network traffic data. The framework includes two methods namely trust based security framework and external IDS to identify attacks. However, this intrusion detection framework failed to detect more routing attacks. The machine and deep learning algorithms can be applied to address other routing attacks. Choukri et al. [19] have developed an intrusion detection method to
Intrusion Detection
recognize blackhole attacks in unsecured IoT network topology. It detects attacks by analyzing the network traffic, considering the features, and training the deep learning framework. The proposed intrusion detection framework performs well compared to existing machine learning models in terms of error rate and detection accuracy.
3 System Model
In this section, we discuss the system for intrusion detection in IoT networks using Graph Convolution Networks (GCN). Figure 2 depicts the overall process of the proposed system.
3.1 Dataset
We have used the dataset of [20] in the proposed system to recognize blackhole attacks. The RADAR dataset was simulated using the Netsim tool and includes data from 16 static IoT nodes and a single border router node, which together form a single Destination Oriented Directed Acyclic Graph (DODAG) structure. It comprises both normal and attack traffic data and is annotated with labels indicating the type of attack being performed. The dataset consists of 5 simulation files (in CSV format) per attack, for a total of 70 simulation files. Each simulation file represents a simulation lasting 1500 s, with the attack starting between the 500th and 700th seconds. The dataset is intended to be representative of real-world RPL-based IoT networks and to serve as a standard dataset for testing the performance of attack detection algorithms.
3.2 Feature Extraction
For each simulation of the chosen dataset, the following 11 features were extracted, as shown in Table 1: number of DODAG Information Object (DIO) packets received, DIO packets transmitted, Destination Advertisement Object (DAO) packets received, DAO packets transmitted, DODAG Information Solicitation (DIS) packets transmitted, application packets received, application packets transmitted, received versus transmitted application rate, version, next hop IP, and rank [20].
Fig. 2 Proposed GCN-based intrusion detection system
H. S. Manjula et al.
3.3 Graph Construction
The features listed in Table 1 are used as node features, along with three additional edge features: the number of DAO packets transmitted, DIO packets transmitted, and application packets transmitted. Using this information, a graph is built for every 10 s of the simulation. Each graph is labelled based on the attack time. For instance, if an attack begins at the 512th second, all graphs created after that point are labelled as under attack, while those created before that point are labelled legitimate. GCNs are used to investigate the communication patterns of the devices in the network and to identify anomalies that may indicate the presence of a blackhole attack.
Graph convolutional networks are a deep learning technique that has gained wide recognition in recent years for its ability to analyze data represented as graphs or networks. GCNs are a variant of Convolutional Neural Networks (CNNs), which are commonly used for image classification and other tasks involving structured data. However, unlike CNNs, which are designed to process data arrays or grids, GCNs are designed to process data represented as graphs, where nodes represent entities and edges define relationships between these entities. GCNs operate by propagating information across the graph structure through a series of convolutional layers, composed of filters applied to the nodes and edges of the graph. These filters extract and combine features from the nodes and edges of the graph and can be learned from data using standard optimization algorithms. GCNs are used in many applications, including graph classification, link prediction, node classification, and anomaly detection, and have shown strong performance in many cases. One key advantage of GCNs is their ability to handle complex, non-Euclidean data structures such as graphs, which are common in many real-world applications. For example, GCNs have been used to analyze social networks, protein-protein interaction networks, and other types of data that cannot easily be represented as grids or arrays. Additionally, GCNs can take into account the inherent structure and relationships within the data, which can be important for tasks such as predicting the likelihood of a link between two nodes in a network. Overall, GCNs are a valuable tool for analyzing and understanding complex, interrelated data structures, and they have the potential to benefit a wide range of applications in fields such as biology, social science, and computer science.

Table 1 Selected features for proposed GCN model
S. No.  Name of the feature
1       DIO packets received
2       DIO packets transmitted
3       DAO packets received
4       DAO packets transmitted
5       DIS packets transmitted
6       Application packets received
7       Application control packets transmitted
8       Received versus transmitted application rate
9       Rank
10      Version number
11      Next hop IP
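The 10 s windowing and labelling rule described in Sect. 3.3 can be illustrated with a minimal Python sketch. The simulation length (1500 s), the window length (10 s), and the example attack start (512 s) come from the text; the function name `label_windows` and the exact rule at the window boundary (a graph counts as "created" at the end of its window) are assumptions made for illustration, not the authors' code.

```python
# Sketch only: the boundary rule below is an assumption, not taken from the paper.
SIM_SECONDS = 1500     # simulation length stated in Sect. 3.1
WINDOW_SECONDS = 10    # one graph per 10 s window (Sect. 3.3)

def label_windows(attack_start, sim_seconds=SIM_SECONDS, window=WINDOW_SECONDS):
    """Return (window_start, window_end, label) for every window of one simulation."""
    labelled = []
    for start in range(0, sim_seconds, window):
        end = start + window
        # Assumption: a graph is "created" at the end of its window, so it is
        # labelled as attack when that moment falls after the attack start.
        label = "attack" if end > attack_start else "legitimate"
        labelled.append((start, end, label))
    return labelled

windows = label_windows(attack_start=512)
n_attack = sum(1 for _, _, label in windows if label == "attack")
print(len(windows), n_attack)
```

Under these assumptions each simulation yields 150 graphs, and every graph from the window that contains the attack start onwards is labelled as attack.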
3.4 Graph Classification
In this phase, we represent all features from the network traffic data as a graph, with devices represented as nodes and communication links represented as edges. The GCN is applied to these graphs to identify patterns and features indicative of blackhole attacks. Finally, the node and edge features are used to train a model that classifies each graph as either normal or attack, based on whether it is consistent with the learned patterns and features. In our work, a model with two GCNConv layers and a hidden size of 32 was trained. The output of the graph convolutional layers is passed through a global average pooling layer, followed by a dropout layer with a probability of 0.5, before being sent to the classification head, which is a fully connected neural network with 32 neurons. The model was trained with the Adam optimizer at a learning rate of 0.01 and a batch size of 64 for 50 epochs.
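To make the data flow of this architecture concrete, the following is a minimal NumPy sketch of the forward pass only. It is not the authors' PyTorch Geometric implementation: the weights are random and untrained, biases are omitted, the ReLU placement and the two-layer head (32 to 32 to 2) are assumptions, edge features are ignored, and the topology and node features are hypothetical. It uses the standard GCN propagation rule with symmetric normalisation of the adjacency matrix plus self-loops.

```python
import numpy as np

NUM_NODES = 17      # 16 IoT nodes plus one border router (Sect. 3.1)
NUM_FEATURES = 11   # node features listed in Table 1
HIDDEN = 32         # hidden size stated in the text
NUM_CLASSES = 2     # normal vs. attack

def normalized_adjacency(adj):
    """Symmetric GCN normalisation: D^-1/2 (A + I) D^-1/2."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def relu(z):
    return np.maximum(z, 0.0)

def init_params(rng):
    """Random (untrained) weights; a real model would learn these with Adam."""
    def dense(n_in, n_out):
        limit = np.sqrt(6.0 / (n_in + n_out))
        return rng.uniform(-limit, limit, size=(n_in, n_out))
    return {
        "conv1": dense(NUM_FEATURES, HIDDEN),
        "conv2": dense(HIDDEN, HIDDEN),
        "fc1": dense(HIDDEN, HIDDEN),        # assumed 32-neuron head layer
        "fc2": dense(HIDDEN, NUM_CLASSES),
    }

def classify_graph(adj, x, params):
    """Two graph convolutions, mean pooling, dense head, softmax."""
    a_norm = normalized_adjacency(adj)
    h = relu(a_norm @ x @ params["conv1"])
    h = relu(a_norm @ h @ params["conv2"])
    pooled = h.mean(axis=0)                  # global average pooling over nodes
    # Dropout (p = 0.5) is only active during training, so it is omitted here.
    logits = relu(pooled @ params["fc1"]) @ params["fc2"]
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

rng = np.random.default_rng(0)
upper = np.triu((rng.random((NUM_NODES, NUM_NODES)) < 0.2).astype(float), k=1)
adj = upper + upper.T                        # hypothetical undirected topology
x = rng.random((NUM_NODES, NUM_FEATURES))    # hypothetical node features
probs = classify_graph(adj, x, init_params(rng))
print(probs)  # two class probabilities that sum to 1
```

The pooling step is what turns per-node embeddings into one vector per graph, which is why the model produces a single normal/attack decision for each 10 s window rather than one per device.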
4 Implementation
We implemented this work on a Tesla T4 GPU with 16 GB of RAM. Python was used for the implementation, and the Google Colab environment was used to train and test the model. PyTorch Geometric was used for the graph implementation. Training the model is an important part of an intrusion detection system. In our proposed work, the model was trained on four of the five simulations provided in the dataset, while the remaining one was used for testing, which corresponds to a training and testing split of 80% and 20%, respectively. The results of our work with the GCN model are given in Table 2 in terms of four evaluation parameters, namely accuracy, F1-score, precision, and recall, which are used to evaluate the detection performance [21, 22]. Figures 3, 4, 5 and 6 show the F1-score, recall, precision, and accuracy of the proposed GCN-based intrusion detection system at different batch sizes on the RADAR dataset.
Table 2 Performance of the proposed GCN based intrusion detection system
Method name         Accuracy (%)  F1-score (%)  Precision (%)  Recall (%)
Proposed GCN model  98            98.27         100            96.59
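For reference, the four evaluation parameters can be computed from confusion-matrix counts as in the short Python sketch below. The function names and the example counts are hypothetical; only the reported precision (100%) and recall (96.59%) come from the paper. The last lines check that the reported F1-score is the harmonic mean of those two values.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

def f1_from(precision, recall):
    """F1 as the harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts, for illustration only.
example = classification_metrics(tp=8, fp=0, tn=10, fn=2)

# Consistency check on the reported values: precision 100 %, recall 96.59 %.
reported_f1 = f1_from(100.0, 96.59)
print(round(reported_f1, 2))
```

A precision of 100% means that no legitimate graph was flagged as an attack, so the gap between F1 and 100% is due entirely to missed attack graphs (recall).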
Fig. 3 F1-score results at different batch size for GCN model

Table 3 Comparison results of the developed GCN based intrusion detection system
Method name         Detection accuracy (%)
Proposed GCN model  96.59
DETONAR [20]        60
Table 3 depicts the corresponding results of our proposed model compared to the work carried out in [20].
Fig. 4 Recall results at different batch size for GCN model
Fig. 5 Precision results at different batch size for GCN model
Fig. 6 Accuracy results at different batch size for GCN model
5 Conclusions
With the rapid development of IoT networks, smart machines are connected and communicate through the Internet. All such connections in IoT communication networks are enabled by a routing protocol called RPL. This routing protocol uses different objective functions to find optimal paths for every node in the network topology. However, recent studies show that RPL is exposed to many cyber-attacks and topology attacks. It is therefore imperative to design an intrusion detection method for RPL-based IoT networks to identify routing attacks. In this work, we developed a GNN-based IDS for IoT networks, namely a GCN model that detects blackhole attacks. The RADAR dataset was used for the experimental evaluation. Graphs are constructed from the network traffic data and provided to a GCNConv-based classifier for graph-level predictions based on node and edge features. The experimental results show that the proposed work outperforms the work carried out in [20] in detecting blackhole attacks. The model achieved an overall detection accuracy of 96.59%. In the future, we aim to extend this GCN model to address various routing attacks in IoT networks, to identify the nodes that caused the attack, and to explore other graph techniques such as GraphSAGE and graph attention networks and compare their performance with Graph Convolution Networks.
References
1. Santos L, Rabadao C, Goncalves R (2018) Intrusion detection systems in internet of things: a literature review. In: 2018 13th Iberian conference on information systems and technologies (CISTI). IEEE, pp 1–7
2. Almusaylim ZA, Alhumam A, Jhanjhi N (2020) Proposing a secure RPL based internet of things routing protocol: a review. Ad Hoc Netw 101:102096
3. Patel HB, Jinwala DC (2019) Blackhole detection in 6lowpan based internet of things: an anomaly based approach. In: TENCON 2019-2019 IEEE region 10 conference (TENCON). IEEE, pp 947–954
4. Jhanjhi N, Brohi SN, Malik NA et al (2019) Proposing a rank and wormhole attack detection framework using machine learning. In: 2019 13th international conference on mathematics, actuarial science, computer science and statistics (MACS). IEEE, pp 1–9
5. Murali S, Jamalipour A (2019) A lightweight intrusion detection for SYBIL attack under mobile RPL in the internet of things. IEEE Internet Things J 7(1):379–388
6. Liu J, Yang D, Lian M, Li M (2021) Research on intrusion detection based on particle swarm optimization in IoT. IEEE Access 9:38254–38268
7. Neerugatti V, Mohan Reddy AR (2019) Machine learning based technique for detection of rank attack in RPL based internet of things networks. Int J Innov Technol Explor Eng (IJITEE): 2278–3075
8. Chaabouni N, Mosbah M, Zemmari A, Sauvignac C, Faruki P (2019) Network intrusion detection for IoT security based on learning techniques. IEEE Commun Surv Tutor 21(3):2671–2701
9. Rabhi S, Abbes T, Zarai F (2022) IoT routing attacks detection using machine learning algorithms. Wirel Pers Commun: 1–19
10. Verma A, Ranga V (2019) 4th International conference on Internet of Things: smart innovation and usages (IoT-SIU). IEEE 2019:1–6
11. Dani V (2022) ibads: an improved black-hole attack detection system using trust based weighted method. J Inf Assur Secur 17(3)
12. Parra GDLT, Rad P, Choo K-KR, Beebe N (2020) Detecting internet of things attacks using distributed deep learning. J Netw Comput Appl 163:102662
13. Patel HB, Jinwala DC (2021) Trust and strainer based approach for mitigating blackhole attack in 6lowpan: a hybrid approach. IAENG Int J Comput Sci 48(4)
14. Neerugatti V, Reddy ARM, Rama A (2018) Detection and prevention of black hole attack in RPL protocol based on the threshold value of nodes in the internet of things networks. Int J Innov Technol Explor Eng 8(9)
15. Muzammal SM, Murugesan RK, Jhanjhi NZ, Jung LT (2020) Smtrust: proposing trust-based secure routing protocol for RPL attacks for IoT applications. In: 2020 international conference on computational intelligence (ICCI). IEEE, pp 305–310
16. Ioulianou PP, Vassilakis VG, Shahandashti SF (2022) Ml-based detection of rank and blackhole attacks in RPL networks. In: 13th international symposium on communication systems, networks and digital signal processing (CSNDSP). IEEE, pp 338–343
17. Ribera EG, Alvarez BM, Samuel C, Ioulianou PP, Vassilakis VG (2020) Heartbeat-based detection of blackhole and greyhole attacks in RPL networks. In: 12th international symposium on communication systems, networks and digital signal processing (CSNDSP). IEEE, pp 1–6
18. Ioulianou PP, Vassilakis VG, Shahandashti SF (2022) A trust-based intrusion detection system for RPL networks: detecting a combination of rank and blackhole attacks. J Cybersecur Privacy 2(1):124–153
19. Choukri W, Lamaazi H, Benamar N (2022) A novel deep learning-based framework for blackhole attack detection in unsecured RPL networks. In: 2022 international conference on innovation and intelligence for informatics, computing, and technologies (3ICT). IEEE, pp 457–462
20. Agiollo A, Conti M, Kaliyar P, Lin T-N, Pajola L (2021) Detonar: detection of routing attacks in RPL-based IoT. IEEE Trans Netw Serv Manage 18(2):1178–1190
21. Chaitra Y, Dinesh R, Gopalakrishna M, Prakash B (2022) Deep-CNNTL: text localization from natural scene images using deep convolution neural network with transfer learning. Arab J Sci Eng 47(8):9629–9640
22. Lokkondra CY, Ramegowda D, Thimmaiah GM, Prakash A, Vijaya B (2022) Defuse: deep fused end-to-end video text detection and recognition. Revue d’Intelligence Artificielle 36(3):459–466
Drowsiness Detection System
Dhiren P. Bhagat, Bhavyesh Prajapati, Krutarth Pawar, Darshan Parekh, and Param Gandhi
Abstract Reports from the National Highway Traffic Safety Administration (NHTSA) state that over 100,000 accidents and more than 1,000 deaths per year are related to driver drowsiness. An accident becomes likely when the driver is sleepy or speeding, or cannot see the road ahead because of weather conditions. Much research has been done in this area, and several efforts are ongoing, to prevent such accidents. This paper focuses on detecting drowsiness with acceptably accurate results. In brief, the images captured by a camera go through mathematical calculation and machine learning to check whether the driver is drowsy. The approach can be used to construct a real-time drowsiness detection system. The model should be lightweight, should not require much storage, and should provide good accuracy.
D. P. Bhagat (B) · B. Prajapati · K. Pawar · D. Parekh · P. Gandhi
Sarvajanik College of Engineering and Technology, Surat, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
J. Choudrie et al. (eds.), ICT for Intelligent Systems, Smart Innovation, Systems and Technologies 361, https://doi.org/10.1007/978-981-99-3982-4_2

1 Introduction
The safety administration of our country has estimated that about 3% of the accidents throughout the year are caused by drowsiness. Almost 80,000 cases are attributed to impaired driving, in which reaction time is delayed while under the influence. Accidents due to weather conditions are fewer than those due to driving under the influence. In view of this, industries and agencies are trying to develop a system, built into newer cars, that uses a camera and sensors to check whether the driver is drowsy. There have been many technical advancements for assisting the driver while driving. Some constantly check the heart rate and pulse for strong fluctuations, which usually occur just before an incident. Such techniques are not well suited for commercial use because the measurement is not wireless: the device must be mounted on the driver’s body, which may be uncomfortable. Some luxury cars provide methods that learn the driver’s driving pattern and then check for irregularities in it while driving. Those techniques are available to only a limited number of users, as such systems
integrated into the car itself are very expensive and, considering the traffic in India, it is difficult for a local driver to acquire one. Nowadays, technology allows the driver’s drowsiness to be checked visually through a camera on the dashboard or with the help of various sensors. This approach is efficient and can yield good results if sufficient work is done to improve its accuracy. Newer sensors can prompt actions to be taken in such conditions, but it would be more helpful if the system could identify whether the driver is drowsy and allow him or her to take appropriate action. The problem with most technologies is that they do not work at low power, which brings back the cost problem, not to mention the bulkiness. This paper focuses on developing such a system with the help of deep learning, increasing the accuracy while keeping the system as compact as possible. The paper is divided into three parts. The next section, the literature review, covers related research and papers; the following sections describe the methodology and take the reader through the process in detail; the paper closes with the acknowledgments and references.
2 Literature Review
Accidents due to drowsiness account for a major portion of accidents all over the world each year, which is why a number of researchers and industries have been working on and suggesting various approaches to this problem. One approach takes the average steering angle and vehicle speed while drivers are drowsy and uses them as a reference. The issue with this method is that the chosen parameters are very subjective, so it cannot be fully relied upon. Another approach is based on deep learning: a machine learning-based algorithm measures different parameters and expressions and is trained on them, for example on the opening of the eyelids. It achieved high accuracy, but there must be enough light to capture the images needed to predict the person’s fatigue. Another approach monitored brain waves and the driver’s pulse to check whether he or she is drowsy, but the results were not consistent. Some works combine all the available methods and try to house them in one system, but this comes at a cost. This section gives a brief overview of previous approaches in this area. A multilayer perceptron (MLP) classifier, a simple network of interconnected neurons that learn from training, was used with a dataset provided by National Tsing Hua University (NTHU) covering different driving scenarios [1]. A detector was built using the information collected after a number of subjects participated in a test. The subjects slept for different numbers of hours and were made to drive around 5,000 km. The detector was then used to alert the driver using the Karolinska Sleepiness Scale (KSS), which ranges from 1 to 9 (very sleepy) [2]. IR-sensitive cameras were used to detect the driver’s face under various conditions, and a Gaussian model was then used to measure the closure of the eyes [3]. The computer gets the image in the RGB format. The LAB method was used
so that, first, the RGB image is converted to the LAB format, where L is luminance and A (green to red) and B (blue to yellow) are parameters ranging from –120 to 120. The fuzzy c-means algorithm was used to check the iris and cupola region of both eyes [4]. Drowsiness was predicted for night shift workers during morning driving. The inclusion of an individual driver factor improved the prediction of drowsy driving; a drawback was that driving performance measures did not improve the prediction [5]. Participants were made to drive for a predetermined amount of time in a simulator, where the road was created with Scanner studio. The experiment was based on a study that claimed that people are more likely to fall asleep after lunchtime, between 14:00 and 16:00 h, than during night time. Moreover, the room was air-conditioned to provide more comfort, to see whether this induces sleep in addition to the time of day and the preceding lunch [6]. Ngxande et al. review drowsiness detection systems based on behavioral measures using machine learning techniques, including eye blinks, head movements, and yawning. For accurate results, such systems require robust and accurate algorithms [7].
3 Project Implementation
Figure 1 shows a visual representation of the implemented technique. The implemented model works as follows (Fig. 1).
Step 1: System initialization and capturing the real-time image of the driver. This is achieved by using a camera fixed on the dashboard.
Step 2: Extract the features and track them. The different points on the face are extracted, marked, and followed throughout the driving. A 68-point facial landmark predictor pre-trained on the IBUG 300-W dataset is used. Also, a rectangular bounding box is used to follow the face.
Step 3: Threat assessment. The points extracted from the face are used to check whether the driver is drowsy. The Mouth Aspect Ratio (MAR) and Eye Aspect Ratio (EAR) are used for threat measurement.
Step 4: Report the threat if the driver is found drowsy. The rectangular frame used for tracking the face then turns from green to red, with ‘Drowsy’ written over it as part of the prediction. Moreover, an alarm is set off to get the attention of the driver.
Step 5: If the driver is found awake, the alarm turns off and the bounding box turns from red to green, and the system continues to capture the driver in an infinite loop.
In our system, we have used dlib to detect facial landmarks in an image. These landmarks are used to localize features of the face such as the eyes, nose, eyebrows, mouth, and jawline. Using the dlib library, the facial structure is detected, which is important in identifying the state of the driver. The facial landmark detector [8] included in the dlib library creates a bounding box of
Fig. 1 System block diagram
(x, y) coordinates around the face of a person in an input image. In this detection step, an ensemble of regression trees is trained to estimate the facial landmark positions directly, without a separate feature extraction step. This makes it more suitable for real-time operation (Fig. 2). The dlib module works as follows (Fig. 3). The pre-trained detector estimates the locations of 68 known landmark points on the face, which are listed in Table 1 and shown in Fig. 4. The input image obtained through OpenCV is converted into grayscale and resized to a width of 500 pixels for fast operation. The dlib library then returns an object containing the 68 (x, y) coordinates of the facial region. Later, the facial landmark detection module is applied to each part of the face, and the result is converted into a NumPy array for easy handling in Python. The obtained data is then passed on to the next part, which calculates the quantities needed to detect the drowsy state. In this part, we describe the implementation of a technique for measuring the blink duration of the driver’s eyes using a dashboard camera, in order to predict the state of the driver and decide whether he or she is drowsy. This could be termed Behavioral Measurement. It enables the computer to measure the eye parameters. These parameters are generally the ratio between the upper part and the lower part of both eyes and the duration for which the eyes remain closed or open. We have used a standard 68-point facial landmark predictor, which locates the eyes, eyebrows, nose, mouth, and jawline. The
Fig. 2 System process flow (stages: Input, CNN, Features Extraction, Classifier, Drowsiness Detection)
Fig. 3 Face recognition flow (stages: Detect face by dlib, Perform face alignment by dlib, Extract face feature by mxnet, Find most similar face from database)
Table 1 Facial detection points and parts
Points  Facial parts
0–16    Jaw
17–21   Right eyebrow
22–26   Left eyebrow
27–35   Nose
36–41   Right eye
42–47   Left eye
48–59   Outer lip
60–67   Inner lip
Fig. 4 Trained facial landmark [9]
best part of this predictor is that it can be applied to face images from any dataset without retraining. This 68-point facial landmark detector was trained on the IBUG 300-W dataset. There is also a 194-point facial landmark detector trained on the HELEN dataset. The 68 landmarks used for facial detection are shown in Fig. 4.
Fig. 5 Eye Aspect Ratio (EAR)
3.1 Eye Aspect Ratio (EAR)
The EAR is the ratio of the length (vertical opening) of the eye to the width of the eye [9]. The length of the eye is calculated by averaging over two distinct vertical lines across the eye, as illustrated in Fig. 5.
$$\mathrm{EAR} = \frac{\lVert p_2 - p_6 \rVert + \lVert p_3 - p_5 \rVert}{2\,\lVert p_1 - p_4 \rVert}$$
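The EAR definition in Sect. 3.1 translates directly into a few lines of Python. The sketch below is illustrative rather than the authors' code: the function name `eye_aspect_ratio` and the example coordinates are hypothetical, and the six points are assumed to be given in the order p1 to p6 used in the formula (p1 and p4 at the eye corners, p2 and p3 on the upper lid, p6 and p5 below them on the lower lid).

```python
import math

def eye_aspect_ratio(eye):
    """EAR for six (x, y) points ordered p1..p6 as in the EAR formula."""
    p1, p2, p3, p4, p5, p6 = eye
    vertical = math.dist(p2, p6) + math.dist(p3, p5)   # two lid-to-lid distances
    horizontal = math.dist(p1, p4)                     # corner-to-corner width
    return vertical / (2.0 * horizontal)

# Hypothetical landmark coordinates, for illustration only.
open_eye = [(0, 0), (1, 1), (3, 1), (4, 0), (3, -1), (1, -1)]
closed_eye = [(0, 0), (1, 0.1), (3, 0.1), (4, 0), (3, -0.1), (1, -0.1)]

ear_open = eye_aspect_ratio(open_eye)
ear_closed = eye_aspect_ratio(closed_eye)
print(ear_open, ear_closed)
```

Because the width of the eye changes little when the lids close, the EAR drops towards zero for a closed eye, which is what makes it usable as a blink and eye-closure signal.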
3.2 Mouth Aspect Ratio (MAR)
Similar to the EAR, the MAR measures the ratio of the length (vertical opening) of the mouth to the width of the mouth [9]. When fatigued, people yawn and lose control over their mouth, which makes their MAR higher than usual (Fig. 6).
$$\mathrm{MAR} = \frac{|EF|}{|AB|}$$
Fig. 6 Mouth Aspect Ratio (MAR)
3.3 Mouth Aspect Ratio Over Eye Aspect Ratio (MOE)
$$\mathrm{MOE} = \frac{\mathrm{MAR}}{\mathrm{EAR}}$$
MOE is simply the ratio of the MAR to the EAR [9]. MOE provides results that are easier to interpret and more accurate than the individual values of MAR and EAR. Unlike MAR and EAR taken separately, an increase in MOE can be read directly as a sign of drowsiness, because it increases both when the numerator (MAR) increases and when the denominator (EAR) decreases.
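The MAR and MOE definitions can be sketched in Python in the same way. In this illustration, A and B are assumed to be the two mouth corners and E and F the midpoints of the upper and lower lip, which is how the ratio of mouth length to mouth width is read here; the function names and the example coordinates are hypothetical.

```python
import math

def mouth_aspect_ratio(a, b, e, f):
    """MAR = |EF| / |AB|; assumes A, B are mouth corners and E, F lip midpoints."""
    return math.dist(e, f) / math.dist(a, b)

def mouth_over_eye(mar, ear):
    """MOE = MAR / EAR."""
    return mar / ear

# Hypothetical coordinates of a wide-open (yawning) mouth, for illustration only.
mar_yawn = mouth_aspect_ratio(a=(0, 0), b=(6, 0), e=(3, 1.5), f=(3, -1.5))

moe_alert = mouth_over_eye(mar=0.1, ear=0.25)        # closed mouth, open eyes
moe_drowsy = mouth_over_eye(mar=mar_yawn, ear=0.1)   # yawning, eyes nearly closed
print(mar_yawn, moe_alert, moe_drowsy)
```

In this toy example a yawn combined with nearly closed eyes raises the MOE from 0.4 to 5.0, which shows why a single rising value is convenient to threshold.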
4 Parameters
As shown in Fig. 4, 68 points are mapped according to the facial structure. We converted the real-time image from BGR format to grayscale because the grayscale format has only one channel, which reduces the load on the CPU and consumes less computational power. To track the location of the face in the frame continuously, we used a rectangular frame around the face. Each time the face is captured, we obtain four coordinates, p1, p2, p3, and p4, which are the left, top, right, and bottom coordinates of the rectangle. The rectangular frame around the face is drawn from these coordinates, as shown in the figures.
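The effect of the grayscale conversion on the amount of data can be illustrated with a small NumPy sketch. It is not the authors' OpenCV code: the helper names are hypothetical, the 640 × 480 frame size is an assumption, and the channel weights are the standard luma weights (0.299 R, 0.587 G, 0.114 B) written in BGR order. The second helper computes the frame size that results from resizing to the 500-pixel width mentioned in the implementation section while keeping the aspect ratio.

```python
import numpy as np

def to_grayscale(frame_bgr):
    """Weighted sum of the B, G, R channels (standard luma weights)."""
    weights = np.array([0.114, 0.587, 0.299])   # order matches BGR
    return (frame_bgr.astype(np.float64) @ weights).round().astype(np.uint8)

def target_size(width, height, new_width=500):
    """Size after resizing to new_width while keeping the aspect ratio."""
    return new_width, int(round(height * new_width / width))

# Hypothetical 640 x 480 camera frame filled with pure red (BGR order).
frame = np.zeros((480, 640, 3), dtype=np.uint8)
frame[..., 2] = 255

gray = to_grayscale(frame)
print(gray.shape, frame.size // gray.size, target_size(640, 480))
```

The grayscale frame holds one third of the values of the colour frame, which is the reduction in per-frame work that the text refers to.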
5 Results
This approach can be used not only in automobiles but also in many other areas, such as the filters in social media apps that change the eye size, jaw size, or nose shape, or trigger a transition when the user opens the mouth (Figs. 7, 8 and 9). All of these are possible with this method. For the sake of simplicity, however, we use only a rectangular frame to follow the face, and the landmark points are used for the calculations. The drowsiness detection program relies on existing libraries, which help us obtain efficient results. The libraries used are dlib, cv2, math, and playsound. All of these libraries play an important role in the detection of a drowsy person. dlib is used to locate the landmark coordinates on the driver’s face, cv2 is used for video processing, and playsound is used for the alarm. All of these libraries work together to make the system complete. The working of this system can be divided into two parts:
1. Detecting or localizing the face.
2. Predicting the landmarks of important regions in the detected face.
Fig. 7 Yawning naturally closes the eyes and draws attention away from the road, which could be dangerous; if the driver yawns for longer than the predefined time, the model shows ‘Drowsy’
Fig. 8 When the eyes are closed for longer than the predefined time, the rectangular frame turns red, implying that the driver is drowsy. Also, ‘Drowsy’ is written on the rectangle
Once the landmarks are predicted, we compute the Mouth Opening Ratio and the Eye Blinking Ratio. If the Eye Blinking Ratio is greater than 4.3, or the Mouth Opening Ratio is greater than 0.30, or the Mouth Aspect Ratio over Eye Aspect Ratio is greater than 0.069, a counter is incremented; if the count exceeds 8 s, the person is declared drowsy. The system displays this result along with the prediction in terms of accuracy (%). The pre-trained landmark detector is used to estimate the ratios in real time. So, if the person is in a drowsy state, the playsound library will play a sound that
Fig. 9 When the eyes are open the rectangular frame following the face will be green indicating that the driver is not drowsy
can wake up the driver. The whole detection runs continuously at 30 fps in real time. The following assumptions were made for the sake of simplicity and are based on ideal conditions; improvements will be made in further analysis.
1. The frame of the camera capturing the live image of the driver on the dashboard is kept constant, i.e., the camera position is fixed. It is assumed that the driver remains in the frame throughout the driving.
2. If the driver wears shades or sunglasses, or anything else that prevents the camera from capturing the driver’s eyes, which are the major input of this model, detection is hindered. For the time being, we do not take into account the case of the driver wearing anything that could prevent the capture.
3. The camera on the dashboard does not come with a flash. It is assumed that there is enough ambient light on the driver’s face to capture the image and give the best results possible.
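The threshold rule described in the Results section can be sketched as a small state holder in Python. The three thresholds, the 8 s limit, and the 30 fps rate come from the text; the class name `DrowsinessMonitor`, the conversion of the counter from frames to seconds, and the reset of the counter as soon as no threshold is exceeded are assumptions made for illustration.

```python
from dataclasses import dataclass

EYE_BLINK_RATIO_MAX = 4.3        # thresholds quoted in the Results section
MOUTH_OPENING_RATIO_MAX = 0.30
MOE_MAX = 0.069
DROWSY_SECONDS = 8
FPS = 30

@dataclass
class DrowsinessMonitor:
    fps: int = FPS
    flagged_frames: int = 0

    def update(self, eye_blink_ratio, mouth_opening_ratio, moe):
        """Process one frame; return True while the driver counts as drowsy."""
        exceeded = (
            eye_blink_ratio > EYE_BLINK_RATIO_MAX
            or mouth_opening_ratio > MOUTH_OPENING_RATIO_MAX
            or moe > MOE_MAX
        )
        # Assumption: the counter resets as soon as no threshold is exceeded.
        self.flagged_frames = self.flagged_frames + 1 if exceeded else 0
        return self.flagged_frames / self.fps > DROWSY_SECONDS

monitor = DrowsinessMonitor()
# Hypothetical ratios: eyes closed (blink ratio above 4.3) for 241 frames.
states = [monitor.update(5.0, 0.1, 0.01) for _ in range(241)]
recovered = monitor.update(3.0, 0.1, 0.01)   # eyes open again
print(states[239], states[240], recovered)
```

With these assumptions the alarm condition becomes true on the first frame after eight full seconds of exceeded thresholds and clears on the first frame in which all ratios are back within range.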
6 Conclusion
The work presented in this paper is based on facial landmarks. Its purpose is to measure the Blinking Ratio and Mouth Opening Ratio and, if their thresholds are exceeded for longer than a predefined time, to trigger the necessary actions. It is specifically made for automobile-based embedded systems. Around 97–98% accuracy has been achieved with the proposed system. There is still room for improvement; for instance, more work can be done on very low lighting situations and on cases in which the driver is wearing spectacles.
References
1. Jabbar R, Al-Khalifa K, Kharbeche M, Alhajyaseen W, Jafari M, Jiang S (2018) Real-time driver drowsiness detection for android applications using deep Neural Networks. Proc Comput Sci 130:400–407
2. Bronte S, Bergasa LM, Delgado B, Sevillano M, Garcia I, Hernandez N (2010) Vision-based drowsiness detector for a realistic driving simulator. Department of Electronics University of Alcala
3. Garcia I, Bronte S, Bergasa LM, Almazán J, Yebes J (2012) Vision-based drowsiness detector for a real driving condition
4. Chisty JG (2015) A review: driver drowsiness detection system. Int J Comput Sci Trends Technol IJCST 3(4). Department of Computer Science and Engineering RIMT-IET, Punjab Technological University, Jalandhar, India
5. Liang Y (2015) Accident analysis & prevention. Liberty Mutual Research Institute for Safety, Hopkinton
6. de Naurois CJ, Bourdin C, Stratulat A, Diaz E, Vercher JL (2019) Detection and prediction of driver drowsiness using artificial neural network models. Aix Marseille Univ, CNRS, ISM, Marseille, France, Groupe PSA, Centre Technique de Velizy, Velizy-Villacoublay
7. Ngxande M, Tapamo JR, Burke M (2017) Driver drowsiness detection using behavioral measures and machine learning techniques: a review of state-of-art techniques. In: 2017 Pattern Recognition Association of South Africa and Robotics and Mechatronics (PRASA-RobMech), Bloemfontein, pp 156–161. https://doi.org/10.1109/RoboMech.2017.8261140
8. Kazemi V, Sullivan J (2014) One millisecond face alignment with an ensemble of regression trees. In: 2014 IEEE conference on computer vision and pattern recognition, pp 1867–1874
9. Souvik G. Drowsiness Detection System in Real-Time using OpenCV and Flask in Python. https://towardsdatascience.com/drowsiness-detection-system-in-real-time-using-opencv-and-flask-in-python-b57f4f1fcb9e
10. Yu J, Park S, Lee S, Jeon M (2018) Driver drowsiness detection using condition-adaptive representation learning framework. IEEE Trans Intell Transp Syst 20(11):4206–4218
A Deep Learning Technique to Recommend Music Based on Facial and Speech Emotions
R. Pallavi Reddy, B. Abhinaya, and Athkuri Sahithi
Abstract Human emotion is a complex cognitive state and a psychological aspect that improves communication in interpersonal relationships. It can be detected through facial gestures and spoken words. Because a person can experience a practically endless range of feelings, identifying a single feeling among them is difficult. To make the problem tractable, psychologists have identified a small set of basic emotions that influence human decision-making, among them anger, sadness, happiness, fear, disgust, and surprise; all other emotions can be regarded as combinations of these. Recognizing the basic emotions facilitates the assessment of a person’s mental health and is currently used in applications including video games, security, and health monitoring. People frequently use music to regulate mood, specifically to lift their spirits, boost their energy, or soothe tension, and listening to the right music at the right moment may enhance mental health. This paper designs an automatic emotion recognition system based on facial expression and speech that recommends music suited to the detected emotion. Emotion is detected from audio and from a live webcam feed using a deep learning technique, the Convolutional Neural Network (CNN). The model is trained to classify seven emotions (neutral, disgust, happy, fear, sad, angry, and surprise) and reaches an accuracy of 63% on faces and 65% on speech.
R. Pallavi Reddy (B) · B. Abhinaya · A. Sahithi
G. Narayanamma Institute of Technology and Science For Women, Shaikpet, Hyderabad, Telangana 500104, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
J. Choudrie et al. (eds.), ICT for Intelligent Systems, Smart Innovation, Systems and Technologies 361, https://doi.org/10.1007/978-981-99-3982-4_3
1 Introduction
According to a survey conducted in 2019 and 2020, 68% of adults between the ages of 18 and 34 listen to music every day, and the average person spends 16 h and 14 min each week doing so. This suggests that music promotes relaxation and serves as a brief form of escape. Technology has led to the creation of numerous music
players with functions such as fast forward, pause, shuffle, and repeat. Nevertheless, no programs exist that suggest songs based on a user’s facial expressions or voice patterns.
Emotions play an important role in daily life: by expressing them, people communicate and understand each other better. Emotion recognition, which has several critical applications, may eventually allow machines to comprehend human feelings. Automatic emotion recognition draws on two distinct but equally significant research areas, artificial intelligence and the psychology of human emotion. Human emotional states can be inferred from verbal and nonverbal cues, including voice tonality and facial expressions, and speech is one of the most fundamental ways in which people communicate. In the modern digital era of remote communication, emotion detection and analysis are essential because emotions are a key component of communication. Because emotions are subjective, detecting them can be difficult. Speech and facial expressions typically occur together in emotional interactions, share many thematic and temporal characteristics, and have both received a great deal of attention in the field of affective computing.
Owing to the development and popularity of internet streaming services, which put practically all of the world’s music at the user’s fingertips, Music Recommender Systems (MRSs) have seen a boom in recent years. Although modern MRSs greatly assist consumers in locating interesting music in these vast collections, MRS research still faces several obstacles. Unlike movies, music is typically listened to sequentially, that is, in a playlist or listening session rather than one song at a time, which makes it difficult for an MRS to decide how to order the items in a recommendation list. Music streaming services such as Spotify, Pandora, and Apple Music now offer tens of millions of music pieces.
By filtering this multitude of items and preventing choice overload, MRSs are frequently quite successful at suggesting songs that meet their users’ preferences. However, these systems are still far from ideal and frequently generate unsatisfactory recommendations. This is partly because users’ musical preferences and needs depend heavily on a broad range of variables that existing MRS approaches, which are typically built around user-item interactions or, occasionally, content-based item descriptors, do not adequately take into account. Classic music applications can help find a certain song or recommend songs that fit existing playlists, but creating and managing large playlists takes a lot of work. The proposed work focuses on an application that detects emotions from how people speak and look, using deep learning techniques, and then chooses music that matches the detected feeling.
The paper is organized as follows: Sect. 2 reviews the literature. Section 3 discusses the datasets and describes the architecture of the proposed system. Section 4 covers the implementation. Section 5 presents results and discussion, and Sect. 6 gives the conclusion and future scope.
2 Literature Survey
Many recent publications use deep learning to process facial expressions. A multimodal emotion detection model based on speech and facial expression is proposed in [1]. It employs CNN and LSTM to learn global and context-specific high-level speech emotion features, and numerous small-scale kernel convolution blocks to extract facial expression features. The first work consulted was on ImageNet classification [2], in which the researchers classify images using an ImageNet classifier. In [3], multiple deep neural networks work together in the same environment to classify static facial images, a setup that makes the methodology easy to follow. The paper’s findings are rather unexpected, yet complicated networks take longer to process. Yao et al. recognized facial features and evaluated the quality of the feature set [4]. Using the approach of Levi and Hassner [5], possible patterns in facial expressions can be established. Compared with a single modality, integrating more emotional cues and fully using complementary emotional information clearly improves recognition accuracy [5, 6]. Future human-computer interaction research will inevitably move toward multimodal recognition systems that integrate speech, image, and other emotional data [7]. The strengths and weaknesses of facial-expression-based and speech-based emotion identification systems are examined in [8]. According to its findings, for the emotions studied, the method based on facial expression performs better than the one that takes only speech input. Damodar et al. [10] proposed voice emotion recognition using CNN and Decision Tree: features are extracted from preprocessed audio files using MFCC, and the extracted features are classified with two classifiers, CNN and Decision Tree.
With these classifiers, emotions are predicted with an accuracy of 72% and 63%, respectively. Zhao et al. [7] recognize speech emotion using a merged deep CNN that focuses on learning deep features. 1D and 2D CNN architectures are created, assessed, and then fused, so that the merged network has two branches, one one-dimensional (1D) and the other two-dimensional (2D). The study in [11] uses a convolutional neural network to identify facial emotions. Human emotion is classified into seven basic emotions (excited, enraged, sorrowful, scared, shocked, disgusted, and neutral). Emotions are expressed on the face through complex facial muscles, and subtle, nuanced cues in speech likewise reveal a great deal about a person’s state of mind. The steps taken to achieve good accuracy include choosing the data, processing it, and selecting and training a model suited to the problem. As part of the process, the model is given attributes extracted from the input image, activation and generalization techniques are used to ensure that the model performs effectively, and training and testing complete the process.
The design of the model starts with data preprocessing, followed by convolution layers that map the relevant characteristics of the input image over numerous layers. Max pooling then reduces the image size on the assumption that the features are preserved, and finally a fully connected layer performs the emotion recognition. The technology could benefit the robotics industry, by giving robots a sense of emotion, and eventually the blind community. The concept could be applied in a variety of real-world settings in industries such as health, video games, and marketing [11].
3 Proposed System
3.1 Datasets
a. FER-2013
The FER-2013 dataset was formed by collecting and combining the results of a Google image search for each specific emotion. It contains about 35,685 grayscale face images of 48 × 48 pixels. The training set has 28,709 examples and the public test set has 3,589 examples. The faces were automatically registered so that each one is roughly centered and occupies a similar amount of space in the image. Every image in the dataset is labelled with one of seven emotion categories: anger, disgust, fear, happiness, sadness, surprise, or neutral.
b. RAVDESS
RAVDESS stands for the Ryerson Audio-Visual Database of Emotional Speech and Song. It has 1,440 files: 60 trials per actor for each of 24 actors. The 24 professional actors (12 male and 12 female) speak two lexically matched statements in a neutral North American accent. Speech expressions are neutral, happy, sad, angry, fearful, surprised, or disgusted. Each expression is produced at two levels of emotional intensity (normal and strong), with an additional neutral expression.
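As a minimal sketch of how one FER-2013 sample could be turned into a 48 × 48 image, the snippet below parses a space-separated pixel string into rows and scales the intensities to [0, 1]. The pixel-string layout follows the commonly distributed CSV version of the dataset and is an assumption here, and the helper name `parse_pixels` is hypothetical rather than taken from the authors' code.

```python
# Sketch only: the helper name is hypothetical, and the space-separated
# pixel-string layout is an assumption about the CSV distribution of FER-2013.
IMG_SIZE = 48  # FER-2013 images are 48 x 48 grayscale


def parse_pixels(pixel_string, size=IMG_SIZE):
    """Convert 'p0 p1 ... pN' into a size x size grid of floats in [0, 1]."""
    values = [int(v) for v in pixel_string.split()]
    if len(values) != size * size:
        raise ValueError(f"expected {size * size} pixels, got {len(values)}")
    # Scale 8-bit intensities to [0, 1] and cut the flat list into rows.
    scaled = [v / 255.0 for v in values]
    return [scaled[r * size:(r + 1) * size] for r in range(size)]


# Synthetic example row: a uniform mid-gray image.
sample = " ".join(["128"] * (IMG_SIZE * IMG_SIZE))
image = parse_pixels(sample)
print(len(image), len(image[0]))
```

Rejecting rows with the wrong pixel count early is a design choice of this sketch; it keeps malformed samples from silently reaching a model.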
3.2 Methodology
The idea of the Face and Speech Emotion Based Music Recommendation System is to create a multimodal emotion recognition system that benefits from the complementary information provided by audio and visual elements. CNN and LSTM are used to learn global and context-specific high-level speech emotion features, while several small-scale kernel convolution blocks extract the features of facial expressions. The speech branch uses an ensemble of CNN-LSTM binary classifiers trained
to recognize each of the emotions. The model classifies speech emotions into seven categories: sad, angry, happy, disgust, surprise, neutral, and fear. The classification of facial emotions additionally uses image segmentation and feature extraction. Based on the model’s results, the system suggests relevant music. There are three modules: facial expression recognition, speech emotion recognition, and music recommendation.
Signal processing and emotion recognition training make up the bulk of the procedure. To obtain a usable voice signal, the audio signals must first be preprocessed; this includes pre-emphasis, endpoint detection, framing, windowing, and other operations. Next, the facial geometry features from the processed images and the prosodic features from the processed sound signals are extracted into feature vectors. Once trained, the Gaussian mixture model is used in the recognition step, which also entails extracting prosodic feature parameters and reducing the sizes of the speech and video test samples. Figure 1 illustrates how the outcomes of the two classifiers are merged for the final decision.
a. Speech Emotion Recognition (SER)
As illustrated in Fig. 2, the three main parts of the SER system are feature extraction, feature selection, and the classifier. The main classifiers used in SER applications include Support Vector Machine (SVM), Hidden Markov Model (HMM), Gaussian Mixture Model (GMM), and Deep Neural Networks (DNNs), among others. There is no consensus on the ideal classifier for an SER system, however. The extraction and selection of discriminative, robust, and affect-salient features is therefore considerably more important than the choice of classifier. During signal processing, acoustic filters are applied to the original audio signals to separate them into meaningful components.
The Mel-scale, a logarithmic transformation of a signal’s frequency, is used in the signal preprocessing together with the extraction of the Log-Mel-spectrogram. The Mel-spectrogram is produced by applying a Fourier transform to analyze the signal’s frequency content and mapping it to the Mel-scale. It is a useful tool for identifying hidden elements in audio and visualizing them. The classifier is a Time Distributed Convolutional Neural Network that combines convolutional layers and an LSTM to categorize emotions. It comprises a deep neural network with four convolutional layers, an LSTM layer, and a fully connected layer. The fundamental idea of a Time Distributed Convolutional Neural Network is to roll a window of a given size and time step along the Log-Mel-spectrogram. Since LSTMs work well with sequential speech data, they are employed to improve the results. The accuracy obtained in these experiments indicates which aspects convey the most emotional information. This helps identify a person’s emotions through speech, and the CNN-LSTM model is used with binary classifiers on the test dataset containing seven or more distinct emotions.
Fig. 1 Architecture of the music recommendation system based on facial and speech emotions
Fig. 2 Speech emotion recognition (SER) system
b. Facial Emotion Recognition (FER)
Figure 3 shows the FER system, starting from the dataset image under consideration. First, data preparation is carried out, which involves converting the RGB image into grayscale, detecting faces, and rescaling the image, a crucial step in data preprocessing. The flow then continues with image segmentation and feature extraction. After the image’s features have been obtained, the classifier uses a train/test approach to determine the overall recognition outcome, with some data going to the training phase and the rest to the testing phase. The FER dataset is divided into training and validation sets so that the validation set can reveal whether the model overfits the training data.
This makes it possible to train and validate on the same database, so that the model is as accurate as possible and errors can be found and fixed quickly. Convolution multiplies patches of the input with a filter matrix in the feature detector, which slides a window over each channel and summarizes the features within it. The results are standardized using batch normalization, since neural networks are particularly sensitive to unnormalized data. A cascade classifier is used to detect faces. A cascade classifier is a multi-stage classifier that can perform detection quickly and precisely. The input image is evaluated stage by stage: if the outcome of a stage is negative, the input is rejected, and if it is positive, the process moves on to the next stage.
Fig. 3 Facial emotion recognition (FER) system
The Xception model, a deep convolutional neural network architecture based on depth-wise separable convolutions, classifies the image of the detected frontal face into one of the seven emotion categories. In the depth-wise step, a single convolutional filter is applied to each input channel. Data in the Xception model passes through three flows: the entry flow, the middle flow, which is repeated eight times, and the exit flow. Batch normalization is applied after each convolutional and separable convolutional layer. Because depth-wise separable convolutions require fewer parameters to be trained, the architecture allows faster training.
c. Music Recommendation
This module communicates with the speech and facial emotion recognition modules. Depending on the user’s preference, it accepts as input the distribution of emotions from one of the two modules. The dominant emotion is the one with the highest probability in this distribution. Songs from the “songs” directory, which holds songs for the different emotions, are returned together with an emotion analysis showing the probability percentages, based on the dominant emotion.
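The selection of the dominant emotion described for the music recommendation module can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name `dominant_emotion` and the example probabilities are hypothetical.

```python
# Sketch only: function name and example probabilities are illustrative.
def dominant_emotion(distribution):
    """Return (label, percentage) for the most probable emotion.

    `distribution` maps emotion labels to non-negative scores; it is
    normalized here so the result is expressed as a percentage.
    """
    total = sum(distribution.values())
    if total <= 0:
        raise ValueError("distribution must contain a positive score")
    label = max(distribution, key=distribution.get)
    return label, 100.0 * distribution[label] / total


example = {"angry": 0.05, "disgust": 0.02, "fear": 0.03, "happy": 0.60,
           "sad": 0.10, "surprise": 0.05, "neutral": 0.15}
label, percent = dominant_emotion(example)
print(f"{label}: {percent:.1f}%")
```

Normalizing inside the function means the same code works whether the recognizer returns probabilities or raw per-class scores.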
4 Implementation
4.1 Facial Emotion Recognition
1. Data Preprocessing
The Keras library is used to preprocess the training and test images before they are sent to the CNN model. The pixel values from the CSV file (dataset) are loaded into a data frame and subjected to preprocessing steps including rescaling, resizing, conversion to grayscale, and standardization. Face detection and normalization are applied to each image sample, and any pose and lighting irregularities are corrected. The images are then vectorized and converted into pandas data frames and NumPy arrays.
2. Face Detection
A cascade classifier is employed to find faces. A cascade function is trained by machine learning on a large number of positive and negative images and is then used to find objects in other pictures. A cascade classifier is a multi-stage classifier that can perform detection quickly and precisely. Each stage consists of a strong classifier, and an input is assessed sequentially, stage by stage. When the classifier of a particular stage produces a negative result, the input is rejected immediately; if the result is positive, the input moves on to the next stage. This multi-stage design allows simpler classifiers to reject the majority of negative (non-face) inputs quickly while more time is devoted to positive (face) inputs. Figure 4 shows how a cascade classifier operates.
Fig. 4 Cascade classifier
3. Emotion Recognition
The Xception model, which employs depth-wise separable convolutions, classifies the image of the detected frontal face into one of the seven emotion categories. The Xception CNN design differs slightly from the standard CNN model. In typical CNN architectures, the majority of the parameters are located in the fully connected layers applied last, and features are computed with conventional convolutions. The Xception design instead makes use of depth-wise separable convolutions and residual modules. Residual modules modify the desired mapping between two subsequent layers, so that the learned features become the difference between the original feature map and the desired features. Data enters the Xception CNN model through the entry flow, passes through the middle flow, which is repeated eight times, and leaves through the exit flow. Each convolution and separable convolution layer is followed by batch normalization. The working of Facial Emotion Recognition using the Xception model is shown in Fig. 5. The webcam is started to provide the video input. A face is detected in the video input and zoomed in on. The Xception model is then used to identify the emotion.
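To make the parameter saving of depth-wise separable convolutions concrete, the sketch below counts the weights of a standard convolution and of its separable counterpart. The layer sizes are hypothetical and not taken from the paper's model, and bias terms are ignored.

```python
# Sketch only: layer sizes are illustrative and bias terms are ignored.
def standard_conv_params(kernel, in_channels, out_channels):
    """Weights in a standard convolution: one k x k x C_in filter per output."""
    return kernel * kernel * in_channels * out_channels


def separable_conv_params(kernel, in_channels, out_channels):
    """Weights in a depth-wise separable convolution.

    Depth-wise step: one k x k filter per input channel.
    Point-wise step: a 1 x 1 convolution that mixes the channels.
    """
    depthwise = kernel * kernel * in_channels
    pointwise = in_channels * out_channels
    return depthwise + pointwise


k, c_in, c_out = 3, 128, 256  # hypothetical layer
standard = standard_conv_params(k, c_in, c_out)
separable = separable_conv_params(k, c_in, c_out)
print(standard, separable, round(standard / separable, 1))
```

For this hypothetical 3 × 3 layer the separable version needs 33,920 weights instead of 294,912, which is the kind of reduction the text refers to.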
Fig. 5 Facial Emotion Recognition using Xception model
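Returning to the face detection stage that precedes the Xception classifier in this pipeline, the staged early rejection of a cascade classifier can be sketched as below. The stage functions are toy stand-ins operating on made-up features, not trained classifiers, and all names are hypothetical.

```python
# Sketch only: the stage functions below are toy stand-ins, not trained
# classifiers, and a window is represented as a plain dict of features.
def run_cascade(window, stages):
    """Return (accepted, stages_evaluated) for one candidate window.

    Stages run in order; the first negative result rejects the window
    immediately, so later stages are never evaluated for it.
    """
    for index, stage in enumerate(stages, start=1):
        if not stage(window):
            return False, index
    return True, len(stages)


# Toy stages ordered from cheap to strict (hypothetical features).
stages = [
    lambda w: w["contrast"] > 0.2,
    lambda w: w["eye_band_dark"],
    lambda w: w["symmetry"] > 0.7,
]

background = {"contrast": 0.05, "eye_band_dark": False, "symmetry": 0.1}
face_like = {"contrast": 0.60, "eye_band_dark": True, "symmetry": 0.9}

print(run_cascade(background, stages))  # rejected at the first stage
print(run_cascade(face_like, stages))   # passes every stage
```

Because most windows in an image are background, returning at the first failing stage is what keeps the average cost per window low.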
4.2 Speech Emotion Recognition
1. Data Preprocessing
During signal processing, acoustic filters are applied to the original audio signals to separate them into meaningful components. The preprocessing of the signal entails:
a. Signal discretization: a method of converting a continuum into a finite collection of points. It occurs when continuous-time signals, such as audio, are converted into discrete signals.
b. Audio data augmentation: a method frequently used to broaden the dataset’s diversity. It is accomplished by subtly altering the existing data samples, i.e., by adding noise, shifting time, and altering pitch and speed. Pitch and speed manipulation are handled by the librosa audio analysis library, while noise injection and time shifting are handled by the NumPy library of Python.
c. Log-Mel-spectrogram extraction: the Mel-scale is a logarithmic transformation of a signal’s frequency. The Mel-spectrogram is created by using a Fourier transform to examine a signal’s frequency content and mapping it to the Mel-scale. It is a useful tool for identifying hidden elements in audio and visualizing them.
2. Emotion Recognition
CNNs have demonstrated superior performance in image identification and other computer vision tasks, and long short-term memory (LSTM) has been shown to benefit sequential data analysis greatly. Applying both in succession therefore allows the model to learn feature dependencies on both short- and long-term time scales, combining the advantages of both networks. CNN-LSTM, a time distributed convolutional neural network, is used to recognize speech emotions. It is made up of a deep neural network with four convolutional layers, an LSTM layer, and a fully connected layer. Each convolution layer is followed by a batch normalization layer, an activation layer, and a max pooling layer.
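Returning to the audio data augmentation step of the preprocessing list above, noise injection and time shifting can be sketched as follows. The paper attributes these operations to NumPy; this sketch uses only the Python standard library so that it runs without dependencies, and the function names, noise level, and shift size are illustrative assumptions. Pitch and speed changes are not shown.

```python
import random


# Sketch only: function names, noise level, and shift size are illustrative.
def add_noise(signal, noise_level=0.005, seed=None):
    """Return a copy of the signal with Gaussian noise added to each sample."""
    rng = random.Random(seed)
    return [s + rng.gauss(0.0, noise_level) for s in signal]


def time_shift(signal, shift):
    """Circularly shift the signal by `shift` samples (positive = later)."""
    if not signal:
        return []
    shift %= len(signal)
    return signal[-shift:] + signal[:-shift] if shift else list(signal)


clean = [0.0, 0.1, 0.2, 0.3, 0.4]
noisy = add_noise(clean, seed=0)
shifted = time_shift(clean, 2)
print(shifted)
```

A circular shift keeps the clip length unchanged, which is convenient when every sample must yield a spectrogram of the same size; padding with silence would be an alternative choice.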
The time distributed convolutional layers apply a rolling window to the Mel-spectrogram input feature. This produces a sequence of images, which is given as input to the first layer of the neural network. Each Learning Module (LM) consists of a time distributed convolutional layer, batch normalization, an activation function, a max pooling layer, and a dropout layer. The four LMs use the Exponential Linear Unit (ELU) as activation function, followed by a max pooling layer. Pooling makes the feature maps smaller, which further lowers the number of trainable parameters. Dropout regularization randomly removes neurons of a layer during training to help prevent overfitting. Because speech signals are time-varying and spectrograms have a prominent time component, it is worthwhile to examine these temporal aspects of voice sounds. Long short-term memory has proven quite beneficial for sequential data analysis, so it can be used to identify and extract the global temporal components from the Mel-spectrogram. The model is extended with an LSTM layer acting as a learning layer, followed by a dense fully connected layer with SoftMax as its activation function. Figure 6 depicts the complete CNN-LSTM model for Speech Emotion Recognition. Spectrograms are treated in a manner akin to images: the CNN receives the spectrogram, extracts features from it, and passes the results to an RNN composed of LSTMs. The identified emotion is then presented as the output.
Fig. 6 Speech Emotion Recognition using CNN-LSTM model
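The rolling window that the time distributed layers apply to the spectrogram can be sketched as a simple framing step. The window size, step, and toy spectrogram below are illustrative assumptions; the paper does not state the values it uses.

```python
# Sketch only: window size and step are illustrative, and the spectrogram
# is a toy list of time frames rather than real Log-Mel features.
def rolling_windows(frames, window_size, step):
    """Split a sequence of spectrogram frames into overlapping chunks.

    Each chunk would be passed through the same convolutional stack,
    and the resulting sequence of chunk features fed to the LSTM.
    """
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    last_start = len(frames) - window_size
    return [frames[start:start + window_size]
            for start in range(0, last_start + 1, step)]


# Toy spectrogram: 10 time frames, each with 4 Mel bands.
spectrogram = [[float(t)] * 4 for t in range(10)]
chunks = rolling_windows(spectrogram, window_size=4, step=2)
print(len(chunks), [chunk[0][0] for chunk in chunks])
```

With a step smaller than the window, consecutive chunks overlap, so the LSTM sees a smoothly advancing view of the utterance.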
4.3 Music Recommendation
The music recommendation module is implemented using Python, Flask, HTML, CSS, and JavaScript. It communicates with the speech and facial emotion recognition modules and, based on the user’s selection, receives a distribution of emotions from one of them. The dominant emotion is the one with the highest probability in this distribution. Songs are read from the CSV files for the various emotions in the “songs” directory and are returned together with an emotion analysis showing the probability percentages, based on the dominant emotion.
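A lookup of songs for the dominant emotion might look like the sketch below. The paper states only that a “songs” directory holds CSV files for the different emotions; the column names, the example rows, and the in-memory stand-ins for those files are hypothetical.

```python
import csv
import io
from itertools import islice

# Sketch only: the CSV columns (title, artist) and the in-memory "files"
# are hypothetical stand-ins for the per-emotion CSV files on disk.
SONG_FILES = {
    "happy": "title,artist\nSong A,Artist 1\nSong B,Artist 2\n",
    "sad": "title,artist\nSong C,Artist 3\n",
}


def recommend(emotion, limit=5):
    """Return up to `limit` songs listed for the detected emotion."""
    content = SONG_FILES.get(emotion)
    if content is None:
        return []
    rows = csv.DictReader(io.StringIO(content))
    return list(islice(rows, limit))


for song in recommend("happy"):
    print(song["title"], "-", song["artist"])
```

In a Flask application, a function of this kind would typically be called from the route that receives the emotion distribution; that wiring is omitted here.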
Fig. 7 Accuracy on training and validation set with accuracy 63%
5 Results and Discussions
5.1 Datasets and Performance Measures
1. FER-2013 Dataset
The CNN architecture makes interpretation simple (Fig. 6). The accuracy curve in Fig. 7 shows the training and validation accuracy over the epochs. Class activation maps can be plotted to show the pixels activated by the final convolution layer. The pixels respond differently depending on the emotion being labelled: as shown in Fig. 8, happiness appears to depend on the pixels associated with the eyes and lips, while sadness or anger, for instance, appears to be more closely related to the eyebrows.
2. RAVDESS Dataset
For the RAVDESS dataset, the accuracy curve in Fig. 9 shows the training and validation accuracy over the epochs.
Fig. 8 Class activation map
Fig. 9 Accuracy on training and validation set with accuracy 65%
5.2 Web Application Results
The user can select the “Record video” button or the “Record audio” button, as shown in Fig. 10. After selecting the “Record video” option, the user begins recording by clicking the “Start Recording” button. The facial expression and the distribution of emotions are captured on video for about 15 s. Figure 11 displays the distribution of facial expressions together with music suggestions based on the most common expression.
Fig. 10 Home page
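One plausible way to obtain such a distribution from a 15 s recording is to count the emotion predicted for each analysed frame. The sketch below assumes per-frame labels are available, which the paper does not state explicitly; the labels and the function name are made up for illustration.

```python
from collections import Counter


# Sketch only: the per-frame labels are made up for illustration.
def emotion_distribution(frame_labels):
    """Return {emotion: percentage of frames} for a recorded clip."""
    counts = Counter(frame_labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100.0 * n / total for label, n in counts.items()}


frames = ["happy", "happy", "neutral", "happy", "sad",
          "happy", "neutral", "happy", "happy", "happy"]
distribution = emotion_distribution(frames)
print(distribution)
```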
Fig. 11 Results of Facial Emotion Recognition and Music Recommendation
Fig. 12 Results of Speech Emotion Recognition and Music Recommendation
If the user chooses the “Record audio” option, selecting the “Start Recording” button begins an audio recording of about 15 s. After the 15 s, selecting the “get recommended music” button displays the results. Figure 12 displays the distribution of spoken emotions together with music suggestions based on the most common emotion.
6 Conclusion and Future Work
The CNN-LSTM speech emotion recognition module was developed using the RAVDESS dataset. The Facial Emotion Recognition module is implemented with a CNN (the Xception model) on the FER dataset. The face and speech models detect the seven emotions (anger, disgust, fear, happiness, sadness, surprise, and neutral) with accuracies of 63% and 65%, respectively. The emotion analysis recommends songs based on the most prominent emotion detected in the video or audio. Future work will integrate an automatic music player that plays songs from the song dataset based on the most frequently recognized emotion, and will employ emotion fusion to capture both facial and spoken emotions from videos. Although facial and spoken expressions are among the most significant ways of conveying information about emotional state, current systems are constrained to the six basic emotions plus neutral, which falls short of the more nuanced emotions found in daily life. This should encourage researchers to create larger databases and more powerful deep learning architectures in order to distinguish all primary and secondary emotions. Additionally, multimodal analysis is replacing unimodal analysis in today’s emotion recognition.
References
1. Cai L, Dong J, Wei M (2020) Multi-modal emotion recognition from speech and facial expression based on deep learning. In: IEEE conference on 2020 Chinese Automation Congress
2. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. NIPS, vol 1, p 4
3. Yu Z, Zhang C (2015) Image based static facial expression recognition with multiple deep network learning. In: Proceedings of the 2015 ACM on international conference on multimodal interaction, ICMI’15, New York, NY, pp 435–442. ACM
4. Yao A, Shao J, Ma N, Chen Y (2015) Capturing AU-aware facial features and their latent relations for emotion recognition in the wild. In: Proceedings of the 2015 ACM on international conference on multimodal interaction, ICMI’15, New York, NY, pp 451–458. ACM
5. Levi G, Hassner T (2015) Emotion recognition in the wild via convolutional neural networks and mapped binary patterns. In: Proceedings on ACM international conference on multimodal interaction (ICMI)
6. Cai J, Meng Z, Khan AS, Li Z, O’Reilly J, Han S, Liu P, Chen M, Tong Y (2019) Feature-level and model-level audiovisual fusion for emotion recognition in the wild. In: 2019 IEEE conference on multimedia information processing and retrieval (MIPR), San Jose, CA, pp 443–448
7. Zhao J, Mao X, Chen L (2018) Learning deep features to recognize speech emotion using merged deep CNN. IET Signal Process 12(6):713–721
8. Wang Y, Yang X, Zou J (2013) Research of emotion recognition based on speech and facial expression. TELKOMNIKA Ind J Electr Eng 11(1):83–90
9. Poria S, Chaturvedi I, Cambria E, Hussain A (2016) Convolutional MKL based multimodal emotion recognition and sentiment analysis. In: 2016 IEEE 16th international conference on data mining (ICDM), Barcelona, pp 439–448
10. Damodar N, Vani HY, Anusuya MA (2019) Voice emotion recognition using CNN and decision tree. Int J Innov Technol Exp Eng (IJITEE) 8:4245–4249
11. Modi S, Bohara MH (2021) Facial emotion recognition using convolution neural network. In: Proceedings of the fifth international conference on intelligent computing and control systems (ICICCS 2021)
12. Emerich S, Lupu E, Apatean A (2021) Bimodal approach in emotion recognition using speech and facial expressions. Communication Department, Technical University of Cluj-Napoca, Cluj-Napoca, Romania
13. Joy J, Kannan A, Ram S, Rama S (2020) Speech emotion recognition using neural network and MLP classifier. In: IJESC
14. Lim W, Jang D, Lee T (2016) Speech emotion recognition using convolutional and recurrent neural networks. In: Audio and Acoustics Research Section, ETRI, Daejeon, Korea
15. Tzirakis P, Trigeorgis G, Nicolaou MA, Schuller BW, Zafeiriou S (2017) End-to-end multimodal emotion recognition using deep neural networks. J Latex Class Files 14(8):1301–1309
16. Laxmi Pranavi T, Suchita T, Srirama Manikanta B, Dhanaraj M, Sai Siva Srinivas S (2022) Speech emotion recognition using CNN with LSTM
17. Foo SW (2019) Speech emotion recognition using hidden Markov models (HMM)
18. Joshi A et al. (2018) Speech emotion recognition using combined features of HMM & SVM algorithm. Int J Adv Res Comput Sci Softw Eng 3(8):387–393
Smart Chair Posture Detection and Correction Using IOT H. S. Shreyas, G. Satwika, P. Manjunath, M. Shiva, and M. Ananda
Abstract Many people who experience back discomfort discover that sitting incorrectly is the root of their problem. Computer operators can work from a supine or significantly reclined position thanks to certain devices. The goal of this research is to identify a person’s posture and inform them on how to improve it. This can lessen pain in the back, neck, etc. Internet of Things (IOT) and machine learning are used in this project to identify position and posture. Based on a network of interconnected sensors that are physically installed in chairs to gather data, the smart chair system uses the capabilities of the IOT. All of the gathered data is uploaded to the cloud server so that any application can utilize data whenever and wherever it is needed.
1 Introduction The Internet of Things (IOT) is a system that connects a variety of equipment, both digital and analog, enabling automated communication [1, 2]. The IOT links the real and virtual worlds. Its devices are typically embedded systems with CPUs, sensors, and communication modules, and data can be sent to and retrieved from the cloud. By making environments increasingly intelligent, IOT is spreading rapidly and contributes to more comfortable lifestyles. IOT is used in various sectors, including healthcare, education, industrial applications, and home automation. This project primarily addresses back problems brought on by prolonged sitting at work or in school [3]. The primary goal is to determine a person's posture using a few force sensors [4, 5] mounted on the back of the chair.
H. S. Shreyas (B) · G. Satwika · P. Manjunath · M. Shiva · M. Ananda PES University, Bengaluru, India e-mail: [email protected] M. Ananda e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 J. Choudrie et al. (eds.), ICT for Intelligent Systems, Smart Innovation, Systems and Technologies 361, https://doi.org/10.1007/978-981-99-3982-4_4
2 Prototype Requirements 2.1 Hardware Requirements Raspberry Pi, Arduino UNO, force-sensitive resistors (FSR), MPU6050, 5 V power supply, 10 kΩ resistor.
2.2 Software Requirements ThingSpeak, MIT App Inventor, Raspberry Pi Imager, Arduino IDE, Python 3, Raspbian OS.
3 Design 3.1 Flowchart The flow starts with the Arduino Uno microcontroller collecting values from the FSR and gyroscope sensors. The gyroscope value is checked before data is sent to the Raspberry Pi. If the value exceeds a set threshold, 0,0 is sent to the Raspberry Pi, since 0,0 is treated as wrong posture. If the value is below the threshold, the FSR values are sent from the Arduino to the Raspberry Pi, which then classifies them using an ML algorithm. Wrong posture is encoded as 0 and right posture as 1. The result is sent to ThingSpeak and then to the MIT App Inventor application, which displays the posture as right or wrong on the smartphone (Fig. 1).
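The decision flow described above can be sketched in Python as follows. This is only an illustration, not the authors' firmware: the threshold value `TILT_LIMIT`, the function names, and the comma-separated message format are assumptions, since the text only says that 0,0 is sent when the gyroscope value exceeds "a certain point".

```python
# Hypothetical tilt threshold; the paper does not state the actual value.
TILT_LIMIT = 20.0


def arduino_payload(gyro_value, fsr_values, tilt_limit=TILT_LIMIT):
    """Return the string the Arduino would send to the Raspberry Pi."""
    if gyro_value > tilt_limit:
        # Gyroscope value above the threshold: send the fixed "wrong posture" marker.
        return "0,0"
    # Otherwise forward the raw FSR readings so the Pi can classify them.
    return ",".join(str(v) for v in fsr_values)


def posture_label(prediction):
    """Map the classifier output (0 or 1) to the text shown in the app."""
    return "right" if prediction == 1 else "wrong"


print(arduino_payload(35.0, [512, 300, 410]))  # threshold exceeded
print(arduino_payload(5.0, [512, 300, 410]))   # FSR values forwarded
```

Keeping the threshold check on the Arduino side means the Raspberry Pi only runs the classifier when the FSR readings are worth classifying.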
3.2 Pictorial Block Diagram See Fig. 2.
Fig. 1 Flow chart
Fig. 2 Pictorial block diagram
Fig. 3 Interfacing FSR with Arduino UNO
4 Implementation 4.1 Interfacing FSR with Arduino UNO The first step in interfacing the FSR to the Arduino [6] is to connect the FSR in series with a fixed resistor, here 10 kΩ, which creates a voltage divider. One end of the FSR is connected to 5 V and the other end to the resistor, which is connected to ground. The junction between the resistor and the FSR [7, 8] is connected to an analog pin of the Arduino. The output is the analog reading of the voltage drop across the resistor. Three sensors are connected in this way to A1, A2, and A3 (Fig. 3).
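The voltage divider described above can be worked through numerically. The sketch below assumes the 5 V supply and 10 kΩ fixed resistor from the text and the 10-bit analog-to-digital converter of the Arduino UNO (readings from 0 to 1023); the function names are illustrative.

```python
V_CC = 5.0          # supply voltage in volts (from the text)
R_FIXED = 10_000.0  # fixed resistor in ohms (10 kΩ, from the text)
ADC_MAX = 1023      # full-scale reading of the Arduino UNO 10-bit ADC


def divider_voltage(r_fsr):
    """Voltage across the fixed resistor for a given FSR resistance."""
    return V_CC * R_FIXED / (r_fsr + R_FIXED)


def adc_to_fsr_resistance(adc_reading):
    """Invert the divider: estimate the FSR resistance from an ADC reading."""
    if adc_reading <= 0:
        return float("inf")  # no pressure: the FSR behaves like an open circuit
    v_out = V_CC * adc_reading / ADC_MAX
    return R_FIXED * (V_CC - v_out) / v_out


print(divider_voltage(10_000.0))
print(adc_to_fsr_resistance(511.5))
```

As pressure increases, the FSR resistance falls, so the voltage across the fixed resistor and the ADC reading rise.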
4.2 Interfacing MPU6050 with Arduino UNO The gyroscope is interfaced using four pins. The ground pin of the gyroscope is connected to GND of the Arduino and VCC of the gyroscope to 5 V of the Arduino; two analog pins of the Arduino are connected to SCL and SDA. SCL (serial clock line) and SDA (serial data line) are the bidirectional I2C bus wires that carry communication between the devices [9] (Fig. 4).
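The text does not say how the MPU6050 readings are turned into the value that is later compared with a threshold. One common option, shown here purely as an assumption, is to estimate tilt angles from the raw accelerometer counts. The scale factor of 16384 counts per g applies to the sensor's default ±2 g range; the function name and sign conventions are illustrative.

```python
import math

ACCEL_LSB_PER_G = 16384.0  # MPU6050 accelerometer sensitivity at the ±2 g range


def tilt_angles(raw_ax, raw_ay, raw_az):
    """Return (pitch, roll) in degrees from raw accelerometer counts."""
    ax = raw_ax / ACCEL_LSB_PER_G
    ay = raw_ay / ACCEL_LSB_PER_G
    az = raw_az / ACCEL_LSB_PER_G
    # Angle of each horizontal axis relative to the plane of the other two.
    pitch = math.degrees(math.atan2(ax, math.sqrt(ay * ay + az * az)))
    roll = math.degrees(math.atan2(ay, math.sqrt(ax * ax + az * az)))
    return pitch, roll


print(tilt_angles(0, 0, 16384))   # sensor lying flat
print(tilt_angles(16384, 0, 0))   # sensor tipped fully onto its x axis
```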
4.3 Connecting Arduino and Raspberry Pi A USB connection is the easiest way to transfer data from the Arduino to the Raspberry Pi. Data from the Arduino, consisting of force values from the FSRs, is sent serially over USB.
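On the Raspberry Pi side, each line received over the serial link has to be turned back into numbers. The sketch below assumes a comma-separated text format such as "512,300,410", which the paper does not specify; `parse_reading` is a hypothetical helper. In practice the line would typically be read with a serial library such as pySerial, which is left out here so that the example has no dependencies.

```python
def parse_reading(line):
    """Parse one comma-separated line received from the Arduino.

    Accepts bytes or str and returns a list of integers,
    or None if the line is empty or malformed.
    """
    if isinstance(line, bytes):
        line = line.decode("ascii", errors="ignore")
    parts = line.strip().split(",")
    try:
        return [int(p) for p in parts]
    except ValueError:
        return None


print(parse_reading(b"512,300,410\r\n"))
print(parse_reading("0,0\n"))
print(parse_reading("noise"))
```

Returning None for malformed lines lets the caller skip partial lines that can occur when the serial port is first opened.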
Fig. 4 Interfacing MPU6050 with Arduino UNO
4.4 Machine Learning on Raspberry Pi FSR values vary from posture to posture and from person to person, which makes it difficult for the system to recognize a posture with fixed rules. Machine learning is used to make the system adaptable. Several algorithms, such as SVM and CNN [9, 10], were tried to check the accuracy of the system. KNN was found to be the best choice for the collected dataset: its score came out to be 1, the highest possible value. The dataset was created manually by sitting in various positions. Postures are classified into five main types:
• Left oriented.
• Right oriented.
• Straight posture.
• Leaning forward or sitting without any support.
• Leaning back.
KNN
• K-nearest neighbor (KNN) [11, 12] is a supervised machine learning algorithm.
• It is a classification algorithm that classifies data based on similarity.
• It can also be used for regression, but it is mainly used for classification.
• It is non-parametric: it makes no assumptions about the underlying data.
• It is also called a lazy learning algorithm.
• It classifies a data point based on the classes of its nearest neighbors.
• The number of neighbors can be specified by the user.
KNN is imported from the sklearn library. The Matplotlib library is used to visualize the data. The Pandas and NumPy libraries are imported for reading the data and performing calculations on it. Around 300 data samples were collected by sitting in different postures. The result is 0 or 1: 0 means the sitting posture is wrong, and 1 means the sitting posture is right. In the picture, green indicates correct posture and red indicates wrong posture. The data sets were used for training and testing (Fig. 5).
Fig. 5 Data sets: 121 × 5 (right posture) and 207 × 5 (wrong posture)
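The KNN idea summarized above can be illustrated with a small dependency-free sketch. The paper uses the sklearn implementation; the version below is a plain-Python stand-in written only to show the voting step. The sample readings, the value k = 3, and the function name are all made up for illustration.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Made-up readings from three FSRs; 1 = right posture, 0 = wrong posture.
train_x = [(520, 510, 530), (500, 525, 515), (515, 505, 520),
           (900, 120, 300), (880, 150, 310), (910, 100, 280)]
train_y = [1, 1, 1, 0, 0, 0]

print(knn_predict(train_x, train_y, (510, 515, 525)))  # evenly loaded sensors
print(knn_predict(train_x, train_y, (870, 150, 280)))  # load shifted to one side
```

Because KNN stores the training samples and defers all work to prediction time, adding newly recorded postures only requires appending them to the training lists.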
4.5 Raspbian OS The first step is to write the Raspbian OS onto an SD card using the Raspberry Pi Imager software, which is available on the official website. If the OS image is downloaded as a compressed file, it must be unzipped before continuing. This requires WinRAR or 7-Zip on Windows, The Unarchiver on Mac, and Unzip on Linux.
4.6 Python Python comes pre-installed on many Linux-based systems. If it is not installed, open a terminal window and type: sudo apt install python3. This command installs Python 3 so that it can be used on the Raspberry Pi.
Fig. 6 ThingSpeak
4.7 Thonny Python IDE To write Python code we need an Integrated Development Environment (IDE), which helps us write code faster and more efficiently and helps identify bugs and warnings that may arise.
4.8 ThingSpeak Cloud Integration with Raspberry Pi ThingSpeak is an open-source cloud application used to connect microcontrollers to cloud services for visualizing or analyzing data. The sensor data can be monitored from anywhere over the internet. In our project, it is used to monitor the FSR sensor data and the posture result. The Raspberry Pi receives the data from the Arduino and applies the KNN algorithm to it, producing either a 1 (correct) or a 0 (not correct). This binary value is sent to ThingSpeak along with the three sensor readings, which are then displayed on the website. The Raspberry Pi connects to ThingSpeak using API (Application Programming Interface) keys (How to send data to ThingSpeak Cloud using Raspberry Pi, n.d.) (Fig. 6).
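As a sketch of how the upload might look, the snippet below builds a request URL for the ThingSpeak update endpoint, which accepts a write API key and numbered fields as query parameters. The assignment of the posture result to field 1 and the three FSR readings to fields 2 to 4 is an assumption, as is the placeholder key; the paper does not give its channel layout.

```python
from urllib.parse import urlencode

THINGSPEAK_UPDATE_URL = "https://api.thingspeak.com/update"


def build_update_url(write_api_key, posture, fsr_values):
    """Build a ThingSpeak update URL with the posture and the FSR readings."""
    params = {"api_key": write_api_key, "field1": posture}
    # Assumed layout: FSR readings go into field2, field3, field4.
    for i, value in enumerate(fsr_values, start=2):
        params[f"field{i}"] = value
    return THINGSPEAK_UPDATE_URL + "?" + urlencode(params)


url = build_update_url("YOUR_WRITE_KEY", 1, [512, 300, 410])
print(url)
```

The URL can then be requested with any HTTP client, for example `urllib.request.urlopen(url)` from the standard library; the request itself is omitted here so that the example runs without network access.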
4.9 MIT App Inventor MIT App Inventor (IoT Made Easy With UNO, ESP-01, ThingSpeak and MIT App Inventor, n.d.) [13] is a web application with which we designed and built an Android application for our posture detection chair. MIT App Inventor also has a built-in Firebase DB option as well as an internet connectivity option. We used the internet connection to link ThingSpeak to the mobile app. ThingSpeak supplies a 1 (correct) or a 0 (not correct) to the app, and the app then displays a screen showing whether the current posture is good or bad.
Fig. 7 Front side of the chair
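The app itself is built with App Inventor blocks rather than Python, but the logic of reading the latest posture value from ThingSpeak can be sketched as follows. The snippet assumes the posture result is stored in field 1 and that the last entry is fetched as JSON from the channel's read endpoint; the channel ID, the sample payload, and the function names are illustrative.

```python
import json


def last_entry_url(channel_id, field=1, read_api_key=None):
    """URL of the most recent entry of one field of a ThingSpeak channel."""
    url = f"https://api.thingspeak.com/channels/{channel_id}/fields/{field}/last.json"
    if read_api_key:
        url += f"?api_key={read_api_key}"
    return url


def posture_message(payload, field=1):
    """Turn a JSON payload into the message shown to the user."""
    entry = json.loads(payload)
    value = entry.get(f"field{field}")
    if value is None:
        return "No data"
    return "Good posture" if int(float(value)) == 1 else "Bad posture"


# Illustrative payload in the shape of a single channel entry.
sample = '{"created_at": "2023-01-01T00:00:00Z", "entry_id": 7, "field1": "1"}'
print(last_entry_url(12345))
print(posture_message(sample))
```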
4.10 Firebase Realtime Database Firebase Realtime Database [14, 15] helps us build rich, collaborative applications by allowing secure access to the database directly from client-side code. We used Firebase to create a personalized experience for each user by storing their login username and password in the Firebase Realtime Database.
5 Results 5.1 Final Prototype See Fig. 7.
5.2 Results Case 1: Right posture (Fig. 8).
Fig. 8 Output of right and wrong posture
6 Conclusion and Future Scope 6.1 Conclusion
• Prevention is better than cure. Our prototype aims to help prevent spinal problems and other posture-related disorders.
• A proper chair supports health and improves performance in work, education, and sports.
• Our prototype has the following features:
Compatibility: any user can use the chair.
Efficiency: the prototype is accurate in predicting the posture.
Easy to use.
Easy to carry: the sensors can be carried and replaced.
6.2 Future Scope This prototype can be extended as follows:
• A timer can be added to discourage prolonged sitting on the chair.
• The application, along with the prototype, can be made available to every individual.
• Camera or other digital inputs can be given to the system to make it even more efficient.
Acknowledgements We would like to express our gratitude to Prof. Ananda M, Associate Professor, Department of Electronics and Communication Engineering, PES University, for his continuous guidance, assistance, and encouragement throughout the development of this project. We thank PES University for providing the required hardware support at the right time. Finally, this project could not have been completed without the continual support and encouragement of our families and friends.
References
1. Tlili F, Haddad R, Ouakrim Y, Bouallegue R, Mezghani N (2018) A survey on sitting posture monitoring systems. In: Proceedings of the 2018 9th international symposium on signal, image, video and communications (ISIVC), Rabat, Morocco, pp 185–190
2. Sathyanarayana S, Satzoda RK et al (2018) Vision-based patient monitoring: a comprehensive review of algorithms and technologies. J Ambient Intell Humaniz Comput 9(2):225–251
3. Sathyanarayana S et al (2012) Accuracy and robustness of kinect pose estimation in the context of coaching of elderly population. In: Proceedings of the 2012 annual international conference of the IEEE engineering in medicine and biology society, San Diego, CA, pp 1188–1193
4. Kuo Y-L, Tully EA, Galea MP (2009) Video analysis of sagittal spinal posture in healthy young and older adults. J Manipulative Physiol Ther 32(3):210–215
5. Ailneni RC, Syamala KR, Kartheek I, Kim S, Hwang J (2019) Influence of the wearable posture correction sensor on head and neck posture: sitting and standing workstations. Work 62:27–35
6. Ishaku AA (2019) Flexible force sensors embedded in office chair for monitoring of sitting postures. In: Proceedings of the IEEE international conference on flexible and printable sensors and systems (FLEPS), Glasgow, pp 1–3
7. Biswas J, Tolstikov A, Jayachandran M et al (2010) Health and wellness monitoring through wearable and ambient sensors: exemplars from home-based care of elderly with mild dementia. Ann Telecommun 65(9–10):505–521
8. Otoda Y (2018) Census: continuous posture sensing chair for office workers. In: Proceedings of the 2018 IEEE international conference on consumer electronics (ICCE). Hindawi Publishing Corporation, Las Vegas, NV, pp 1–2
9. Ma S, Cho W, Quan C, Lee S (2016) A sitting posture recognition system based on 3 axis accelerometer. In: Proceedings of the 2016 IEEE conference on computational intelligence in bioinformatics and computational biology (CIBCB), Chiang Mai, pp 1–3
10. Cho H, Choi H-J, Lee Ch-E, Sir Ch-W (2019) Sitting posture prediction and correction system using Arduino-based chair and deep learning model. In: Proceedings of the IEEE 12th conference on service-oriented computing and applications (SOCA), Kaohsiung, pp 98–102
11. Zemp R, Taylor WR, Lorenzetti S et al (2016) Seat pan and backrest pressure distribution while sitting in office chairs. Appl Ergon 53(5978489):1–9
12. Huang M, Gibson I, Yang R (2017) Smart chair for monitoring of sitting behavior. In: DesTech conference proceedings, the international conference on design and technology, vol 2017(1), pp 274–280
13. Hillar GC (2017) MQTT essentials: a lightweight IoT protocol. Packt Publishing, Birmingham
14. Rushton N (2020) QNAP NAS setup guide. Kindle Edition, Portland, OR
15. Bradshaw S, Brazil E, Chodorow K (2019) MongoDB: the definitive guide 3e: powerful and scalable data storage. O'Reilly, CA
The Opinions Imparted on Singular’s Face K. Pramilarani, K. Ashok, Srinivas Pujala, Hemanth Karnakanti, and M. K. Vinaya
Abstract The Opinions Imparted on a Singular's Face, that is, facial emotion recognition (FER), is an important topic in computer vision and artificial intelligence because of its significant academic and commercial potential. Although FER can be carried out using various sensors, this review focuses on studies that use only facial images, because facial expressions are one of the primary information channels in interpersonal communication. The paper summarizes recent FER research in a clear and concise way. Conventional FER methods are described first, together with a summary of the main algorithms for the representative categories of FER systems. Next, deep learning approaches that use deep networks for "end-to-end" learning are presented. This survey also covers a hybrid deep learning approach that uses a convolutional neural network (CNN) for the spatial features of individual frames and long short-term memory (LSTM) for the temporal features of consecutive frames. The final part of the paper gives a brief description of publicly available evaluation metrics and benchmark results for the quantitative comparison of FER research. This brief overview can give newcomers to FER a general idea of the most recent state-of-the-art research, and it can also serve experienced researchers looking for productive directions for future work.
1 Introduction According to a study of human communication by the psychologist Mehrabian, only 7% of information is conveyed through language, 38% through paralanguage such as tone and voice, and 55% through facial expressions. Emotion analysis only began receiving attention in the 1970s, although numerous studies have since demonstrated the feasibility of automated facial expression recognition. Ekman and Friesen described six basic emotions: anger, disgust, fear, happiness, sadness, and surprise. Computer scientists have been actively involved in emotion research since the late 1980s, and this research led to published studies on how best to build automated emotion recognition systems. Conventional procedures typically consist of three steps: first, the face is located; second, geometric facial features are extracted to form feature vectors; third, the emotion with the highest score is selected. These methods usually require substantial manual effort, and the sheer volume of data makes annotation very difficult. CNNs began to show remarkable potential when they were introduced in the late 1990s. In addition to proving their capabilities in many image classification tasks, they have been applied to emotion recognition. One of the main advantages of deep learning CNNs is their ability to extract features directly from data, in contrast to the tedious hand-crafted feature generation used in other learning methods. However, the lack of computing power and training data limited the use of CNNs at the time. After the 2010s, as computing hardware improved and larger datasets became available, CNNs became a far more practical tool for image processing, pattern recognition, and feature extraction. In this study, the face is detected using Haar cascade feature extraction, and the layers of the CNN model are used to classify the emotion. A facial emotion recognition system has the potential to significantly influence artificial intelligence. It has potential real-world applications in a variety of fields, such as psychological assessment, driver monitoring, smart design, mobile applications that automatically insert emotions into conversations, assistance systems, and socially intelligent robots with emotional intelligence. The performance of the final model depends on the overall quality and size of the training dataset. According to the confusion matrices of the models developed at the beginning of this work, disgust and fear have a prediction accuracy of less than 60%. In light of real-world conditions and the importance of particular emotions in such systems, we reduced the classification to five classes. The final proposed model handles happiness, sadness, anger, surprise, and neutral. It was found that many of the samples labelled as fear and disgust are mislabelled, which indicates that these classes are noisy and reduces the model's accuracy to 71.02%. Further evaluation showed that people sometimes display mixed emotions. In such cases, the output has to be predicted from the dominant facial expression, because only single emotions are clearly visible. Trohidis et al., for example, examined four multi-label classification methods to address the problem that music can evoke several emotions at once.
K. Pramilarani · K. Ashok · S. Pujala (B) · H. Karnakanti · M. K. Vinaya Computer Science and Engineering, New Horizon College of Engineering, Bengaluru, Karnataka, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 J. Choudrie et al. (eds.), ICT for Intelligent Systems, Smart Innovation, Systems and Technologies 361, https://doi.org/10.1007/978-981-99-3982-4_5
2 Methodology The convolutional neural network (CNN) is the approach used here to make decisions in image analysis. Convolutional layers, which form the hidden layers, distinguish a CNN from a multilayer perceptron (MLP). A two-level CNN framework is the foundation of the proposed approach. The first level is background removal, which isolates the face so that emotions can be extracted from the image. At this level, a standard CNN module is used to extract the primary expressional vector (EV). The EV is built by locating the relevant facial points, and changes in expression affect the EV directly. The EV is obtained by applying a basic perceptron unit to the face image with the background removed. The final stage of the proposed model is a non-convolutional perceptron layer. The input data (or image) is transformed at each convolutional layer before being passed to the next layer; this transformation is the convolution operation. Each convolutional layer can detect patterns, and each has four filters. Apart from the face, the input image to the first part of the CNN typically contains shapes, edges, textures, and other objects, which are handled by background removal. The first convolutional layer starts with edge, circle, and corner detector filters. After the face has been identified, the second part of the CNN captures facial features such as the eyes, ears, lips, nose, and cheeks, using edge detection filters. The layers of the second part have 3 × 3 kernel matrices, for example [0.25, 0.17, 0.9; 0.89, 0.36, 0.63; 0.7, 0.24, 0.82]. Initially, these values are chosen between 0 and 1. They are then optimized for EV detection using the ground truth of the supervised training dataset.
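To make the convolution operation concrete, the following sketch slides the example 3 × 3 kernel from the text over a small image. It is a plain-Python illustration with no padding and a stride of one; the function name and the test image are made up, and, as in most deep learning frameworks, the kernel is applied without flipping.

```python
# Example kernel values quoted in the text.
KERNEL = [[0.25, 0.17, 0.90],
          [0.89, 0.36, 0.63],
          [0.70, 0.24, 0.82]]


def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        output.append(row)
    return output


# A 4 x 4 image of ones gives a 2 x 2 feature map.
image = [[1.0] * 4 for _ in range(4)]
feature_map = convolve2d_valid(image, KERNEL)
print(feature_map)
```

During training, the nine kernel values are the parameters that get adjusted; the sliding-window computation itself stays the same.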
In this case, the filter values were optimized using minimum error decoding. Supervised learning is used to fine-tune the filters before they are applied to the background-removed face output by the first part of the CNN to identify facial features such as the eyes, lips, ears, and nose. In total, 24 distinct facial features are extracted to create the EV matrix. The EV feature vector consists of the normalized Euclidean distances between the facial components.
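A feature vector of normalized Euclidean distances can be sketched as below. The text does not say how the distances are normalized, so dividing by the largest pairwise distance is an assumption, as are the function name and the example landmark coordinates.

```python
import math
from itertools import combinations


def expression_vector(landmarks):
    """Pairwise Euclidean distances between landmarks, scaled to [0, 1]."""
    distances = [math.dist(p, q) for p, q in combinations(landmarks, 2)]
    if not distances:
        return []  # fewer than two landmarks: nothing to measure
    largest = max(distances)
    if largest == 0:
        return [0.0 for _ in distances]
    # Assumed normalization: divide by the largest distance in the face.
    return [d / largest for d in distances]


# Three made-up landmark positions (x, y) in pixels.
print(expression_vector([(0, 0), (3, 4), (6, 8)]))
```

With 24 facial features, this construction would produce 276 pairwise distances, and scaling by a distance measured on the same face makes the vector independent of image size.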
3 Data Flow Emotion analysis based on facial expressions may not be accurate, because facial expressions vary slightly from person to person, may mix emotional states that are experienced at the same time (for example, fear and anger, or happiness and sadness), or may not express any emotion at all. Because a person's face may not convey all of their emotions, relying solely on appearance can lead to incorrect conclusions. Social and cultural context, as well as contextual cues such as sarcasm, can make facial expressions more ambiguous. The quality of a captured facial expression can also be affected by technical factors such as varying camera angles, lighting conditions, and the occlusion of facial features. Facial emotion recognition (FER) does not explain what triggered an emotion, which could be a thought about a recent or past event, so even results that identify emotions accurately may not give a complete picture of the person. Although FER results may not be 100% accurate, they are typically treated as facts and used as input to processes that affect the life of a data subject, rather than triggering an evaluation to learn more about the person's situation in the specific context.
3.1 DFD Levels and Layers Using levels and layers, a data flow diagram can hone in on a single component in progressively greater depth. Levels in the DFD can range from 0 to 3 or even higher in some cases. The amount of detail required is determined by the scope of your objectives. DFD Level 0 is also called a Context Diagram. It is a fundamental overview of the entire modelled or analyzed system or procedure. It is intended to provide a brief overview of the system, presenting it as a single, high-level process that is interconnected to other things. It ought to be easy to understand for many people, including stakeholders, developers, business analysts, and data analysts (Fig. 1). DFD Level 1 offers a more in-depth breakdown of individual components of the Context Level Diagram. As you break down the high-level process of the Context Diagram into its subprocesses, you will highlight the system’s main functions (Fig. 2).
Fig. 1 Data flow at level 0
Fig. 2 Data flow at level 1
DFD Level 2 then goes one step deeper into the components of Level 1. Additional text may be required to provide the necessary level of detail about the system's operation. Although it is rare, it is possible to advance to Levels 3, 4, and beyond. This can lead to complexity that makes it hard to communicate, compare, and model effectively. With DFD layers, the cascading levels can be nested directly in the diagram, which makes the deeper levels easier to reach and keeps the diagram cleaner (Figs. 3 and 4).
4 Problem Statement Human facial expressions can be readily grouped into seven basic emotions: happy, sad, surprised, afraid, angry, disgusted, and neutral. We express our emotions through specific arrangements of facial muscles. These subtle and complex expressions often carry a wealth of information about our mental state. Using facial emotion recognition, we can easily and inexpensively measure the effects that content and services have on users or audiences. Retailers could, for instance, use these metrics to gauge customer interest.
Fig. 3 Data flow at level 2
Fig. 4 Emotion classification level
Healthcare providers may be able to provide better service by using additional information about the emotional state of their patients during treatment. By monitoring audience engagement at events, entertainment producers can consistently create the content audiences want. Humans learn to perceive emotions in other people early: even babies can tell the difference between happy and sad by 14 months. But can computers understand emotions better than we do? To answer this question and to enable machines to make decisions about our emotional states, we developed a deep learning neural network. In other words, we give them eyes that can see what we cannot.
5 Approach The proposed work comprises three stages: face detection, recognition, and classification. The first step is to capture the person's face with a video camera and locate it using real-time bounding box coordinates. In this step, faces are found with Haar cascades from the OpenCV library; Haar cascade features are used together with the Viola-Jones algorithm to detect a person's face. The detected images contain many features, such as landscapes, objects, and shapes. In the recognition phase, face features are extracted and stored in a database. A CNN model matches the face against the database and identifies it by name. Faces from the dataset are recognized and compared using embedding vectors. Face detection, recognition, and classification are implemented with Python 3.5 and the Anaconda distribution. Before a face is recognized, the CNN model is trained and tested on the database features and matches against them, using the image features provided by dlib and other libraries. Lastly, the recognized face is classified in real time as angry, afraid, disgusted, happy, neutral, or surprised. The CNN model uses the VGG 16 network architecture, which can recognize and classify large databases. The network is built from 3 × 3 convolutional layers, followed by two fully connected layers of 4,096 nodes and a Softmax output. The local binary patterns histogram (LBPH) from the OpenCV library is used to recognize human faces. Each pixel's neighbors are compared against a threshold, and the result is expressed as a binary number. For this purpose, LBPH uses the parameters radius, neighbors, grid X, and grid Y. As shown in the preceding section, the architectures of the hybrid CNN-RNN and CNN-LSTM methods are comparable. The basic CNN-RNN architecture combines an LSTM with a deep learning visual feature extractor, such as a CNN model. As a result, the hybrid methods can distinguish emotions from image sequences. Figure 5 shows that each visual feature is passed to the LSTM blocks as a variable-length or fixed-length vector. A Softmax classifier is used in the recurrent sequence learning module, and the prediction performance is then presented (Fig. 6).
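The thresholding step of the local binary pattern operator mentioned above can be illustrated for a single 3 × 3 patch. This is a simplified sketch with a radius of one and eight neighbours; the clockwise bit order starting at the top-left pixel is an assumption, and library implementations may order the bits differently.

```python
def lbp_code(patch):
    """Local binary pattern code of the centre pixel of a 3 x 3 patch."""
    center = patch[1][1]
    # Neighbours visited clockwise, starting at the top-left pixel.
    neighbours = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
                  patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    code = 0
    for value in neighbours:
        # Each neighbour contributes one bit: 1 if it is at least as bright
        # as the centre pixel, otherwise 0.
        code = (code << 1) | (1 if value >= center else 0)
    return code


patch = [[1, 2, 3],
         [8, 5, 4],
         [7, 6, 9]]
print(lbp_code(patch))         # decimal value of the 8-bit pattern
print(bin(lbp_code(patch)))    # the same pattern written in binary
```

A histogram of these codes over each cell of a grid laid across the face gives the descriptor that is compared between faces.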
Fig. 5 Demonstrated CNN features used by LSTM for sequence learning
Fig. 6 Training process of CNN model for facial emotion recognition
6 Experimental Studies This section evaluates the proposed FER system, which applies transfer learning (TL) to a deep CNN (DCNN), on two benchmark datasets. The benchmark datasets and the experimental setup are described first. Finally, the proposed model's results on the benchmark datasets are compared with those of other well-known methods to demonstrate its effectiveness.
6.1 Benchmark Dataset Few datasets are available for the emotion recognition problem; this study focuses on two well-known ones, Japanese Female Facial Expression (JAFFE) and Karolinska Directed Emotional Faces (KDEF). The images in the datasets represent seven emotion categories: Afraid (AF), Angry (AN), Disgusted (DI), Happy (HA), Neutral (NE), Sad (SA), and Surprised (SU). The datasets and the reasons for choosing them are briefly described below. The KDEF dataset was developed by the Department of Clinical Neuroscience, Section of Psychology, at the Karolinska Institute in Stockholm, Sweden. For simplicity, the dataset is referred to as KDEF. Because the photos were taken in a laboratory, the expressions are posed and may not reflect the participants' genuine emotions. The dataset was intended for experiments on perception, emotion, memory, and backward masking. Although the material was not primarily created for emotion classification, it is popular for this task because psychological and medical problems frequently involve emotions. The dataset contains 4,900 images of 70 people showing seven different emotions. Each subject was photographed from five angles: the frontal (straight) view and four profile views (full left, half left, full right, and half right). The viewing angles range from +90° (full right) to −90° (full left). In a full left or full right profile view, FER is more difficult because only one eye and one ear are visible on one side of the face. The mix of profile and frontal images makes FER on this dataset challenging. This study uses the entire dataset to evaluate the proposed method in such situations, because recognizing emotions from profile views at various angles is important for real-world applications. A few existing studies use this dataset, most of them based only on the 980 frontal images.
6.2 Experimental Setup
In this study, OpenCV is used to crop the face. The images were resized to 224 × 224 pixels, the default input size of the pre-trained DCNN models. The Adam optimizer was used with the following parameters: learning rate 0.0005, beta1 0.9, and beta2 0.009. The data were augmented only minimally, with the following settings: a rotation (shift) of −10° to 10°, a scaling factor of 1.1, and a vertical flip. Making such minor changes to the original images improves accuracy. The test and training sets were separated in two distinct ways: (i) a random split in which 90% of the images in a benchmark dataset (KDEF or JAFFE) form the training set and the remaining 10% form the test set, and (ii) 10-fold cross-validation (CV). In 10-fold CV, the available images are divided into ten sets of equal or nearly equal size. Each set is used in turn as the test set while the remaining nine sets are used for training, so the reported result is the average of ten runs. Test set accuracy is the performance metric, because a recognition system must respond correctly to unseen data. The model was trained with Python and Keras, with TensorFlow as the backend. The experiments were run on a Windows PC with 16 GB of RAM and a 3.5 GHz processor.
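The 10-fold CV protocol described in the experimental setup can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' code: the function names and the `evaluate` callback (which stands in for training and testing a model on the given index lists) are assumptions made for the example.

```python
import random

def k_fold_indices(n_items, k=10, seed=0):
    """Shuffle range(n_items) and split it into k nearly equal folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th index starting at i, so fold sizes differ by at most 1.
    return [indices[i::k] for i in range(k)]

def cross_validate(n_items, evaluate, k=10, seed=0):
    """Use each fold once as the test set and average the k scores.

    `evaluate(train_idx, test_idx)` is a hypothetical callback that trains a
    model on train_idx and returns its accuracy on test_idx.
    """
    folds = k_fold_indices(n_items, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(evaluate(train_idx, test_idx))
    return sum(scores) / k
```

With the 4,900 KDEF images, each fold holds 490 images, so every run trains on 4,410 images and tests on the remaining 490.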
7 Related Work
Numerous conventional approaches have been developed for automatic FER systems. In one of them, a feature vector for training is created from geometric features based on the position and angle of 52 facial landmark points. The angle and the Euclidean distance between each pair of landmarks within a frame are calculated first, and the distance and angle values are then subtracted from the corresponding values in the first frame of the video sequence. Two kinds of classifiers are used in this case: multi-class AdaBoost with dynamic time warping, and an SVM on the boosted feature vectors. Appearance features are usually extracted from the face region as a whole or from specific regions of it, because different areas of the face carry different kinds of detail. Happy et al. classified various facial expressions using a Local Binary Pattern (LBP) histogram with different block sizes as a feature vector, together with Principal Component Analysis (PCA). Although this method can be used in a real-time setting, its accuracy is limited because the feature vector cannot reflect local variations in the facial components. Different face regions carry different levels of importance. For instance, the eyes and mouth provide more information than the cheek and forehead. To extract appearance features, Ghimire et al. divided the entire face region into domain-specific local regions. Important local regions were found using an incremental search strategy, which reduced feature dimensions and increased recognition accuracy. Numerous researchers have identified various feature extraction methods and classifiers for conventional approaches.
Facial expression recognition in these approaches is based on well-known feature extraction methods, such as Histogram of Oriented Gradients (HOG), Local Binary Pattern (LBP), and distance and angle relations between facial landmarks, and on classifiers such as Support Vector Machine (SVM), AdaBoost, and Random Forest. An advantage of conventional approaches is that they require less computing power than methods based on deep learning. Because of their high accuracy and low computational burden, these methods are still widely used in practice. Deep learning has recently shown tremendous promise in computer vision, although comparatively little of that work has addressed facial expression recognition. The Deep Belief Network (DBN) has been used extensively for facial expression detection tasks. A DBN can operate on a face image directly, with no need for image pre-processing. For example, Ranzato et al. analyzed facial expression discrimination using a gated Markov random field method. DBNs have also been used to recognize facial expressions in large collections of facial images. In addition, DBNs have been applied to pretrained training data in combination with a deep structure framework and Gabor filters for feature extraction. The AU-aware deep network is another way to detect facial expressions. In this method, appearance variations are first described by a small set of local facial Action Units (AUs), and a DBN is then used to learn the features needed for the final facial expression recognition. Many researchers have recently found that Convolutional Neural Networks (CNNs) perform well on facial expression recognition for candid, non-posed pictures, for example by employing a multi-scale CNN strategy or by integrating the Facial Action Coding System (FACS) with a CNN. Another approach is a deep network that combines a CNN with SVM classifiers.
8 Result and Discussions
The initial performance evaluation of the algorithm was based on the extended Cohn–Kanade expression dataset. The dataset contained 486 sequences from 97 posers, and the maximum accuracy achieved on it was 45%. To overcome this low accuracy, multiple datasets were downloaded from the Internet, and the author's own pictures with various expressions were added. As more images were added to the dataset, the accuracy increased. Of the resulting 10 K dataset, 70% of the images were used for training and 30% for testing. A different 70% training set was used in each of 25 iterations, and the final error bar was calculated as the standard deviation. The background removal CNN (the first part of the network) and the face feature extraction CNN (the second part) had the same number of layers and filters. This was done for simplicity. In this study, we varied the number of layers from one to eight and found that accuracy peaked at around four layers. This was counterintuitive, because we had assumed that accuracy, like execution time, would keep increasing with the number of layers. Four layers, which gave the highest accuracy, were therefore chosen. The execution time, which increased with the number of layers, is not reported in this manuscript because it did not add significant value to our research. The number of filters was likewise varied from one to eight for the four-layer CNN, and four filters proved most effective. Consequently, the FER network was designed with four layers and four filters. As part of the future scope of this study, researchers can vary the number of layers independently for each CNN. One can also try a distinct number of filters for each layer. This search could be automated with servers.
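The repeated 70/30 evaluation with a standard-deviation error bar can be sketched as follows. This is a hedged illustration only: `repeated_holdout` and the `evaluate` callback (a stand-in for training the network and returning test accuracy) are hypothetical names, not part of the original study.

```python
import random
import statistics

def repeated_holdout(n_items, evaluate, runs=25, train_fraction=0.7, seed=0):
    """Draw a fresh random train/test split in every run.

    Returns the mean score and its sample standard deviation, which can be
    plotted as the error bar.
    """
    rng = random.Random(seed)
    cut = round(n_items * train_fraction)
    scores = []
    for _ in range(runs):
        indices = list(range(n_items))
        rng.shuffle(indices)                      # a different 70% each run
        scores.append(evaluate(indices[:cut], indices[cut:]))
    return statistics.mean(scores), statistics.stdev(scores)
```

For a 10 K dataset, each run trains on 7,000 images and tests on 3,000.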
We were unable to carry out this broader search because the author lacked sufficient computing power. However, it would be greatly appreciated if other researchers could find a better configuration than four layers and four filters, and an accuracy higher than our 96%. The only challenge posed by grayscale images was skin tone detection. Colour images made background removal with skin tone detection simple; in grayscale images, however, we found numerous instances of face detection errors. Thanks to the FER network and an EV feature vector with 24 dimensions, we were able to correctly classify faces with an orientation of 30°. We are aware of the method's drawbacks, such as the effect of facial hair and the significant computing power needed for CNN tuning. Despite these issues, the overall accuracy of our algorithm is very high (96%).
9 Conclusion
Facial expression recognition continues to advance. We showed that FER methods fall into two types. The conventional FER approach has three stages: detection of the face and facial components, feature extraction, and expression classification. Classification algorithms such as SVM, AdaBoost, and random forest are used in conventional FER. Deep-learning-based FER approaches, by contrast, reduce the reliance on face-physics-based models and other pre-processing methods by allowing "end-to-end" learning in the pipeline directly from the input images. As a particular kind of deep learning, a CNN performs well across datasets and other FER-related tasks, which demonstrates the capability of networks trained for emotion recognition. Visualizations help explain the model learned from various FER datasets. However, because CNN-based FER methods cannot account for temporal variations in the facial components, hybrid approaches have been proposed that combine a CNN for the spatial features of individual frames with an LSTM for the temporal features.
Abnormal Human Behavior Detection from a Video Sequence Using Deep Learning
Muskan Sahetai, Bansari Patel, Radhika Patel, Ritika Jani, and Dweepna Garg
Abstract The use of cameras is becoming increasingly accepted in society in a wide range of settings and contexts, including live traffic monitoring, vehicle parking, space surveillance, vehicle interiors, and intelligent spaces. These cameras provide information about everyday activities that must be examined effectively. Regrettably, most visual surveillance still relies on an operator to sort through this footage. In this paper, human behavior is analyzed from a video sequence and classified as dangerous or safe. Recurrent neural networks are used to evaluate the features identified throughout the video sequence and to predict the activity in the video. Identifying a person's behavior helps in taking the necessary action immediately.
M. Sahetai · R. Patel · R. Jani
Department of Information Technology, Devang Patel Institute of Advance Technology and Research (DEPSTAR), Faculty of Technology and Engineering (FTE), Charotar University of Science and Technology (CHARUSAT), Changa, Gujarat, India
B. Patel
Department of Computer Science and Engineering, Devang Patel Institute of Advance Technology and Research (DEPSTAR), Faculty of Technology and Engineering (FTE), Charotar University of Science and Technology (CHARUSAT), Changa, Gujarat, India
D. Garg (B)
Department of Computer Engineering, Devang Patel Institute of Advance Technology and Research (DEPSTAR), Faculty of Technology and Engineering (FTE), Charotar University of Science and Technology (CHARUSAT), Changa, Gujarat, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 J. Choudrie et al. (eds.), ICT for Intelligent Systems, Smart Innovation, Systems and Technologies 361, https://doi.org/10.1007/978-981-99-3982-4_6
1 Introduction
An abnormal pattern that deviates from the norm is called an anomaly or an outlier, and anomaly detection attempts to find these patterns. High dimensionality makes anomaly detection more complex because, as the number of features or attributes increases, so does the volume of data required to classify successfully, resulting in data sparsity, where data points become increasingly scattered and isolated. Irrelevant attributes, a high noise level from many minor features, or both can mask the underlying anomalies. This problem is known as the "curse of dimensionality". It presents a challenge for many anomaly detection methods, since traditional approaches, including distance-based, density-based, and clustering-based methods, lose their effectiveness in high dimensions. Human Behaviour Analysis (HBA) is a substantial area of focus within artificial intelligence (AI). It can be used in a wide range of applications, such as intelligent retail environments, ambient assisted living, and video surveillance. Leading businesses in this sector are contributing to the rapid growth in the availability of human video data. From a deep learning (DL) viewpoint, the method in this paper is a form of HBA. Deep learning approaches have made significant advances in classification during the past few years as a result of increased processing power. Convolutional neural networks (CNNs) are used for image understanding, whereas recurrent neural networks (RNNs) are used for temporal understanding, such as with video or text. They are discussed in detail in the following sections. The paper is organized as follows: the background study is presented in Sect. 2, and Sect. 3 explains the methodology and architecture in use. Section 4 presents the experiment, evaluation, and results, followed by Sect. 5, which concludes the paper.
2 Background Study
2.1 Deep Learning
Deep learning is an artificial intelligence (AI) and machine learning (ML) technique that imitates how people learn specific types of information. Data science, which also includes statistics and predictive modeling, relies heavily on deep learning. Deep learning greatly speeds up and simplifies the process of collecting, analyzing, and interpreting large amounts of data, which is of great use to data scientists. At its most fundamental level, deep learning can be seen as a way to automate predictive analytics. In contrast to traditional machine learning algorithms, which are linear, deep learning algorithms are stacked in a hierarchy of increasing complexity and abstraction. As a subset of machine learning, deep learning builds systems that can learn features from massive amounts of unlabeled training data by using sophisticated, multi-level "deep" neural networks. Because deep learning has outperformed more conventional methods such as decision trees, support vector machines, and Bayesian networks over the past 10 years, there has been a surge of interest in it. The rise in computing power over the last few years has made it practical to revisit biologically inspired algorithms created decades ago.
2.2 Convolutional Neural Networks (CNN)
Convolutional neural networks are a specific kind of artificial neural network designed to process large amounts of raw data (such as images, audio, or video). Because the input is so large, a typical Fully Connected (FC) network would be inherently wasteful for feature extraction. In general terms, a CNN reduces this cost by examining local regions of the data to identify specific features. CNNs are built from filters (kernels) that play the same role as the weights of a fully connected ANN. The only difference from FC weights is that a single filter is applied, by convolution, across all regions of the input to produce one output. The region a filter sees is known as its local receptive field, and sharing the filter in this way greatly limits the number of weights the CNN has to learn. The output is computed by sliding the filter across the input: at each position, the filter values are multiplied element-wise with the input values beneath them, and the output at that position is the sum of all these products. A popular method for lowering the number of parameters and the amount of computation in a CNN is to reduce the size of the feature maps. A pooling layer does this by replacing each region of the feature map with its maximum or average value. It operates on each feature map independently (often after a convolutional layer). Combining several convolutional and pooling layers is a very effective strategy for detecting features. Simple and complex features can be distinguished at different network layers, and multiple kernel sizes can be applied in parallel. Parallelism was first implemented in AlexNet in 2012.
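To make the sliding-filter and pooling operations concrete, here is a minimal pure-Python sketch. It is for illustration only: the image and kernel values are made up, and real frameworks such as Keras implement the same idea with optimized tensor operations.

```python
def conv2d(image, kernel):
    """'Valid' 2D convolution as used in CNNs (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Sum of element-wise products over the local receptive field.
            row.append(sum(image[r + i][c + j] * kernel[i][j]
                           for i in range(kh) for j in range(kw)))
        out.append(row)
    return out

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that do not fit are dropped."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [[max(feature_map[r * size + i][c * size + j]
                 for i in range(size) for j in range(size))
             for c in range(out_w)]
            for r in range(out_h)]

# Made-up 5 x 5 "image" and 2 x 2 kernel; the same four weights are reused
# at every position, which is what keeps the number of parameters small.
image = [[1, 2, 3, 0, 1],
         [0, 1, 2, 3, 1],
         [1, 0, 1, 2, 0],
         [2, 1, 0, 1, 2],
         [0, 2, 1, 0, 1]]
kernel = [[1, 0],
          [0, 1]]
feature_map = conv2d(image, kernel)   # 4 x 4 feature map
pooled = max_pool(feature_map)        # 2 x 2 after pooling
```

Here the convolution uses 4 shared weights, whereas a fully connected layer mapping the 25 inputs to the 16 outputs would need 400.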
2.3 Artificial Neural Network (ANN)
Figure 1 presents the architecture of an artificial neural network. Because deep learning has outperformed more conventional methods such as decision trees and support vector machines over the past ten years, interest in using it has risen in recent years.
2.4 Recurrent Neural Network (RNN)
During the operation of an RNN, each layer's output is stored and fed back into the system's input in order to predict that layer's next output (Fig. 2). Recurrent neural networks, or RNNs, are a class of neural networks in which hidden states and outputs can be used as inputs. Typically, they look as follows (Fig. 3). The loops in these networks are what allow information to persist. At each step the network receives an input (x_t) and produces an output (o_t), and the result acts as an additional input for the subsequent step. Simple RNNs work well for modeling short temporal dependencies. For long sequences, a different kind of RNN, the Long Short-Term Memory network, is used in most practical situations.
Fig. 1 Architecture of artificial neural network [1]
Fig. 2 Architecture of recurrent neural network [2]
Fig. 3 Subclass of neural network
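The recurrence described in this section can be illustrated with a scalar, Elman-style RNN. This is a simplified sketch with made-up weights (`w_x`, `w_h`, `b` are assumptions for the example), and it takes the output o_t to be the hidden state itself.

```python
import math

def rnn_step(x_t, h_prev, w_x, w_h, b):
    """One step of a simple RNN with scalar input and scalar state."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)

def run_rnn(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Process a sequence; the state from step t-1 is an extra input at step t."""
    h = 0.0
    outputs = []
    for x_t in inputs:
        h = rnn_step(x_t, h, w_x, w_h, b)   # the loop is what lets information persist
        outputs.append(h)
    return outputs

outputs = run_rnn([1.0, 0.0, 0.0])
```

Although the second and third inputs are zero, the corresponding outputs are non-zero because the first input is still carried in the state. It also fades at every step, which is why simple RNNs only capture short dependencies.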
2.5 Long Short-Term Memory (LSTM)
LSTMs, or long short-term memory networks, are widely used in deep learning. They are a kind of recurrent neural network (RNN) capable of learning long-term dependencies, particularly for tasks that involve sequence prediction. Thanks to its feedback connections, an LSTM can process entire data sequences rather than only isolated data points such as photos. Its applications include speech recognition and machine translation. The LSTM variant of RNN performs remarkably well on a variety of problems. An important part of an LSTM model is a memory cell known as the "cell state", which carries information over time. In Fig. 4, the horizontal line that passes through the top of the diagram represents the cell state. It can be compared to a conveyor belt along which data flows largely unaltered. Gates control which information enters and leaves the cell state. Each gate consists of a sigmoid neural network layer and a pointwise multiplication operation (Fig. 4). Processing multiple short video clips at once is computationally challenging because each video may contain a large number of frames, not all of which are meaningful. Applications of LSTM networks can be found in the following fields:
• Language modelling
• Automated translation
• Handwriting recognition
• Image captioning
• Image generation using attention models
• Question answering
• Conversion of audio to video
• Modelling of polyphonic music
• Speech synthesis
• Prediction of the secondary structure of proteins.
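The gate mechanism described above can be written out for a single scalar LSTM cell. This is a sketch of the standard LSTM equations, not code from the paper; the parameter dictionary `p` and its gate names are assumptions chosen for readability.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step with scalar input, hidden state and cell state.

    `p` maps each gate name to a (w_x, w_h, b) tuple of made-up weights.
    """
    def pre(name):
        w_x, w_h, b = p[name]
        return w_x * x_t + w_h * h_prev + b

    f = sigmoid(pre("forget"))       # how much of the old cell state to keep
    i = sigmoid(pre("input"))        # how much new information to admit
    o = sigmoid(pre("output"))       # how much of the cell state to expose
    g = math.tanh(pre("candidate"))  # candidate values for the cell state
    c_t = f * c_prev + i * g         # pointwise gating of the "conveyor belt"
    h_t = o * math.tanh(c_t)
    return h_t, c_t

# With the forget gate fully open and the input gate closed, the cell state
# passes through unchanged, like the conveyor belt in the text.
params = {"forget": (0.0, 0.0, 100.0), "input": (0.0, 0.0, -100.0),
          "output": (0.0, 0.0, 0.0), "candidate": (1.0, 1.0, 0.0)}
h_t, c_t = lstm_step(1.0, 0.0, 0.7, params)
```

In a real layer the scalars become vectors and the weights become matrices, but the gating logic is the same.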
Fig. 4 LSTM gates [3]
The primary goal of this paper is to create and implement an effective deep learning system capable of predicting and categorizing human behavior into two groups, Safe Activity and Dangerous Activity, using a combination of CNN and RNN architectures.
3 Methodology
Since a video can be regarded as a collection of individual images, many deep learning practitioners would perceive video classification as image classification performed N times, where N is the total number of frames in the video. Video classification is more difficult than basic image classification, however, because successive frames in a video are usually related in their semantic content. If we can take advantage of the temporal nature of videos, we can improve the results of video classification. In image classification, we:
• Feed an image to the CNN.
• Obtain the predictions from the CNN.
• Choose the label with the highest associated probability.
Since a video is merely a collection of frames, a basic video classification strategy would be to:
• Loop over each frame in the video clip.
• Pass each frame through the CNN.
• Classify each frame separately, independently of the others.
• Choose the label with the highest associated probability.
• Write the output frame to disk with its label.
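The naive per-frame strategy listed above amounts to a loop with an argmax. The sketch below assumes a hypothetical `predict_proba` callable that stands in for the CNN and returns one probability per label; the toy "frames" are already probability lists so that the example runs without a model.

```python
def classify_frames(frames, predict_proba, labels):
    """Label every frame independently with its most probable class."""
    results = []
    for frame in frames:
        probs = predict_proba(frame)       # one probability per label
        best = probs.index(max(probs))     # index of the highest probability
        results.append(labels[best])
    return results

# Hypothetical stand-in for a CNN: each toy frame is its own probability list.
def fake_cnn(frame):
    return frame

labels = ["safe", "dangerous"]
frames = [[0.9, 0.1], [0.4, 0.6], [0.8, 0.2]]
per_frame = classify_frames(frames, fake_cnn, labels)
```

Because every frame is judged in isolation, a single uncertain frame (the second one here) changes the label for that frame only.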
Now that we have established the need for video classification models to address human activity recognition, let us discuss the most basic, naive method for video classification. The issues are these:
1. The model also picks up on the surrounding environment. Consider the cases below.
2. Figure 5a shows images in which people are fighting. Figure 6a and b could likewise be related to some sort of robbery, murder, etc. For the model to identify the correct activity, we should therefore provide enough examples for it to classify the scene with reference to its environmental context. Figure 6a shows that a man is injured, and Fig. 6b shows a woman trying to save him. This activity could be misinterpreted as robbery if we do not provide enough reference examples.
Fig. 5 a Fighting. b Fighting
Fig. 6 a Misinterpretation of behavior. b Misinterpretation of behavior
However, this tactic has a further drawback. The model's predictions for individual video frames may not always be accurate, which causes the predicted label to change rapidly (flicker). This is because the model classifies individual frames rather than the video stream as a whole. Rather than relying on the classification of a single frame, this issue can be fixed by averaging the results over 5, 10, or n frames, which effectively eliminates the flickering. Once a value of n has been chosen, a rolling average is an easy way to implement this. Various types of video classification methods are:
i. CNN on single frames.
ii. Late fusion.
iii. Early fusion.
iv. LSTM in conjunction with CNN.
v. Optical flow.
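The rolling average over the last n frames mentioned above can be sketched with a fixed-length queue. The function name `smooth_predictions` and the toy probabilities are assumptions for illustration; in practice the probabilities would come from the CNN.

```python
from collections import deque

def smooth_predictions(frame_probs, labels, n=5):
    """Average class probabilities over the last n frames, then pick a label."""
    window = deque(maxlen=n)   # the oldest frame drops out automatically
    smoothed = []
    for probs in frame_probs:
        window.append(probs)
        mean = [sum(p[k] for p in window) / len(window)
                for k in range(len(labels))]
        smoothed.append(labels[mean.index(max(mean))])
    return smoothed

labels = ["safe", "dangerous"]
frame_probs = [[0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.9, 0.1], [0.85, 0.15]]
raw = [labels[p.index(max(p))] for p in frame_probs]      # flickers at the third frame
smoothed = smooth_predictions(frame_probs, labels, n=3)   # stable label
```

A larger n suppresses more flicker but makes the label react more slowly when the activity really changes, so n is a trade-off rather than a fixed constant.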
After reviewing the most recent developments in behavior detection systems and the methodology of the tools we use, and after describing the costly process of downloading the two data sets, we begin by outlining our strategy. The system integrates two different DL models: a CNN scans the video frames to extract features, and an RNN reads these features to predict behavior. The model is built in Python with the Keras framework, with TensorFlow as the backend. Pre-processing the data is important to make sure that it fits the DL model correctly before moving on to the training stage. Consistent with the state of the art in deep learning activity detection, a convolutional neural network is used to extract the features of the video frames before a recurrent neural network makes predictions from the frame sequences. Other behavior recognition DL models exist, one of which is a 3D CNN followed by an FC network. This technique feeds the entire video at once to a 3D CNN that can extract both image and motion (temporal) features. All of these features are then fed to a stack of FC layers. The drawback of this approach is that it needs the entire footage in order to predict the activity.
3.1 Convolution
Developing a 2D convolutional neural network that performs well at understanding images and producing their features (a vector that encapsulates an image's information) is a difficult undertaking, because selecting a suitable model is challenging and training consumes a significant amount of time and data. As a result, a typical deep learning technique is to extract the features with a pre-trained model and use them in the new model. Numerous models have already been trained to classify images. Since 2010, the ImageNet database has organized an annual challenge (ILSVRC) to evaluate object recognition and image classification algorithms. AlexNet (2012), ZF Net (2013), VGG Net (2014), GoogLeNet (2014), and Microsoft ResNet (2015) are just a few of the DL models that have emerged from this competition since 2012. In these models, the convolutional block is connected to a second block, a feed-forward neural network that performs the classification. The output size of the feed-forward network is defined by the number of classes to be distinguished. This second block has the same structure in any model built for the ILSVRC challenge. We chose the Inception v3 model because of its high classification accuracy and low computational cost. Instead of stacking convolutions in a pyramid, Inception employs what its authors call "inception modules", which are groups of layers whose flow is not sequential (one behind the other). Within such a module, several convolutions of different sizes are computed separately and then concatenated. This extracts more features. Convolution layers of 1 × 1 are also used to reduce the number of operations. The second part of the Inception network, the classification component, consists of a fully connected layer combined with a softmax output layer. This classification method is useful for a single image, such as a photograph, but RNNs are required when classifying an image stream.
3.2 Recurrent
According to the state of the art, a recurrent neural network is the best deep learning method currently available for recognizing sequences of inputs such as text, audio, and video. RNNs can model data sequences because they have internal loops that feed the network's output back into its input. Basic RNNs have short-term memory and can only remember recent segments of the sequence, whereas Long Short-Term Memory networks can retain relevant information regardless of when it appears. In this model, the output of Inception is therefore fed into an LSTM network. Given that the feature vector of the Inception v3 network has 2,048 elements, an LSTM layer of the same size is recommended so that every feature of the vector sequence can be retained (each LSTM cell is fed with one feature). The LSTM layer is followed by a fully connected layer of 512 neurons.
3.2.1 Algorithm
The following steps are followed:
Step 1: Dataset building
• Videos showing both dangerous and safe activities are collected from the web via Kaggle.
• The videos are then labelled.
• There are six classes of activities: abuse, arrest, assault, burglary, fighting, and normal.
Step 2: Training
• The dataset is split into two parts: 75% is used to train the model and 25% to test it.
• Videos are converted into frames.
• Each video's label is extracted and stored in an array.
• For each frame, the image pixels are converted into a NumPy array suitable as input to the Inception v3 model.
• Each frame is then preprocessed, and features are extracted from it using the Inception v3 model.
• To classify the video, the LSTM model has two hidden layers with sigmoid and ReLU activations, respectively, and an output layer of two neurons with softmax activation.
Step 3: Testing
• A single video is used for testing, and its frames are extracted.
• For each frame, the image pixels are converted into a NumPy array, to which the Inception v3 model is then applied.
• After the frame is preprocessed, features are extracted from it using the Inception v3 model.
• The features are stored in an array, converted to a NumPy array, and reshaped to match the input shape of the LSTM model. The video is then classified using the LSTM model's class prediction method.
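The data-handling part of these steps (the 75/25 split and the reshaping of per-frame features into one LSTM input sequence) can be sketched without any deep learning library. This is an illustrative outline under stated assumptions: `extract_features` is a hypothetical stand-in for the Inception v3 feature extractor, and nested lists take the place of NumPy arrays.

```python
import random

FEATURE_SIZE = 2048  # length of the Inception v3 feature vector given in the text

def split_dataset(videos, train_fraction=0.75, seed=0):
    """Shuffle the labelled videos and split them into training and test parts."""
    shuffled = list(videos)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def video_to_sequence(frames, extract_features):
    """Turn one video into a batch holding one sequence of feature vectors.

    The nested list has shape (1, n_frames, FEATURE_SIZE), the layout an
    LSTM layer typically expects: (batch, time steps, features).
    """
    sequence = [extract_features(frame) for frame in frames]
    if any(len(vector) != FEATURE_SIZE for vector in sequence):
        raise ValueError("every frame must yield a 2,048-element feature vector")
    return [sequence]

# Toy usage: 100 video ids, and a fake extractor that repeats the frame value.
train_videos, test_videos = split_dataset(range(100))
batch = video_to_sequence([0, 1, 2], lambda frame: [float(frame)] * FEATURE_SIZE)
```

In an actual implementation the nested list would be a NumPy array and the fake extractor would be replaced by the pre-trained network.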
4 Results and Discussion
The results are displayed in Fig. 7, which reports the accuracy for the various behaviors. A random video from the test dataset is selected and converted into a GIF, and the results for all classes are displayed on it. That is, the accuracy for all activities is displayed at once in one video.
Fig. 7 Results
5 Conclusion and Future Work
Video classification is difficult for several reasons, such as the lack of video datasets and poor precision. The principal contributions of this paper are adapting data from several databases to fit a deep learning model, transfer learning from a pretrained deep learning model (Inception V3), and using LSTM recurrent neural networks to learn from the extracted features. Because manual review of recordings every day is impractical, it can be replaced with a framework that analyzes the video with high precision. Future applications include Internet video search, security monitoring, and identifying and removing re-uploaded copyrighted videos. Future work will concentrate on improving the system's accuracy by leveraging the diversity that datasets such as ActivityNet offer, in order to develop a more powerful activity recognition technology. Possible improvements include using a better model, enhancing the fine-tuning procedure, and altering the number of neurons, the number of layers, the learning rate, etc.
References
1. medium.datadriveninvestor.com/what-exactly-is-tensorflow-80a90162d5f1
2. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9:1735–1780
3. Mukherjee D, Bankoski J, Grange A, Han J, Koleszar J, Wilkins P, Xu Y, Bultje R (2013) The latest open-source video codec vp9-an overview and preliminary results. In: Picture coding symposium (PCS). IEEE, pp 390–393
Role of Deep Learning in a Secure Telemedicine System with a Case Study of Heart Disease Prediction
Darshan Singh, Siddhant Thapliyal, Mohammad Wazid, and D. P. Singh
Abstract In the modern world, everything is on the Internet and available at our fingertips. However, our health system still depends on physical presence, even when a person needs just a routine checkup: physical presence or real-time interaction with the doctor is required. These days, many tasks can be done by machines themselves. This is possible through deep learning, where we train a system with the help of huge amounts of data, so that a person, i.e., a patient, can check his/her health-related issue directly. Deep learning is a method in which results or symptoms are detected on the basis of previously collected data. In this paper, we discuss the use of a deep learning-based secure scheme for a telemedicine system. The scheme also secures the healthcare data through the deployment of an authentication and key establishment process. It can be useful in emergency medical conditions, especially in rural areas and disaster-affected areas.
D. Singh (B) · S. Thapliyal · M. Wazid · D. P. Singh
Department of Computer Science and Engineering, Graphic Era Deemed to be University, Dehradun, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 J. Choudrie et al. (eds.), ICT for Intelligent Systems, Smart Innovation, Systems and Technologies 361, https://doi.org/10.1007/978-981-99-3982-4_7
1 Introduction
Telemedicine is a form of health checkup in which a patient can interact with a doctor from any part of the globe and discuss his or her problems over a call or the Internet. It is an upgrade of the traditional checkup, where both patient and doctor had to be physically present whenever the patient had a health-related issue. In today's scenario, however, telemedicine also looks like a dated approach, because it changes only the medium of interaction between patient and doctor, not the approach itself. In certain emergency conditions the doctor may be unable to connect, or the patient may be out of reach at that point in time, and in that case the concept of telemedicine fails. We therefore require an approach that can give the same accuracy and results as a doctor as a primary remedy, and that connects to the doctor whenever possible. The health sector is one of the most essential sectors of any country or area. Health-related tasks need high accuracy and low response time to deal with any condition, which is almost impossible to achieve with a totally human-dependent structure, because it requires a large bulk of data as well as proper segregation according to symptoms, which would take hundreds of years to do manually. This is where deep learning comes into the picture: using certain algorithms, we can analyze the data in depth and predict the possible treatment or required analysis with high accuracy in a shorter time. In this paper, we therefore analyze and implement telemedicine and deep learning together, and propose solutions that can improve telemedicine for rural areas where neither a physical doctor nor connectivity is available.
1.1 What Is Deep Learning? Deep learning aims to give machines decision-making capabilities similar to those of the human brain. Deep learning models are strong decision makers because they can decide with little human intervention. They can also work on large amounts of data without modifying it and can handle thousands of attributes in a single dataset. Deep learning is built on neural networks, whose operation is modeled on biological neurons. It is an AI technique that lets machines learn from experience, much as people do. Deep learning is a key component of self-driving vehicles, allowing them to recognize a stop sign or distinguish a pedestrian from a lamppost. It enables voice control in consumer devices such as computers, tablets, TVs, and hands-free speakers. In deep learning, a model learns to perform classification tasks directly from images, text, or sound. Deep learning models can achieve state-of-the-art accuracy, sometimes exceeding human-level performance. Models are trained using a large set of labeled data and multi-layer neural network architectures.
1.2 What Is Telemedicine? Telemedicine refers to treating patients or providing them with a medical response when the doctor is not physically present in the same room or location. It is useful when a patient needs a quick medical checkup but the doctor cannot be physically present at that time (Fig. 1). Common types of telemedicine include:
1. Teleradiology
2. Telepathology
3. Teledermatology
4. Telepsychiatry
Role of Deep Learning in a Secure Telemedicine System …
Fig. 1 Types of telemedicine
Telehealth refers to a wide range of technologies and services used to deliver patient care and strengthen the overall healthcare delivery system. Telehealth differs from telemedicine in that it covers a broader range of remote healthcare services. In addition to clinical care, telehealth may include remote non-clinical services such as provider training, administrative meetings, and continuing medical education. According to the World Health Organization, telehealth includes surveillance, health promotion, and public health functions.
1.3 Deep Learning in Telemedicine Globally, the need for healthcare is growing. The shortage of medical professionals to fill this gap has sparked interest in deep learning and long-term telemedicine solutions. Telemedicine has been seen to benefit both patients and healthcare providers financially, provided that government regulations and insurance agencies accept it as a reimbursable expense. Advances in cloud computing, AI, and telemedicine are establishing a global paradigm for healthcare while also increasing the demand for these services. Telemedicine has been used in healthcare for over a century. Since the invention of long-distance communication methods such as the radio and telephone, doctors have used telemedicine to help patients in areas where medical personnel are scarce. Telemedicine evolved in tandem with advances in information technology. With the advent of the Internet, a person's ability to access medical care without visiting a healthcare facility has expanded enormously. In 2016, 47% of the world's population had access to the Internet, a rise of 4% over the previous year (Taylor 2016). The spread of Internet-capable devices such as desktops, notebooks, smartphones, and tablets has greatly increased people's ability to access telemedicine. Deep learning in healthcare is used to provide doctors and healthcare workers with a decision support system so that they can treat patients using telemedicine. Telemedicine is a method of transmitting medical information through interactive digital communication in order to conduct remote consultations, medical examinations, and procedures, and to connect medical professionals. Its key objective is to reduce complications and costs by bridging distance in medical communication. The use of electronic messaging and devices to deliver healthcare to patients without an in-person appointment is known as telemedicine. Common uses of telemedicine technologies include follow-up appointments, management of chronic conditions, medication management, specialist consultation, and a range of other health services that can be delivered remotely through secure video and audio connections.
2 Literature Review Various researchers have proposed schemes in this field. Luo et al. [1] addressed retinopathy of prematurity, an ocular condition with a very high incidence of blindness whose prompt identification and treatment are extremely important because its occurrence rises each year. They presented a deep learning-based collaborative edge-cloud telemedicine system to address the absence of timely and efficient fundus screening for preterm newborns in remote places, which can exacerbate the condition and even cause blindness. Deep learning methods are primarily employed in their system to classify the processed images. Their approach is based on ResNet101 and uses undersampling and resampling to address data imbalance in medical image processing. Oguine et al. [2] observed that modern medicine has entered a new phase with the development of telemedicine as a method of healthcare delivery, and that its rapid expansion owes much to advances in technology and artificial intelligence. Their descriptive study broadly examines AI's applications in healthcare delivery, with a comprehensive perspective on the usability of telemedical innovations in boosting Virtual Diagnostic Solutions (VDS). It further examines significant improvements to deep learning models for VDS and calls for more study of the potential of VDS and its impending difficulties. Overall, the study provides a broad review of AI in telemedicine with a special emphasis on deep learning-based methods for VDS. Rupa et al. [3] proposed a solution for multimedia security. During the epidemic, telemedicine, which includes sending medical data over the Internet, and online doctor consultations both grew in popularity. Because a patient's medical records are likely to contain sensitive and private information, this raises questions about the security of medical data. They suggested a deep learning-based technique for multimedia transformation built on a chaotic logistic map. The originality of the work lies in integrating a lightweight encryption function that uses the chaotic logistic map. Additionally, a ResNet model is used for classification to spot false medical multimedia data. An interactive user interface and linear feedback shift register operations make it simple to use the system to safeguard medical multimedia data. The chaotic map provides the security attributes required of encryption ciphers, such as confusion and diffusion. Because chaotic maps are also very sensitive to initial conditions, the suggested encryption technique is more secure and resilient. It helps protect the video and image data used in medicine. On the receiver side, a multilayer perceptron (MLP) is used to categorise the medical data according to the attributes needed for further processing. When tested, the suggested work displays high entropy and is effective in protecting medical data from various cyber-attacks.
There is an unmet need to perform retinal screening tests on every diabetic patient, and there are many undiagnosed and untreated cases of diabetic retinopathy (DR). One study aimed to create a reliable diagnostic technology that could be used to improve DR screening. Referring DR-affected eyes to an ophthalmologist for further evaluation and care would help slow the progression of vision loss by allowing faster and more accurate diagnosis. With the rising prevalence of artificial intelligence (AI) in a variety of fields, researchers in medicine have begun to apply AI's data handling and analysis capabilities to telemedicine.
3 Work Done in Industry IBM has developed the Watson Health program. Able to interpret millions of pages of medical data almost instantly, the software can also draw inferences that can be used for diagnostic analysis and other purposes. The National Institute of Health Resources presented a model for automated retinal image analysis, which may help detect certain forms of vision loss related to diabetes and make primary assistive tools available to doctors. Scientists at MD Anderson have developed a model that can predict acute toxicities in patients receiving therapy for head and neck cancer; it analyzes patients' medical data to provide real-time insights for improving care and supports clinical decisions based on the data generated by the system. Welltok developed an AI system that could analyze doctors' communication with their patients in real time and offer advice on improving overall healthcare. This, in turn, could suggest better diagnostics, health plans, recommendations, and optimized healthcare experiences for patients. Welltok also built a chatbot, "Concierge", that could help customers improve their resource utilization, expenditure clarity, and access to low-cost healthcare solutions.
Fig. 2 Architecture of proposed scheme
NASA was one of the first organizations to use telemedicine technology, in order to provide better healthcare for astronauts in space. NASA has developed a novel form of three-dimensional telemedicine with the help of holoportation. Holoportation, a technology developed by Microsoft, captures a high-quality image of a person and transmits it in real time to any part of the world and even to space. NASA's team used holoportation for diagnostic and preventive healthcare of astronauts.
4 Architecture of Proposed Scheme The architecture of the proposed scheme is depicted in Fig. 2. In this architecture, the health servers form a peer-to-peer cloud server (P2PCS) network and have multiple capabilities, such as processing and analyzing data through a deep learning model. A health server $HS_j$ receives data securely from smart healthcare devices $SHD_i$. This secure data transmission uses the session keys $SK_{SHD_i,HS_j}$, which are established via a mutual authentication process. Different users $U_k$, such as doctors, nurses, and relatives of patients, can also access the data from the servers securely through the established session keys $SK_{SHD_i,U_k}$.
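The paper does not spell out the authentication protocol at this point, so the following is only a minimal sketch of how a device and a health server might mutually authenticate and arrive at a common session key. It assumes a pre-shared secret, fresh random nonces, and HMAC-SHA256 for key derivation and key confirmation; the function names and the identifiers `SHD_i` and `HS_j` used as strings are illustrative choices, not the authors' scheme.

```python
import hashlib
import hmac
import secrets


def derive_session_key(secret: bytes, id_a: str, id_b: str,
                       nonce_a: bytes, nonce_b: bytes) -> bytes:
    """Derive a session key from a pre-shared secret, both identities and both nonces."""
    context = b"|".join([id_a.encode(), id_b.encode(), nonce_a, nonce_b])
    return hmac.new(secret, context, hashlib.sha256).digest()


def confirmation_tag(session_key: bytes, role: bytes) -> bytes:
    """Tag showing that the sender holds the session key (key confirmation)."""
    return hmac.new(session_key, b"confirm|" + role, hashlib.sha256).digest()


def mutual_authentication(device_secret: bytes, server_secret: bytes):
    """Simulate both sides locally; return (authenticated, device_key, server_key)."""
    nonce_dev = secrets.token_bytes(16)  # fresh challenge from the device (assumed)
    nonce_srv = secrets.token_bytes(16)  # fresh challenge from the server (assumed)
    sk_dev = derive_session_key(device_secret, "SHD_i", "HS_j", nonce_dev, nonce_srv)
    sk_srv = derive_session_key(server_secret, "SHD_i", "HS_j", nonce_dev, nonce_srv)
    # Each side verifies the tag produced by the other side in constant time.
    server_accepts = hmac.compare_digest(confirmation_tag(sk_dev, b"device"),
                                         confirmation_tag(sk_srv, b"device"))
    device_accepts = hmac.compare_digest(confirmation_tag(sk_srv, b"server"),
                                         confirmation_tag(sk_dev, b"server"))
    return server_accepts and device_accepts, sk_dev, sk_srv
```

Both parties end up with the same 32-byte key only when they hold the same secret; a real deployment would also need replay protection, identity management, and a vetted protocol rather than this illustration.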
5 Practical Implementation In this section, we implement a deep learning model to demonstrate the application of deep learning in the healthcare sector. The dataset [4] was collected from the UCI ML Repository. It has 13 features, including cholesterol level, chest pain, blood pressure, and heart rate. We trained the model using the Keras library for Python with a learning rate of 0.001. Two models were used for both training and testing: a categorical classification model and a binary classification model. The categorical classification model achieved an accuracy of 53.33% with a precision of 74%, and the binary classification model achieved an accuracy of 88.33% with a precision of 88% (Tables 1, 2, 3 and 4).
Table 1 System specification

| Parameter | Description |
|---|---|
| Edition | Windows 10 Home Single Language |
| Version | 22H2 |
| OS build | 19045.2546 |
| Processor | AMD Ryzen 5 2500U with Radeon Vega Mobile Gfx 2.00 GHz |
| Installed RAM | 12.0 GB (10.9 GB usable) |
| System type | 64-bit operating system, x64-based processor |
| Pen and touch | No pen or touch input is available for this display |

Table 2 Software specifications

| Parameter | Description |
|---|---|
| Language used | Python 3.9 |
| IDE | Anaconda's Jupyter notebook |
| Library used | Keras, pandas, numpy, matplotlib, sklearn |
| Dataset features | 13 |
Table 3 Categorical classification result

| Iteration | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| 0 | 0.73 | 1.00 | 0.84 | 27 |
| 1 | 0.25 | 0.08 | 0.12 | 12 |
| 2 | 0.40 | 0.20 | 0.27 | 10 |
| 3 | 0.29 | 0.57 | 0.38 | 7 |
| 4 | 0.00 | 0.00 | 0.00 | 4 |
Table 4 Binary classification result

| Iteration | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| 0 | 0.79 | 1.00 | 0.89 | 27 |
| 1 | 1.00 | 0.79 | 0.88 | 33 |
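As a consistency check on the binary result, the per-class figures in Table 4 can be recomputed from a confusion matrix. The matrix below is not reported in the paper; it is inferred here from the supports (27 and 33) and the rounded precision and recall values, so treat it as an assumption. The helper names are illustrative.

```python
# Confusion matrix inferred from Table 4 (an assumption, not reported in the paper).
# Rows are true classes, columns are predicted classes.
confusion = [[27, 0],
             [7, 26]]


def per_class_metrics(matrix, cls):
    """Return (precision, recall, f1, support) for one class of a square confusion matrix."""
    tp = matrix[cls][cls]
    predicted = sum(row[cls] for row in matrix)  # column sum: everything predicted as cls
    support = sum(matrix[cls])                   # row sum: everything that truly is cls
    precision = tp / predicted if predicted else 0.0
    recall = tp / support if support else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1, support


def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


for cls in range(2):
    p, r, f1, n = per_class_metrics(confusion, cls)
    print(f"class {cls}: precision={p:.2f} recall={r:.2f} f1={f1:.2f} support={n}")
print(f"accuracy={accuracy(confusion):.4f}")
```

Under this assumed matrix, the rounded precision, recall, and F1-score match the rows of Table 4, and the accuracy is 53/60, which agrees with the reported 88.33%.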
6 Conclusion and Future Scope To sum up, we are in the early stages of deep learning, a framework that can go well beyond the capacity of manual techniques and even current technology. The ease with which this technology can be applied has made it highly useful in a variety of fields. This paper shows the significance of its application in medicine. Telemedicine is a flexible process with vast growth opportunities. An implementation may involve accommodating more patients or finding the right practice for a surgical treatment, either of which would significantly affect people's lives. Telemedicine has caught up with the latest developments in artificial intelligence; however, there are still a few obstacles to overcome. The main contribution would be to apply these studies and keep investigating ways to make this technology more effective, so that it can be deployed in remote areas and underdeveloped healthcare facilities. The integration of telemedicine and deep learning has the potential to significantly improve the effectiveness of healthcare services. Since the aim of telemedicine is to deliver care to patients from doctors who are not physically present nearby, deep learning enhances telemedicine by giving the system the ability to make decisions. With various deep learning approaches, critical and expensive test results can be analyzed for diseases that remain difficult to treat in remote areas. In addition, a secure, lightweight, deep learning-based framework for the telemedicine system can be implemented that is capable of mitigating potential cyber threats and attacks.
References
1. Luo Z, Ding X, Hou N, Wan J (2023) A deep-learning-based collaborative edge–cloud telemedicine system for retinopathy of prematurity. Sensors 23(1)
2. Oguine OC, Oguine KJ (2022) AI in telemedicine: an appraisal on deep learning-based approaches to virtual diagnostic solutions (VDS). arXiv:2208.04690
3. Rupa C, Harshita M, Srivastava G, Gadekallu TR, Maddikunta PKR (2022) Securing multimedia using a deep learning based chaotic logistic map. IEEE J Biomed Health Inform
4. http://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/
5. Baker SB, Xiang W, Atkinson I (2017) Internet of things for smart healthcare: technologies, challenges, and opportunities. IEEE Access 5:26521–26544
6. Buettner R, Clauß T, Huynh MT, Koser D (2020) RFID tracking and localization technologies in healthcare. In: 2020 IEEE symposium on industrial electronics applications (ISIEA). Langkawi Island, Malaysia, pp 1–5
7. Chakraborty A, Jindal M, Khosravi MR, Singh P, Shankar A, Diwakar M (2021) A secure IoT-based cloud platform selection using entropy distance approach and fuzzy set theory. Wirel Commun Mobile Comput 2021:1–11
8. Chauhan H, Kumar V, Pundir S, Pilli ES (2013) A comparative study of classification techniques for intrusion detection. In: 2013 international symposium on computational and business intelligence. IEEE, pp 40–43
9. Deebak BD, Al-Turjman F, Aloqaily M, Alfandi O (2019) An authentic-based privacy preservation protocol for smart e-healthcare systems in IoT. IEEE Access 7:135632–135649
10. Fan K, Zhu S, Zhang K, Li H, Yang Y (2019) A lightweight authentication scheme for cloud-based RFID healthcare systems. IEEE Netw 33(2):44–49
11. Fan M, Zhang X (2019) Consortium blockchain based data aggregation and regulation mechanism for smart grid. IEEE Access 7:35929–35940
12. Garg N, Wazid M, Das AK, Singh DP, Rodrigues JJPC, Park Y (2020) BAKMP-IoMT: design of blockchain enabled authenticated key management protocol for internet of medical things deployment. IEEE Access 8:95956–95977
13. Jayaraman PP, Forkan ARM, Morshed A, Haghighi PD, Kang YB (2020) Healthcare 4.0: a review of frontiers in digital health. WIREs Data Min Knowl Disc 10(2):e1350
14. Latif S, Qadir J, Qayyum A, Usama M, Younis S (2021) Speech technology for healthcare: opportunities, challenges, and state of the art. IEEE Rev Biomed Eng 14:342–356
15. Qadri YA, Nauman A, Zikria YB, Vasilakos AV, Kim SW (2020) The future of healthcare internet of things: a survey of emerging technologies. IEEE Commun Surv Tutor 22(2):1121–1167
16. Sharma S, Ghanshala KK, Mohan S (2019) Blockchain-based internet of vehicles (IoV): an efficient secure ad hoc vehicular networking architecture. In: 2019 IEEE 2nd 5G world forum (5GWF), pp 452–457
17. Thapliyal S, Wazid M, Singh DP (2023) Blockchain-driven smart healthcare system: challenges, technologies and future research. In: Choudrie J, Mahalle P, Perumal T, Joshi A (eds) ICT with intelligent applications. Springer Nature, Singapore, pp 97–110
18. Thapliyal S, Wazid M, Singh DP, Das AK, Alhomoud A, Alharbi AR, Kumar H (2022) ACMSH: an efficient access control and key establishment mechanism for sustainable smart healthcare. Sustainability 14(8)
19. Wazid M, Das AK, Lee JH (2019) User authentication in a tactile internet based remote surgery environment: security issues, challenges, and future research directions. Pervasive Mob Comput 54:71–85
20. Wazid M, Das AK, Rodrigues JJPC, Shetty S, Park Y (2019) IoMT malware detection approaches: analysis and research challenges. IEEE Access 7:182459–182476
21. Wazid M, Das AK, Shetty S, Rodrigues JJ, Guizani M (2022) AISCM-FH: AI-enabled secure communication mechanism in fog computing-based healthcare. IEEE Trans Inf For Secur 18:319–334
22. Wazid M, Singh J, Das AK, Shetty S, Khan MK, Rodrigues JJ (2022) ASCP-IOMT: AI-enabled lightweight secure communication protocol for internet of medical things. IEEE Access 10:57990–58004
23. Wazid M, Thapliyal S, Singh DP, Das AK, Shetty S (2022) Design and testbed experiments of user authentication and key establishment mechanism for smart healthcare cyber physical systems. IEEE Trans Netw Sci Eng
Comparative Analysis of Chronic Kidney Disease Prediction Using Supervised Machine Learning Techniques K. Poorani and M. Karuppasamy
Abstract Chronic Kidney Disease (CKD) is one of the chronic microvascular complications of diabetes and is prevalent worldwide. People with diabetes are more often affected by chronic kidney disease because of poorly controlled blood glucose levels, blood pressure, lifestyle habits, and so on. Kidney treatment involves dialysis, which not everyone can afford. To reduce the risk, it is better to detect the disease at an early stage, for which machine learning algorithms can be employed. The proposed work employs several supervised machine learning algorithms for classification: Logistic Regression, Support Vector Machine, Gaussian naive Bayes, k Nearest Neighbour, and Random Forest are compared. The comparative analysis shows that Gaussian naive Bayes achieves a higher accuracy (96%) than the other supervised machine learning algorithms.
1 Introduction Diabetic nephropathy is a major microvascular complication of diabetes. It is a condition in which the nephrons in the kidneys fail to remove waste fluids from the body. This damages the filtering system of the kidney and leads to major complications. Progression of this condition leads to End Stage Renal Disease (ESRD), also known as kidney failure. Blood glucose level and blood pressure are the major factors to control in order to maintain kidney health [1]. Diagnosis can be done using the urinary albumin test, the creatinine ratio, and the glomerular filtration rate (GFR), which help determine how well the kidneys are functioning [1]. According to the International Diabetes Federation, diabetes combined with hypertension is the major cause of kidney failure in 80% of cases globally. Diabetes and chronic kidney disease are closely related to cardiovascular disease. Based on reports from the UK and USA, 40% of people with diabetes tend to develop chronic kidney disease [2]. Gheith et al. [3] demonstrated that nephropathy begins to develop within 10 years of the onset of diabetes. They also highlighted many factors, such as hyperglycemia, obesity, hypertension, insulin resistance, dyslipidemia, hypovitaminosis D, and genetic loci, among which hyperglycemia is an important one. Hussain et al. [4] highlighted the prevalence of diabetic kidney disease (DKD) in low- to middle-income countries, where it also leads to a high mortality rate. They also discussed several factors responsible for DKD, such as albuminuria, smoking, cholesterol, and certain biomarkers. Zhou et al. [5] investigated people with type 2 diabetes and analysed microalbuminuria, healthcare costs, and so on. Aljaaf [6] examined a list of factors used to predict kidney disease, such as glomerular filtration rate (GFR), age, dietary habits, current medical conditions, and albuminuria, but several other factors are also needed for better clinical decisions. Stroescu et al. [7] examined various modifiable and non-modifiable factors responsible for the prediction of kidney disease. They also investigated risk factors such as genetic factors, high systolic blood pressure, high urinary albumin excretion, smoking, renal function decline, and endothelial dysfunction. In addition to these factors, the inclusion of arterial hypertension and the presence of urinary tract infection is suggested for better prediction.
K. Poorani (B) · M. Karuppasamy Department of Computer Applications, Kalasalingam Academy of Research and Education, Srivilliputhur, Tamilnadu, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 J. Choudrie et al. (eds.), ICT for Intelligent Systems, Smart Innovation, Systems and Technologies 361, https://doi.org/10.1007/978-981-99-3982-4_8
2 Related Work Chan et al. [8] reviewed machine learning models for the prediction of kidney disease and concluded that machine learning and artificial intelligence approaches are much needed for disease prediction and drug discovery. Arjaria et al. [9] applied feature selection techniques such as the Gini index, information gain, and chi-square, together with machine learning algorithms including ANN, kNN, SVM, AdaBoost, random forest, and naive Bayes, and obtained a better prediction model across all evaluation metrics; feature selection helped achieve an accuracy above 96%. Ventrella et al. [11] applied supervised machine learning methods to kidney disease prediction and achieved an accuracy of 94% using a randomized trees classifier with 27 features. Jansi Rani and Karuppasamy [10] performed classification of microarray data using a support vector machine. Ghosh et al. [12] analysed SVM, LDA, GB, and AdaBoost techniques and obtained good accuracy, while suggesting that optimization algorithms help to predict the outcome better. Rani et al. [13] highlighted the bacterial foraging optimization technique for feature selection from microarray data. Thus, optimization techniques can be used for better prediction of the outcome. Feature selection and optimization techniques help improve the classification accuracy of machine learning algorithms. Alam et al. [14] highlighted the importance of feature ranking for finding the priority of the features in a given dataset.
3 Proposed Work A model is good if the features contributing to the predictive model are well organized and well selected. This work concentrates on feature selection and a comparative analysis of supervised machine learning models. Since the dataset used for classification is labeled, we use supervised algorithms: logistic regression, SVM, random forest, naive Bayes, and kNN. All of these models are applied to the CKD dataset and compared. Feature selection is done with information gain (IG). The features most responsible for the onset of kidney severity are selected and passed on for classification, which gives better results. IG measures the reduction in entropy, where entropy quantifies the amount of information in a random variable. For a binary variable with class probabilities $p(0)$ and $p(1)$:
$$\text{Entropy} = -\left(p(0)\log p(0) + p(1)\log p(1)\right)$$
Figure 1 represents the proposed workflow. The chronic kidney disease dataset is taken from the Kaggle repository. It consists of the factors responsible for chronic kidney disease prediction, to which the supervised machine learning algorithms are applied. Table 1 describes the chronic kidney disease dataset, which consists of 11 non-categorical variables and 13 categorical variables for classification.
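The information gain criterion described in this section can be sketched as follows. This is a minimal illustration for a nominal feature, assuming base-2 logarithms (the paper does not state the base) and using made-up toy values; the function names are not from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on a nominal feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(group) / total * entropy(group) for group in groups.values())
    return entropy(labels) - remainder


# Toy data (hypothetical): hypertension flag versus CKD class.
htn = ["yes", "yes", "no", "no", "yes", "no"]
ckd = [1, 1, 0, 0, 1, 0]
print(information_gain(htn, ckd))
```

Features are then ranked by their gain and the highest-ranked ones are kept; a feature that leaves the class distribution unchanged in every group has a gain of zero.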
3.1 Logistic Regression (LR) LR is a supervised classification algorithm. Here it is applied to determine whether or not a person has kidney disease. LR extends linear regression by passing the linear equation through the sigmoid function, which maps the output to a probability so that even samples lying close to the decision boundary can be classified. Figure 2 represents the logistic regression model with its threshold value and sigmoid curve; for classification, logistic regression works better than linear regression. The sigmoid function keeps the predicted values within the range 0 to 1.
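The prediction step of logistic regression can be sketched as below. The weights, bias, and the 0.5 threshold are placeholders assumed for illustration; in practice they come from training, which is not shown.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function mapping any real number into the interval [0, 1]."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the equivalent form that avoids overflow in exp().
    e = math.exp(z)
    return e / (1.0 + e)


def predict(features, weights, bias, threshold=0.5):
    """Return (probability, label) for one sample under a fitted logistic model."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    probability = sigmoid(z)
    return probability, int(probability >= threshold)
```

Moving the threshold away from 0.5 trades false positives against false negatives, which matters in screening settings where missing a CKD case is costlier than a false alarm.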
Fig. 1 Proposed workflow: Chronic kidney disease dataset → Preprocessing → Feature selection → Supervised machine learning (Random forest, Logistic regression, Naïve Bayes, SVM, KNN) → Performance evaluation
3.2 Random Forest (RF) RF is a classification algorithm used here to classify the disease. It builds several decision trees on various subsets of the given dataset and combines their outputs to improve prediction accuracy. Figure 3 indicates that a random forest uses multiple decision trees to obtain the result: the majority-voting technique combines the predictions of the individual trees, which gives the random forest better results than a single tree.
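The majority-voting step can be illustrated with a few lines of code. The three "trees" below are hypothetical one-rule stand-ins for trained decision trees, and their thresholds are illustrative only, not clinical cut-offs or values from the paper.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most common label among the per-tree predictions."""
    return Counter(predictions).most_common(1)[0][0]


def forest_predict(trees, sample):
    """Each tree is any callable that maps a sample to a class label."""
    return majority_vote([tree(sample) for tree in trees])


# Hypothetical decision stumps standing in for trained trees (thresholds are illustrative).
trees = [
    lambda s: int(s["sc"] > 1.2),     # serum creatinine rule
    lambda s: int(s["hemo"] < 12.0),  # hemoglobin rule
    lambda s: int(s["bu"] > 40.0),    # blood urea rule
]
print(forest_predict(trees, {"sc": 1.8, "hemo": 13.5, "bu": 55.0}))
```

Using an odd number of trees avoids ties in a two-class problem.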
3.2.1 Gaussian Naive Bayes
Naive Bayes is a probabilistic algorithm based on Bayes' theorem. It classifies the dataset under the strong (naive) assumption that the features are independent of one another given the class. Gaussian naive Bayes supports continuous-valued features. NB classifies a sample according to how likely its feature values are within each class. The Gaussian approach assumes that each feature follows a Gaussian distribution within a class, with no covariance between the features. Figure 4 represents Bayes' theorem, in which the probabilities of the variables are used to determine the outcome. The Gaussian distribution is the most commonly used form of naive Bayes for continuous data, and it yields good results.
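A compact sketch of Gaussian naive Bayes follows. It estimates a class prior plus a per-feature mean and variance, then picks the class with the highest log-posterior. The small constant added to each variance is an assumption made here to avoid division by zero; it is not taken from the paper.

```python
import math


def fit_gaussian_nb(X, y):
    """Estimate a prior plus per-feature mean and variance for each class."""
    model = {}
    for cls in set(y):
        rows = [x for x, label in zip(X, y) if label == cls]
        columns = list(zip(*rows))
        means = [sum(col) / len(rows) for col in columns]
        variances = [sum((v - m) ** 2 for v in col) / len(rows) + 1e-9
                     for col, m in zip(columns, means)]
        model[cls] = (len(rows) / len(y), means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Pick the class with the highest log-posterior, treating features as independent."""
    def log_posterior(prior, means, variances):
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances)
        )
        return math.log(prior) + log_likelihood
    return max(model, key=lambda cls: log_posterior(*model[cls]))
```

Working with log-probabilities instead of multiplying densities keeps the computation numerically stable when there are many features.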
Fig. 2 Representation of logistic regression

Table 1 CKD dataset description

| S. no | Attributes | Type | Description |
|---|---|---|---|
| 1 | Age | Numeric | Age |
| 2 | Bp | Numeric | Blood pressure |
| 3 | Sg | Nominal | Specific gravity |
| 4 | Al | Nominal | Albumin |
| 5 | Su | Nominal | Sugar |
| 6 | Rbc | Nominal | Red blood cells |
| 7 | Pc | Nominal | Pus cells |
| 8 | Pcc | Nominal | Pus cell clumps |
| 9 | Ba | Nominal | Bacteria |
| 10 | Bgr | Numeric | Blood glucose random |
| 11 | Bu | Numeric | Blood urea |
| 12 | Sc | Numeric | Serum creatinine |
| 13 | Sod | Numeric | Sodium |
| 14 | Pot | Numeric | Potassium |
| 15 | Hemo | Numeric | Hemoglobin |
| 16 | Pcv | Numeric | Packed cell volume |
| 17 | Wc | Numeric | White cell count |
| 18 | Rc | Numeric | Red blood cell count |
| 19 | Htn | Nominal | Hypertension |
| 20 | Dm | Nominal | Diabetes mellitus |
| 21 | Cad | Nominal | Coronary artery disease |
| 22 | Appet | Nominal | Appetite |
| 23 | Pe | Nominal | Pedal edema |
| 24 | Ane | Nominal | Anemia |
| 25 | Class | Nominal | Class |
Fig. 3 Representation of Random forest involving multiple decision tree
Fig. 4 Representation of Bayes theorem
3.2.2 Support Vector Machine
A support vector machine is a binary classification algorithm that classifies samples according to a decision boundary. It can be used for both regression and classification. Classification is done by maximizing the margin of a hyperplane between the classes. Figure 5 demonstrates how the hyperplane is used for classification with the margin maximized. The training points closest to the hyperplane, the support vectors, determine its position.
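The geometry of the separating hyperplane can be made concrete with a short sketch. It assumes a linear SVM with an already-trained weight vector `w` and bias `b` (training is not shown, and the paper does not state which kernel was used), with the margin hyperplanes scaled to w·x + b = ±1.

```python
import math


def decision_value(w, b, x):
    """Signed score w.x + b; its sign gives the side of the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def margin_width(w):
    """Distance between the margin hyperplanes w.x + b = +1 and w.x + b = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


def classify(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0 else -1
```

Because the margin width is 2/||w||, maximizing the margin is equivalent to minimizing the norm of `w` subject to every training point lying on the correct side.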
3.2.3 K Nearest Neighbour
K Nearest Neighbour is a supervised machine learning algorithm. It computes the distances between a query and all other samples in the data and selects the specified number (K) of samples closest to the query. The most frequent label among them is used for classification, and the average of their labels is used for regression. Figure 6 shows how the result differs for various values of K, from which the optimal K value can be identified and used for classification.
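The procedure just described maps directly onto a few lines of code. This sketch assumes Euclidean distance and numeric features; the paper does not state which distance metric was used.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority label among its k nearest training points."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]
```

In practice the features should be scaled first, since attributes with large ranges (such as white cell count) would otherwise dominate the distance.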
Fig. 5 Representation of optimal hyperplane in SVM
Fig. 6 Representation of K value with various ranges
3.2.4
Confusion Matrix
A confusion matrix is used to describe the prediction results of a classifier. The numbers of correct and incorrect predictions are given in matrix form, which helps to find the accuracy. Figure 7 shows the confusion matrix and its entries. All the above models have been evaluated with a confusion matrix to obtain their accuracy. The accuracy is defined as:
\[ \text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \]
where TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives.
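As a small worked example of the accuracy formula, the sketch below counts the four confusion-matrix cells from two label lists and applies the definition. The label lists and the choice of `"ckd"` as the positive class are made up for illustration and are not results from this work.

```python
def confusion_counts(y_true, y_pred, positive="ckd"):
    """Return (TP, FP, TN, FN) for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1   # predicted positive, actually positive
            else:
                fp += 1   # predicted positive, actually negative
        else:
            if truth == positive:
                fn += 1   # predicted negative, actually positive
            else:
                tn += 1   # predicted negative, actually negative
    return tp, fp, tn, fn

def accuracy(tp, fp, tn, fn):
    """Accuracy = (TP + TN) / (TP + FP + TN + FN)."""
    return (tp + tn) / (tp + fp + tn + fn)

# Hypothetical labels for eight patients.
y_true = ["ckd", "ckd", "ckd", "notckd", "notckd", "ckd", "notckd", "notckd"]
y_pred = ["ckd", "ckd", "notckd", "notckd", "ckd", "ckd", "notckd", "notckd"]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(tp, fp, tn, fn)             # 3 1 3 1
print(accuracy(tp, fp, tn, fn))   # 0.75
```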
The accuracy of the various supervised machine learning algorithms for the prediction of chronic kidney disease is shown in Fig. 8. Supervised machine learning algorithms such as logistic regression, Gaussian naive Bayes, kNN, support vector machine and random forest are used in the proposed work. Gaussian naive Bayes seems to perform well in terms of predicting chronic
Fig. 7 Representation of confusion matrix
Fig. 8 Comparative analysis of supervised algorithms for CKD (bar chart; x-axis: supervised algorithms, y-axis: accuracy percentage, scale 75 to 100)
kidney disease. Machine learning algorithms help to predict kidney disease at earlier stages, which may reduce the number of cases that progress to severe kidney damage or end-stage renal disease.
4 Conclusion and Future Discussion
The proposed work makes use of several supervised machine learning algorithms. It concludes that machine learning algorithms are well suited to classifying kidney disease when feature selection techniques are applied. Among the supervised algorithms compared, Gaussian naive Bayes is suggested because it provides high accuracy. Other supervised and ensemble algorithms can be explored further for kidney disease prediction and assessed in terms of recall and precision.
5 Limitations
The proposed work involves five supervised learning algorithms, whereas ensemble techniques might provide higher accuracy. Deep learning methods may also work efficiently, and this remains to be studied.
References
1. www.mayoclinic.org
2. www.idf.org
3. Gheith O, Farouk N, Nampoory N, Halim MA, Al-Otaibi T (2016) Diabetic kidney disease: worldwide difference of prevalence and risk factors. J Nephropharmacol 5(1):49
4. Hussain S, Jamali MC, Habib A, Hussain MS, Akhtar M, Najmi AK (2021) Diabetic kidney disease: an overview of prevalence, risk factors, and biomarkers. Clinical Epidemiology and Global Health 9:2–6
5. Zhou Z, Chaudhari P, Yang H, Fang AP, Zhao J, Law EH, Wu EQ, Jiang R, Seifeldin R (2017) Healthcare resource use, costs, and disease progression associated with diabetic nephropathy in adults with type 2 diabetes: a retrospective observational study. Diabetes Therapy 8(3):555–571
6. Aljaaf AJ (2018) Early prediction of chronic kidney disease using machine learning supported by predictive analytics. In: Proceedings of the IEEE Congress on Evolutionary Computation (CEC), Wellington, New Zealand
7. Stroescu AE, Tanasescu MD, Diaconescu A, Raducu L, Balan DG, Mihai A, Tanase M, Stanescu II, Ionescu D (2018) Diabetic nephropathy: a concise assessment of the causes, risk factors and implications in diabetic patients. Rev Chim 69(11):3118–3121
8. Chan L, Vaid A, Nadkarni GN (2020) Applications of machine learning methods in kidney disease: hope or hype? Curr Opin Nephrol Hypertens 29(3):319
9. Arjaria SK, Rathore AS, Cherian JS (2021) Kidney disease prediction using a machine learning approach: a comparative and comprehensive analysis. In: Demystifying big data, machine learning, and deep learning for healthcare analytics, pp 307–333
10. Jansi Rani M, Karuppasamy M (2022) Cloud computing-based parallel mutual information for gene selection and support vector machine classification for brain tumor microarray data. NeuroQuantology 20:6223–6233
11. Ventrella P, Delgrossi G, Ferrario G, Righetti M, Masseroli M (2021) Supervised machine learning for the assessment of chronic kidney disease advancement. Comput Methods Programs Biomed 209:106329
12. Ghosh P, Shamrat FJ, Shultana S, Afrin S, Anjum AA, Khan AA (2020) Optimization of prediction method of chronic kidney disease using machine learning algorithm. In: 2020 15th international joint symposium on artificial intelligence and natural language processing (iSAINLP). IEEE, pp 1–6
13. Rani MJ, Karuppasamy M, Prabha M (2021) Bacterial foraging optimization algorithm based feature selection for microarray data classification. In: Materials Today: Proceedings
14. Alam MZ, Rahman MS, Rahman MS (2019) A random forest based predictor for medical data classification using feature ranking. Inform Med Unlocked 15:100180
15. De Boer IH, Afkarian M, Rue TC, Cleary PA, Lachin JM, Molitch ME et al (2014) Renal outcomes in patients with type 1 diabetes and macroalbuminuria. J Am Soc Nephrol 25:2342–2350
16. US Renal Data System (2004) USRDS annual data report: atlas of end-stage renal disease in the United States. National Institutes of Health, National Institute of Diabetes and Digestive and Kidney Diseases, Bethesda, MD
17. Dataset taken from https://archive.ics.uci.edu/ml/datasets/chronic_kidney_disease
Prediction of PCOS and PCOD in Women Using ML Algorithms
M. J. Lakshmi, D. S. Spandana, Harini Raj, G. Niharika, Ashwini Kodipalli, Shoaib Kamal, and Trupthi Rao
Abstract Polycystic ovarian disorder (PCOD) is an endocrine disorder that results in hormonal imbalances, with notably heightened production of the androgen hormone. Polycystic ovary syndrome (PCOS) is an endocrine disorder mostly found in women after puberty. The dataset used in our research work consists of 541 patient instances with 45 attributes related to the disease. It is collected from the UCI ML repository. In this paper, we employ various ensemble learning algorithms such as Random Forest, Bagging classifier, AdaBoosting and Gradient Boosting. Gradient Boosting gave the highest performance in predicting PCOD in young women, 91.7%, with an F1 score of 92%.
1 Introduction
Polycystic ovarian disease (PCOD) is often confused with, or used interchangeably with, PCOS, but the two are slightly different. PCOD is a common endocrine disorder that results in hormonal imbalances, with notably heightened production of the androgen hormone. The hormonal imbalance causes excessive secretion of male hormones. Complications of PCOD include ovarian cysts and anovulation. PCOD generally affects women of reproductive age and is noted to be genetic in origin. Symptoms start early, but the condition is rarely detected at an early stage [1].
M. J. Lakshmi (B) · D. S. Spandana · H. Raj · G. Niharika · A. Kodipalli · S. Kamal · T. Rao
Department of Artificial Intelligence and Data Science, Global Academy of Technology, Bengaluru, India
e-mail: [email protected]
A. Kodipalli e-mail: [email protected]
S. Kamal e-mail: [email protected]
T. Rao e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 J. Choudrie et al. (eds.), ICT for Intelligent Systems, Smart Innovation, Systems and Technologies 361, https://doi.org/10.1007/978-981-99-3982-4_9
Polycystic ovary syndrome (PCOS) is a chronic disease that puts patients at risk of mental health problems such as restlessness, nervousness, depression, lower sexual satisfaction, and reduced quality of health. It is also a common endocrine disorder, mostly found in women after puberty, and is primarily noted to be genetic in origin. International evaluations indicate a 10–20% chance of being affected by PCOS, which is generally diagnosed during adolescence [2]. PCOS is diagnosed when a patient meets any two of the following three criteria: (1) clinical and/or biochemical signs of hyperandrogenism, (2) polycystic ovaries on ultrasound, (3) oligo- and/or anovulation. The clinical presentation of PCOS is varied; the main indications include irregular menses, obesity, inability to conceive, excessive hair growth on the face and chest and around the mouth, and acne. PCOS is also a lasting disease for which no cure may be found. It is accompanied by signs of impaired insulin sensitivity, which raises the risk of cardiovascular issues and related conditions such as dyslipidemia, atherosclerosis, and type II diabetes mellitus. Metabolic and physical changes in PCOS patients, such as androgen secretion and swelling or redness of body parts, directly result in psychological symptoms, namely impatience, lack of interest, sadness, and disturbed eating behavior. Concern about infertility adds to the mental pressure on PCOS patients, who fear a loss of motherhood and womanhood, and this results in prolonged anxiety and depression [3]. PCOS is a metabolic syndrome, whereas PCOD is not a severe disease and can be improved with proper exercise and diet [4]. Women diagnosed with PCOD have a lower chance of being infertile and can be cured by seeking medical advice, whereas a greater risk of miscarriage and infertility is observed in women with PCOS.
Ovaries contain about 12–15 cysts in PCOS women, whereas ovaries contain 5–8 cysts in PCOD women.
2 Literature Survey
2.1 Prediction and Symptoms of PCOS and PCOD
According to the report, menstrual abnormalities were acknowledged by two thirds of the world's female population. Women with irregular menstrual cycles are therefore more prone to PCOS. Studies of PCOS and PCOD patients clearly indicate that absent or irregular periods over a long time can lead to PCOS and PCOD. Women aged 20–30 years have a high risk of developing PCOS, while PCOD is more likely to be seen in women older than 30 years [5]. Women generally show some symptoms in the pre-PCOS stage, including weight gain, thinning hair, increased hair fall, and dull skin or acne [6]. Physical changes such as hair growth on the chest, back, buttocks, and face, referred to as hirsutism, also occur. Women aged around 35–45 years face difficulty in getting pregnant, and even if they conceive, the chance of miscarriage is higher. Through classic segregation
analysis, it was found that PCOD is due to a gene impairment and can be hereditary. Many clinicians have confirmed that women who have had PCOS for more than 2 years generally have mental health concerns such as anxiety, stress, overthinking, repetitive negative thinking, and mood swings. Severe depressive symptoms are seen in women dealing with PCOS. The impact of PCOS on women's mental health is severe and includes reduced social wellbeing, sexual dissatisfaction, mood disturbances, less efficient decision making, and reduced self-confidence. Women with PCOD have elevated levels of hyperandrogenism and irregular or absent ovulation. Therefore, at severe stages, PCOD causes infertility. According to dermatologists' reports, women with PCOS or PCOD have cosmetic problems such as acne, oily skin, acanthosis nigricans, and baldness. PCOD patients also suffer from obstructive sleep apnea, disturbed sleep, and altered sleeping patterns. These patients are stress eaters, and emotional cravings are more frequent in PCOS patients. Metabolic complications such as insulin variation, hypertension, and irregular body fitness are common in PCOD patients, and improper glucose regulation accounts for non-alcoholic fat accumulation in the liver [7]. Vitamin D deficiency in PCOD patients has serious repercussions, leading to obesity. Prolonged vitamin deficiency is accompanied by severe depressive symptoms [8].
2.2 Impact of Stress and Anxiety in Women Suffering from PCOS and PCOD
A survey of 732 women aged 18–40 years with PCOS and PCOD found that, on average, most of them suffer from anxiety, stress, and other mental health issues. Most adolescents with PCOS are more likely to have to deal with anxiety and depressive symptoms. Women facing infertility due to PCOS often report stress and low self-confidence. According to a meta-analysis, women with PCOS are five times as likely to develop anxiety symptoms and three times as likely to develop depression symptoms as women without PCOS [9]. A meta-analysis of nine studies inferred that on average 40% of women with PCOD faced anxiety and stress issues [10]. A study of depressed women confirmed that PCOD symptoms are more severe in them than in non-depressed women. It also suggested that somatic symptoms of PCOS such as cosmetic changes, obesity, and baldness cause stress in these women; heightened stress in turn leads to severe depressive symptoms, which burden the individual's mental health [11]. Reduced psychological functioning in women with PCOS accounts for depression [12]. According to the report, metabolic complications and increased weight add to the depressive symptoms, which later have a negative impact on the mental health of PCOS patients. PCOS also causes varied glycemic levels and lipid composition, which further worsens mental health. Menstrual disturbances and low
nutrient levels increase the chances of anxiety and restless behavior. Deteriorating health status and worsening metabolic activity in these women result in anxiety attacks and further contribute to severe depression [13]. Moreover, prolonged stress and anxiety can lead to coronary heart disease. Women with PCOS show high androgen levels during pregnancy, which increases the risk of mood disorders in the offspring. Several findings on women with PCOS suggest that heightened emotions during pregnancy account for depression. About one in 20 pregnant women goes through major depressive disorder [14]. Many reports indicate that depression in women leads to neuropsychiatric disorders, which are mostly seen in pregnant women with PCOS and PCOD [12].
2.3 Effects on Health in PCOS Patients
According to the report, PCOS patients suffer in the long term from eating disorders, diabetes, non-alcoholic fat deposition in the liver, hypertensive disorders during pregnancy, and complications in delivery, i.e., miscarriage or opting for caesarean surgery at the time of delivery. On average, 30% of women with PCOS and PCOD suffer from an eating disorder [15]. Women with PCOS and PCOD have emotional cravings that unbalance their diet, causing obesity and reducing nutrient levels, accompanied by eating disorders such as binge eating disorder, rumination disorder, bulimia nervosa, and anorexia nervosa [16]. Disordered eating and exercise patterns affect the body's hormone production, which in turn causes eating disorders [17]. Pregnant women with PCOS have no particular diet; they follow a healthy balanced diet with little refined sugar and saturated fat, but still gain weight and face eating disorders to some extent [17]. A study of melatonin levels in women with PCOS inferred that sleeping time and depressive symptoms depend on melatonin secretion, and that in the long run this leads to accumulation of fat and lipids in various glands of PCOS patients [18]. Many clinicians who carried out glucose tolerance tests inferred that PCOS patients show variation in glucose regulation [16], which accounts for non-alcoholic fat deposition in the liver [19]. Adolescents with PCOS are recorded as having type II diabetes, due to varied insulin levels and low metabolic activity. According to a WHO report in 1999, 25% of women with PCOS are diagnosed with gestational diabetes mellitus (GDM) during pregnancy; WHO later reported 40% of women with GDM in 2013. PCOS symptoms such as low glycaemic levels and changes in insulin regulation also account for diabetes mellitus, which over a prolonged period can lead to diabetes even in the offspring [15].
According to recent gynaecologists' reports, women with PCOS have hypertensive disorders and macrosomia, and increased BMI at conception is associated with severe pregnancy conditions. Women with PCOS rarely conceive, and when they do, conception is accompanied by risk factors such as miscarriage and hypertensive disorders. Studies of pregnant women with PCOS find high BMI. Women with PCOS show high
risk in undergoing normal delivery; in most cases their gynecologist opts for a cesarean section at the time of delivery [20]. Complications associated with cesarean section include a high chance of infant death and abnormality in the infant. After pregnancy, complications are seen in breast feeding, further associated with cardiovascular disorders and mood disorders [14]. Women with PCOS show delayed menopause, and the effects of menopause are more drastic than in women without PCOS. There is a high chance of developing cardiovascular disease in women with PCOD [13].
2.4 Treatment
A group of 50 women with PCOD aged 20–35 years was divided into two groups of 25 each and treated at a gynecological clinic. The first group formed the yoga intervention group; after 3 months of regular yoga therapy, their serum testosterone levels decreased by about 45.96% and their anxiety by about 17%. The second group formed the allopathy intervention group; after 3 months of allopathic treatment, their serum testosterone levels decreased by about 35.75% and their anxiety levels changed by about 9.24%. These improvements suggest that yoga and allopathy can be used as alternative treatment methods [21]. Studies show that PCOS is hard to detect in adolescents because the changes at puberty often overlap with PCOS symptoms. Even after PCOS is detected in an adolescent, it is hard to continue timely treatment, as the increase in insulin also raises the risk of type 2 diabetes. Other therapeutic options for PCOS include lifestyle intervention, oral contraceptive pills, and insulin sensitizers, but long-term follow-up is required to assess the effectiveness of these approaches [22]. Physicians generally suggest a balanced diet of nutritious food rich in whole grains, fresh produce, and plant-based proteins. Having three meals per day is important, as skipping meals can lead to overeating and may worsen PCOS. Foods to avoid include sugary drinks, fried food, processed food, trans fat, and refined carbohydrates [23]. Physicians have no definitive medication for PCOS, but in some cases birth control pills can reduce its symptoms. The birth control pills used are hormonal pills, meaning that they contain estrogen and progestin [15].
However, these pills have side effects such as difficulty in getting pregnant, bloating, swelling, and leg cramps. Some pills, such as Loestrin, have lower estrogen levels, which can reduce the severity of some side effects but may also be less effective on PCOS symptoms (Fig. 1).
Fig. 1 Acne severity (pie chart of the severity of acne in PCOS women; categories: none, mild, moderate, severe; slices: 4%, 15%, 56%, 25%)
2.5 Methods Used
Yin et al. [2] included 30,989 women, of whom 9,265 had PCOS and around 25,638 were at stages that could be controlled. The methodology used the Rotterdam criteria and meta-analysis, with subgroup analysis consisting of a sensitivity test, Egger's regression test (funnel plot), and meta-regression. These methods inferred that 70% of the women suffered from anxiety and 88% experienced low quality of life. Lee and Dokras [17] studied a group of 554 women with PCOS to find the number who suffered from depression and anxiety, using LSM, BMI, and the Ferriman-Gallwey score, and concluded that the 554 women faced gynaecologic challenges and psychological disturbance. Stener-Victorin et al. [12] focused on the psychological traits of women with PCOS during pregnancy using HOMA-IR, BMI, and meta-analysis, and concluded that pregnant women with PCOD have a higher risk of lifelong psychiatric disorders. Greenwood et al. [10] considered 732 women to examine the relationship between quality of life and mental health in women with PCOS using the Rotterdam criteria and an HRQOL survey, which found HRQOL disturbance in women with PCOS; 64 women met the criteria for depression. Douglas et al. [23] used methods such as FSIGT and CT scans to study a eucaloric diet and insulin levels in women, which indicated that a moderate reduction in dietary carbohydrate reduces insulin concentration in PCOD. Devi and Rani [21] included 50 women and used yoga and allopathy, which resulted in reduced anxiety, stress, and serum hormone levels. Carreau et al. [7] considered women aged 18–21 and used ROC analysis, BMI, and the Dixon technique to screen NAFLD risk in obese adolescents with PCOS, which helped in controlling NAFLD risk. Kerchner et al. [6] focused on 43,000 US women to predict the risk of depression in women with PCOS.
This was done using methods such as the PRIME-MD PHQ and the Rotterdam criteria, and the study concluded a significant risk of
mood disorder in women with polycystic ovary syndrome. Anitha et al. [3] concentrated on 50 cases of women with PCOS using the Rotterdam criteria and Beck's depression inventory, and found increased BMI and depression scores in the PCOS group. Torres-Zegarra et al. [8] reported dermatologic findings of 93% acne, 38% hirsutism, 85% acanthosis nigricans, and 16% hidradenitis suppurativa, using the Ferriman-Gallwey score and BMI. Malik Mubasher Hassan and Tabasum Mirza compared various ML algorithms for detecting PCOS, using classifiers such as SVM, logistic regression, random forest, CART, and naïve Bayes. They considered a dataset with seven parameters and obtained an accuracy of 96% in the diagnosis of PCOS with the random forest algorithm. Subrato Bharati et al. used algorithms such as gradient boosting, random forest, logistic regression, and RFLR for the detection of PCOS with 43 attributes, and obtained an accuracy of 91.01% using RFLR. Zhang et al. worked with several machine learning algorithms such as random forest, KNN, and extreme gradient boosting on follicular fluid and plasma from 100 women with PCOS, and obtained an accuracy of 89.32% with the gradient boosting algorithm. Amsy Denny et al. applied ML algorithms such as KNN, CART, SVM, and random forest for early detection of PCOS using 23 attributes, and obtained an accuracy of 89.02% with the random forest classifier (RFC). Shamik Tiwari et al. used the random forest algorithm to determine the prevalence of PCOS in young adolescents from a dataset of screening parameters, with a resulting accuracy of 93.25%. Subrato Bharati et al.
in their research work used a machine learning technique called GIST-MDR to determine the factors affecting PCOS in women from a dataset of 177 women, with a resulting accuracy of 91.12%. Gopalakrishnan and Iyapparaja used several machine learning algorithms, such as naïve Bayes, random forest, and SVM, to categorize PCOS patients based on ultrasound images and obtained an accuracy of 93.8%. Gupta et al. considered a dataset of women aged between 15 and 49 to differentiate methods for women with PCOS, using gradient boosting and adaptive boosting together with measures such as FPR, ROC curve, and AUC, which revealed hidden anomalies. Stener-Victorin et al. studied the impact of psychological traits in women with PCOS during pregnancy using HOMA-IR, BMI, and meta-analysis of GWAS, considering women aged 20 to 40 years. The study found that PCOS patients have a higher risk of lifelong psychiatric disorders. Cristiana Neto et al. worked on prediction indicators for women with PCOS using machine learning algorithms such as logistic regression, Gaussian naïve Bayes, and random forest; 94% accuracy was obtained with logistic regression and Gaussian naïve Bayes. Meena et al. used computational methods such as a fuzzy method, naïve Bayes, artificial neural network, decision tree, and SVM. They considered around four parameters for classification of the various stages of PCOS and proposed factors that reduced PCOS [24, 25]. Homay Danaei Meher and Huseyin Polat suggested various diagnosis methods for PCOS by considering seven parameters and applied machine
learning algorithms such as ensemble random forest (ERF), extra trees, and adaptive boosting, obtaining the highest accuracy of 98.89% with the ERF method. Shivani Agarwal and Kavitha Pandey [26, 27] listed various diseases associated with PCOS; they considered eight parameters and used machine learning algorithms to obtain maximum accuracy. Akanksha Tawar et al. considered 41 clinical attributes for predicting PCOS in women using a machine learning algorithm, the random forest method (Table 1) [28, 29].
3 Ensemble Learning Algorithms
This section gives a brief introduction to, and a comparative study of, ensemble learning algorithms, including ML classifiers.
3.1 Random Forest
Random forests (random decision forests) are an ensemble learning method that uses the “bagging” method to construct multiple decision trees during training. For classification problems, the RF output is the label predicted by the most decision trees. Figure 2 depicts the construction of a random forest from N decision trees.
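The voting step described above can be illustrated with a short sketch. The three "trees" here are hypothetical hand-written rules over generic features, used only to show the majority vote; a real random forest would grow each tree from a bootstrap sample of the training data.

```python
from collections import Counter

def forest_predict(trees, sample):
    """Return the label predicted by the most trees (majority vote)."""
    votes = [tree(sample) for tree in trees]
    return Counter(votes).most_common(1)[0][0]

# Three hypothetical stand-ins for trained trees; each inspects one feature.
trees = [
    lambda s: 1 if s[0] > 0.5 else 0,
    lambda s: 1 if s[1] > 0.5 else 0,
    lambda s: 1 if s[2] > 0.5 else 0,
]

print(forest_predict(trees, (0.9, 0.8, 0.1)))  # two of three trees vote 1
print(forest_predict(trees, (0.2, 0.7, 0.3)))  # two of three trees vote 0
```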
3.2 Bagging Classifier
This is an ensemble meta-estimator that fits base classifiers, each on a random subset of the original data, and then aggregates their individual predictions (by voting or averaging) to obtain the final prediction. It can be used to reduce the variance of a black-box estimator such as a decision tree, by introducing randomization into its construction procedure and then building an ensemble from it. Figure 3 depicts the construction of this classifier.
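A minimal sketch of bagging, under simplifying assumptions, is given below: each base classifier is a one-feature threshold rule (a decision stump) fitted on a bootstrap sample, and the final prediction is a majority vote. The toy data, the stump learner, and the parameter names `n_estimators` and `seed` are illustrative choices and not part of this work.

```python
import random
from collections import Counter

def bootstrap_sample(X, y, rng):
    """Draw len(X) samples with replacement from the training data."""
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    return [X[i] for i in idx], [y[i] for i in idx]

def fit_stump(X, y):
    """Base classifier: the single threshold with the fewest training errors."""
    best = None
    for thr in sorted(set(X)):
        for hi, lo in ((1, 0), (0, 1)):
            errors = sum((hi if x >= thr else lo) != t for x, t in zip(X, y))
            if best is None or errors < best[0]:
                best = (errors, thr, hi, lo)
    _, thr, hi, lo = best
    return lambda x: hi if x >= thr else lo

def bagging_fit(X, y, n_estimators=15, seed=0):
    """Fit one stump per bootstrap sample."""
    rng = random.Random(seed)
    return [fit_stump(*bootstrap_sample(X, y, rng)) for _ in range(n_estimators)]

def bagging_predict(models, x):
    """Aggregate the individual predictions by majority vote."""
    return Counter(m(x) for m in models).most_common(1)[0][0]

# Hypothetical one-feature data: small values are class 0, large values class 1.
X = [1.0, 1.5, 2.0, 2.5, 3.0, 6.0, 6.5, 7.0, 7.5, 8.0]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

models = bagging_fit(X, y)
print(bagging_predict(models, 2.0), bagging_predict(models, 7.0))
```

Because every stump sees a slightly different sample, their thresholds differ, and the vote averages out that variation.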
3.3 AdaBoosting
AdaBoosting (Adaptive Boosting), introduced in 1996, is an iterative ensemble boosting classifier. It combines several low-performing models to produce a strong classifier with good performance. The scheme is to adjust classifier weights and to perform training on
Table 1 Literature survey comparison
Author | Dataset used | Objective | Method | Result
Malik Mubasher Hassan and Tabasum Mirza [24] | 7 parameters | Prediction of PCOS in women | SVM, random forest, CART, naïve Bayes classification, logistic regression | 96% accuracy in diagnosis of PCOS through random forest algorithm
Subrato Bharati et al. [25] | 43 parameters | Diagnosis of PCOD using ML technique | Gradient boosting, random forest, logistic regression, RFLR | Obtained 91.01% accuracy using RFLR for detection of PCOS
Xinyi Zhang et al. [26] | 100 women | Spectroscopy of follicular fluid and plasma with ML algorithms for PCOS | Random forest, k-nearest neighbour, extreme gradient boosting | 89.32% is the highest accuracy, using extreme gradient boosting algorithm
Amsy Denny et al. [27] | 23 attributes | Detection of PCOS in early stages | KNN, CART, SVM, random forest | Accuracy obtained using RFC method is 89.02%
Shamik Tiwari et al. [28] | Screening parameters | Prevalence of PCOS in young adults | Random forest | Using RF method accuracy was 93.25%
Yin et al. [2] | 30,989 women | To give systematic review of PCOD women about mental health | Rotterdam, meta-analysis | 70% are suffering from anxiety, 88% are experiencing low quality of life
Lee and Dokras [30] | 554 PCOS women | Screening depressive and anxiety symptoms in PCOS patients | LSM, Ferriman-Gallwey score, BMI | The study inferred 554 PCO women faced gynaecologic challenges and psychological disturbance
Anne-Marie Carreau et al. [7] | Women of age 18–21 | Screening NAFLD risk in obese adolescents with PCOS | ROC analysis, BMI, Dixon technique | Screening tools help control NAFLD risk in PCOS patients
Kerchner et al. [7] | 43,000 US women | To predict the risk for depression in PCOS women | PRIME-MD PHQ, Rotterdam | Significant risk of mood dysfunction in women with PO syndrome
Subrato Bharati et al. [25] | 177 women | Factors impacting PCOS in women | GIST-MDR technique | 91.12% accuracy using GIST-MDR method
Anitha et al. [3] | 50 cases | Prevalence of depression among PCOS women | Rotterdam, Beck’s depression inventory | BMI and depression scores were significantly increased in PCOS group
Gopalakrishnan and Iyapparaja [30] | 6 parameters | Classification of PCOS patients based on ultrasound images | Naïve Bayes algorithm, SVM, random forest | 93.82% accuracy was obtained by machine learning algorithms
Homay Danaei Meher and Huseyin Polat [38] | 7 parameters | Various diagnosis methods of PCOS were suggested | Ensemble Random Forest, Extra tree, Adaptive Boosting | Using Ensemble Random Forest algorithm 98.89% accuracy is obtained
Torres-Zegarra et al. [8] | 92 patients | Multidisciplinary approach is used to provide comprehensive care to adolescent girls with PCOS | Ferriman-Gallwey score, BMI | Dermatologic findings: 93% acne, 38% hirsutism, 85% acanthosis nigricans, 16% hidradenitis suppurativa
Gupta et al. [7] | Women aged 15–49 | Differentiating methods proposed for PCOS patients | FPR, ROC curve, AUC, gradient boosting, adaptive boosting | Hidden anomalies were presented using machine learning algorithms
Stener-Victorin et al. [12] | 20–40 year old women | Impact of psychological traits in PCOS women during pregnancy | HOMA-IR, BMI, meta-analysis of GWAS | Study inferred that PCOS women have increased risk of lifelong psychiatric disorders
Cristiana Neto et al. [25] | Healthcare dataset | Prediction indicators for PCOS | Logistic regression, Gaussian naïve Bayes, random forest | 94% accuracy obtained by logistic regression and Gaussian naïve Bayes
Meena et al. [37] | 4 parameters | Classification of various stages in PCOS | Fuzzy method, naïve Bayes, artificial neural network, decision tree, SVM | Proposed factors which reduced PCOS
Greenwood et al. [10] | 732 women | Examine the relationship between quality of life and mental health in PCOS women | Rotterdam, HRQOL survey | HRQOL disturbance seen in PCOS women, 64 women met criteria for depression
Douglas et al. [23] | Age group of 19–42 years | To determine eucaloric diet and insulin levels in PCOD women | FSIGT, CT scan | Moderate reduction in dietary carbohydrate reduces insulin concentration in PCOD women
Devi and Rani [21] | 50 women | Treating PCOS and PCOD using allopathy and yoga therapy | Yoga therapy, allopathy | Reduced anxiety, stress and serum hormonal levels
Shivani Agarwal and Kavitha Pandey [39] | 8 parameters | Detection of some associated diseases in PCOS patients | Machine learning algorithm | Enlisted associated diseases with PCOS patients
Akanksha Tawar et al. [40] | 41 clinical attributes | Prediction of PCOS | Random forest | Early prediction of PCOS is possible using machine learning algorithms
data points in each iteration such that precise predictions of unusual cases can be made. Here any ML classifier can be considered as base classifier. The Adaboosting classifier works this way:
• Choosing a training subset at random.
• Iteratively selecting the training data points for the above ML algorithm.
• Giving more weight to cases that were incorrectly classified.
• Adding weight to the trained classifier in each iteration.
• Iterating until all of the training data fits perfectly.
• For classification, take vote from all algorithms.
Figure 4 contains the working procedure of the above classifier.
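The weighting steps listed above can be sketched as follows. This is an illustrative version of discrete AdaBoost with one-feature decision stumps as base classifiers and labels in {+1, -1}. It reweights the full training set rather than drawing random subsets, and it stops after a fixed number of rounds; the toy data and the `rounds` parameter are assumptions made for the example.

```python
import math

def fit_weighted_stump(X, y, w):
    """Pick the threshold and direction with the lowest weighted error."""
    best = None
    for thr in sorted(set(X)):
        for sign in (1, -1):
            err = sum(
                wi for wi, x, t in zip(w, X, y)
                if (sign if x >= thr else -sign) != t
            )
            if best is None or err < best[0]:
                best = (err, thr, sign)
    return best

def adaboost_fit(X, y, rounds=3):
    n = len(X)
    w = [1.0 / n] * n                     # start with equal sample weights
    ensemble = []
    for _ in range(rounds):
        err, thr, sign = fit_weighted_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)      # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)    # weight of this classifier
        ensemble.append((alpha, thr, sign))
        # Increase the weight of misclassified samples, decrease the rest.
        w = [
            wi * math.exp(-alpha * t * (sign if x >= thr else -sign))
            for wi, x, t in zip(w, X, y)
        ]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def adaboost_predict(ensemble, x):
    """Weighted vote of all stumps."""
    score = sum(a * (s if x >= thr else -s) for a, thr, s in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical data that no single threshold can separate.
X = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

ensemble = adaboost_fit(X, y, rounds=3)
print([adaboost_predict(ensemble, x) for x in X])
```

On this toy set, each single stump misclassifies at least three points, while the weighted vote of three stumps reproduces every training label.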
Fig. 2 Construction of Random Forest
Fig. 3 Bagging classifier
M. J. Lakshmi et al.
Fig. 4 AdaBoost classifier
3.4 Gradient Boosting
Gradient boosting is among the most powerful models in machine learning. Model error has two main components: bias error and variance error. Gradient boosting is one of the algorithms used to minimize the bias error of a model. Figure 5 depicts the operation of the Gradient Boosting classifier.
Fig. 5 Gradient Boosting classifier
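The sequential, additive nature of gradient boosting can be sketched as follows. This is an illustrative simplification and not the classifier evaluated in this paper: it assumes a regression setting with squared-error loss, one-split stumps as weak learners, and at least one feature with two distinct values. All function names are hypothetical.

```python
def fit_stump(X, residuals):
    """Regression stump: the single split that minimises squared error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X})[:-1]:   # keep both sides non-empty
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lm, rm)
    return best[1:]          # (feature, threshold, left_value, right_value)

def stump_value(stump, row):
    j, thr, left_value, right_value = stump
    return left_value if row[j] <= thr else right_value

def gb_fit(X, y, n_rounds=50, learning_rate=0.1):
    base = sum(y) / len(y)                 # initial prediction: the mean target
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # For squared loss, the negative gradient is the residual.
        residuals = [t - p for t, p in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        # Add a shrunken copy of the new learner to the running prediction.
        preds = [p + learning_rate * stump_value(stump, row)
                 for p, row in zip(preds, X)]
    return base, learning_rate, stumps

def gb_predict(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_value(s, row) for s in stumps)

def mse(model, X, y):
    return sum((gb_predict(model, r) - t) ** 2 for r, t in zip(X, y)) / len(y)

X = [[1], [2], [3], [4], [5], [6]]
y = [1.0, 1.0, 3.0, 3.0, 5.0, 5.0]
for rounds in (0, 1, 50):
    print(rounds, mse(gb_fit(X, y, n_rounds=rounds), X, y))
```

Each round fits a new weak learner to what the current ensemble still gets wrong, which is how the method reduces bias; a learning rate below 1.0 shrinks each step and is one common guard against overfitting.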
4 Methodology
The following steps are employed in this paper:
Step 1 Data collection: The dataset, referred to as the PCOS data without infertility dataset, has 541 patient instances with 45 attributes related to the disease and is collected from the UCI ML repository.
Step 2 Data pre-processing: This step transforms raw data into a usable form. Here, missing values and outliers are treated.
Step 2.1 Data imputation: Data imputation was required because the dataset had a few missing/null values. Mean imputation was employed to handle the null values; methods such as median imputation could also be used.
Step 2.2 Data normalization: The features needed to be brought to a common scale. To achieve this, a standard scaler was used.
Step 2.3 Correlation analysis: Here the relationship between two features is measured. If two features are strongly positively correlated, the value is near +1; if they are strongly negatively correlated, the value is near –1. If there is no linear relationship, the value is near 0.
Step 3 Feature selection and model building: Overfitting is a common problem that occurs when too many features are used for model building; hence, feature selection becomes important. Once the features are selected using the feature importance method, the data is divided into training and test sets in some ratio.
Step 4 Model building: The model is fit on the training data and predictions are made on the test set.
Step 5 Hyperparameter tuning: This was carried out using three methods: Bayesian optimization, Randomized Search CV, and Grid Search CV.
Step 6 Model evaluation: Model assessment metrics such as accuracy, precision, recall, and F1 score are used to evaluate the performance of the various classifiers. A classification report is also generated.
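The pre-processing in Steps 2.1 to 2.3 can be sketched in a few lines. This is a minimal illustration on made-up numbers, not the paper's code or data: it assumes missing values are represented as `None`, that each column has non-zero variance, and that correlation means the Pearson coefficient. The function names are hypothetical.

```python
import math

def mean_impute(column):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]

def standardize(column):
    """Standard scaling: subtract the mean, divide by the population std."""
    mean = sum(column) / len(column)
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
    return [(v - mean) / std for v in column]   # assumes std > 0

def pearson(a, b):
    """Pearson correlation: +1 strong positive, -1 strong negative, 0 none."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Made-up feature columns, for illustration only.
feature_a = mean_impute([2.0, None, 6.0])      # the gap is filled with 4.0
feature_b = [10.0, 8.0, 6.0]
print(standardize(feature_a))
print(pearson(feature_a, feature_b))           # falls as feature_a rises
```

Standardizing both columns leaves the Pearson coefficient unchanged, so the correlation analysis can be run before or after scaling.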
5 Results
Ensemble classification algorithms are used to predict the presence of PCOD in young adult women. The results are further improved using hyperparameter tuning techniques such as Grid searchCV and Randomized searchCV. Table 2 shows that the Random Forest classifier with 10 estimators achieved an accuracy of 75.32% and an F1 score of 75%. Before applying Grid searchCV and Randomized searchCV, manual hyperparameter tuning was performed with 300 estimators, entropy as the criterion, sqrt as max features, and 10 as the minimum samples per leaf; this yielded an accuracy of 82.46% and an F1 score of 82%.
Table 3 shows that the Random Forest classifier optimized using Randomized searchCV achieved an accuracy of 84.41% and an F1 score of 84%. The best parameters found by Randomized searchCV were: gini as the criterion, 1,800 trees (estimators), 1 as the minimum samples per leaf, and auto as max features. Starting from these best parameters, Grid searchCV was applied and obtained an accuracy of 81.81% and an F1 score of 82%. The Random Forest parameters were also tuned using Bayesian optimization, which gave an accuracy of 79.87% and an F1 score of 80%.
Figure 6 shows the effect of the hyperparameter tuning techniques on the Random Forest classifier: accuracy rose from 75.32% to 82.46% with manual tuning, 84.41% with Randomized searchCV, 81.81% with Grid searchCV, and 79.87% with Bayesian optimization.
Table 4 shows that Gradient boosting outperformed the other ensemble methods, with an accuracy of 89.76% and an F1 score of 89%.

Table 2 Prediction of various computational models
Sl. No | Ensemble models | Accuracy (%) | F1 score (%)
1 | Random Forest classifier | 75.32 | 75
2 | Manual hyper parameter tuning | 82.46 | 82
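Accuracy and F1 score, the two metrics reported in these tables, can be computed from true and predicted labels as in the sketch below. It assumes a binary problem with the positive class coded as 1; the labels shown are made up for illustration and are not the paper's predictions. The function name is hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Made-up labels: 3 true positives, 3 true negatives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(classification_metrics(y_true, y_pred))
```

Reporting F1 alongside accuracy matters when the classes are imbalanced, because accuracy alone can stay high while the positive class is poorly detected.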
Table 3 Prediction of PCOS using Grid searchCV and the Randomized searchCV
Sl. No | Ensemble models | Accuracy (%) | F1 score (%)
1 | Randomized searchCV | 84.41 | 84
2 | Grid searchCV | 81.81 | 82
3 | Bayesian optimization | 79.87 | 80
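The difference between the grid and randomized search strategies can be sketched without any ML library. This is only an illustration of the two search loops: `space` is a hypothetical parameter grid, and `toy_score` is a made-up stand-in for a cross-validated accuracy, not a result from this paper.

```python
import itertools
import random

def grid_search(space, score):
    """Score every combination in the grid and return the best one."""
    keys = list(space)
    candidates = [dict(zip(keys, combo))
                  for combo in itertools.product(*(space[k] for k in keys))]
    best = max(candidates, key=score)
    return best, score(best)

def randomized_search(space, score, n_iter=5, seed=0):
    """Score only n_iter randomly drawn combinations."""
    rng = random.Random(seed)
    candidates = [{k: rng.choice(values) for k, values in space.items()}
                  for _ in range(n_iter)]
    best = max(candidates, key=score)
    return best, score(best)

# Hypothetical search space for a random forest.
space = {
    "n_estimators": [10, 100, 300, 1000, 1800],
    "min_samples_leaf": [1, 2, 5, 10],
    "criterion": ["gini", "entropy"],
}

def toy_score(params):
    # Made-up objective; in practice this would train and validate a model.
    bonus = 0.01 if params["criterion"] == "entropy" else 0.0
    return (bonus
            - abs(params["n_estimators"] - 300) / 1000
            - abs(params["min_samples_leaf"] - 2) / 10)

print(grid_search(space, toy_score))
print(randomized_search(space, toy_score, n_iter=5))
```

Grid search evaluates all 40 combinations here, while randomized search evaluates only five, which is why randomized search is often run first on a wide space and grid search afterwards on a narrow region around its best result.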
Fig. 6 Random forest classifier: performance of the Random Forest classifier with hyper parameter tuning techniques (vertical axis: accuracy and F1 score, 70.00% to 90.00%; series: Accuracy, F1 Score)
Table 4 Results of prediction using Boosting
Sl. No | Ensemble classifier | Accuracy (%) | F1 score (%)
1 | AdaBoost | 87.87 | 88
2 | Gradient boosting | 89.76 | 89
3 | XGBoost | 83.1 | 83
4 | Gradient boosting with L2 regularization with learning rate < 1.0 | 91.7 | 92
Among all the ensemble methods, gradient boosting performed best because its decision trees are built sequentially in an additive form and overfitting is avoided with proper L2 regularization techniques; it obtained an accuracy of 91.7% and an F1 score of 92% with the learning rate