IoT, Cloud and Data Science Selected peer-reviewed full text papers from the International Research Conference on IoT, Cloud and Data Science (IRCICD'22)
Edited by Dr. S. Prasanna Devi Dr. G. Paavai Anand Dr. M. Durgadevi Dr. Golda Dilip Dr. S. Kannadhasan
Selected peer-reviewed full text papers from the International Research Conference on IoT, Cloud and Data Science (IRCICD'22), May 06-07, 2022, Chennai, India
Edited by
Dr. S. Prasanna Devi, Dr. G. Paavai Anand, Dr. M. Durgadevi, Dr. Golda Dilip and Dr. S. Kannadhasan
Copyright © 2023 Trans Tech Publications Ltd, Switzerland
All rights reserved. No part of the contents of this publication may be reproduced or transmitted in any form or by any means without the written permission of the publisher.
Trans Tech Publications Ltd, Seestrasse 24c, CH-8806 Baech, Switzerland
https://www.scientific.net
Volume 124 of Advances in Science and Technology ISSN print 1662-8969 ISSN web 1662-0356
Full text available online at https://www.scientific.net
Distributed worldwide by Trans Tech Publications Ltd, Seestrasse 24c, CH-8806 Baech, Switzerland
Phone: +41 (44) 922 10 22, e-mail: [email protected]
Preface

One of the most significant characteristics of the evolving digital age is the convergence of technologies, including sensors (Internet of Things: IoT), data storage (cloud), information management (databases), data collection (big data), data applications (analytics), knowledge discovery (data science), algorithms (machine learning), transparency (open data) and API services (microservices, containerization). The International Research Conference on IoT, Cloud and Data Science (IRCICD'22) aimed to bring together researchers from industry, academicians, business delegates, scholars and graduate students to exchange and share their experiences, new ideas and research results across aspects of IoT, cloud and data science. The conference focused on promoting recent advances and innovation in these fields.

First and foremost, we would like to express our gratitude to the authors, whose excellent work is the core of these proceedings; we congratulate all those involved and wish them great success. We would also like to thank our family and friends for their support and encouragement while we were working on this edition. We extend our appreciation and respect to our almighty Lord for his bountiful grace, which enabled us to finish this edition successfully. We also thank the Scientific.Net publisher and their team for facilitating the publishing process and providing the opportunity to be part of this work.
Table of Contents

Preface
Chapter 1: Machine Learning to Image Processing and Computer Vision
Hand Gesture Recognition Used for Functioning System Using OpenCV (S. Patel and R. Deepa)
Sign Language Detection Application Using CNN (G.M. Rao, B.A. Reddy and A. Jayashankar)
SIGN BOT Extending an Ability to Communicate by Creating an Indian Sign Language (S.S. Kumar, K.V. Ajay, N.S. Arun, B. Devasarathy and B. Hariharan)
Replicare: Real-Time Human Arms Movement Replication by a Humanoid Torso (V. Jagannath, S. Kumar and P. Visalakshi)
Brain Tumor Detection Using Deep Learning (S.J.A. Jairam, D. Lokeshwar, B. Divya and P.M. Fathimal)
Music Recommendation System Using Facial Emotions (P. Sharath, G. Senthil Kumar and B.K.S. Vishnu)
Face Mask Detection Using OpenCV (A. Najeeb, A. Sachan, A. Tomer and A. Prakash)
Crack Detection of Pharmaceutical Vials Using Agglomerative Clustering Technique (C.R. Vishwanatha and V. Asha)
Obstacle Detection and Text Recognition for Assisting Visually Impaired People (M. Aishwarya, S. Shivani, K. Deepthi and G. Dilip)
Recognition of American Sign Language with Study of Facial Expression for Emotion Analysis (A. Chakraborty, R.S. Sri Dharshini, K. Shruthi and R. Logeshwari)
Recognition of Plant Leaf Diseases Using CNN (V. Gurumurthi and S. Manohar)
Face Mask and Social Distancing Detection Using Faster Regional Convolutional Neural Network (B.M. Tushar, M.S. Rahman, A.S. Jaffer and M. Indumathy)
Sign Language Detection Using Action Recognition (N. Dutta and M. Indumathy)
Brain Tumor Segmentation Using Modified Double U-Net Architecture (T. Shaji, K. Ravi, E. Vignesh and A. Sinduja)
Surveillance Image Super Resolution Using SR-Generative Adversarial Network (N.V. Narayanan, T. Arjun and R. Logeshwari)
Classification of Covid-19 X-Ray Images Using Fuzzy Gabor Filter and DCNN (S. Sandhiyaa, J. Shabana, K. Ravi Shankar and C. Jothikumar)
Using Transfer Learning for Automatic Detection of Covid-19 from Chest X-Ray Images (H.M. Shyni and E. Chitra)
Detect Anomalies on Metal Surface (K. Arnav and H. Ravindran)
Indian License Plate and Vehicle Type Recognition (D.V.S. Sambhavi, S. Koushik and R. Fathima)
Proposed Music Mapping Algorithm Based on Human Emotions (H.K. Burnwal, M. Mishra and K. Annapurani)
Detection of Bike Riders with no Helmet (G. Sai Aditya, B. Swetha, K. Meenakshi and C. Hariharan)
Classification of Retinal Images Using Self-Created Penta-Convolutional Neural Network (R.S. Narain, R. Siddhant, V.S. Barath, P.S. Anubhaps and N. Muthurasu)
Weather Image Classification Using Convolution Neural Network (S. Nambiar, P. Arjun, D.R. Venkateswar and M. Rajavel)
Facial Landmark and Mask Detection System (N. Bhattacharjee, S. Dubey and S. Siddhartha)
Candidate Prioritization Using Automated Behavioural Interview with Deep Learning (H.V. Panjwani and A.V.Y. Phamila)
Book Recommendation Based on Emotion in the Online Library Using CNN (R. Srinath, S.S. Ravindran Gunnapudi and L.N. Kokkalla)
Armament Detection Using Deep Learning (R. George Jose and R. Rajmohan)
Tomato Leaf Disease Detection Using Deep Convolution Neural Network (M. Arafath, A.A. Nithya and S. Gijwani)
Self-Driving Car Using Supervised Learning (R. Collins, H. Kumar Maurya and R.S.R. Ragul)
Deep Learning Approach to New Age Cinematic Video Editing (D. Dutta, S. Jawade and S.M. Hussain)
YouTube Music Recommendation System Based on Face Expression (K.Y. Rathod and T. Pattanshetti)
Driver Yawn Prediction Using Convolutional Neural Network (J.S.R. Melvin, B. Rokesh, S. Dheepajyothieshwar and K. Akila)
Diagnosis of Alzheimer's Disease Using CNN on MRI Data (P. Agarwal, V. Jagawat, B. Jathiswar and M. Poonkodi)
Predicting and Classifying Diabetic Retinopathy (DR) Using 5-Class Label Based on Pre-Trained Deep Learning Models (S. Karthika and M. Durgadevi)
Early Wheat Leaf Disease Detection Using CNN (R. Vedika, M.M. Lakshmi, R. Sakthia and K. Meenakshi)
Chapter 2: Computational Linguistics
Sentiment Analysis on Food-Reviews Dataset (R. Deepa, F. Khan, H. Singh and G.J.S.S. Gupta)
An Extractive Question Answering System for the Tamil Language (A. Krishnan, S.R. Sriram, B.V.R. Ganesan and S. Sridhar)
Deep Learning-Based Speech Emotion Recognition (A. Sinha and G. Suseela)
Social Media User Oppression Detection Technique Using Supervised and Unsupervised Machine Learning Algorithms (P.T. Shreya and P. Durgadevi)
Medical Diagnosis through Chatbots (P. Iyer, A. Sarkar, K. Prakash and P.M. Fathimal)
Sentiment Analysis of National Eligibility-Cum Entrance Test on Twitter Data Using Machine Learning Techniques (E. Chandralekha, V.M. Jemin, P. Rama and K. Prabakaran)
Two-Step Text Recognition and Summarization of Scanned Documents (V. Varun and S. Muthukumar)
Detecting Fake Job Posting Using ML Classifications and Ensemble Model (A.K. Praveen, R. Harsita, R.D. Murali and S. Niveditha)
Artificial Intelligence Based Chatbot for Healthcare Applications (K.A. Nimal, V.V. Nair, R. Jegdeep and J.A. Nehru)
An Applied Computational Linguistics Approach to Clustering Scientific Research Papers (A. Vora, M. Mishra and S. Muthukumar)
Speech Accent Recognition (C. Shantoash, M. Vishal, S. Shruthi and G.N. Bharathi)
Movie Recommendation System Using Machine Learning (P. Arokiaraj, D.K. Sandeep, J. Vishnu and N. Muthurasu)
Chapter 3: Machine Learning to Financial Data Analysis
Data Analysis and Price Prediction of Stock Market Using Machine Learning Regression Algorithms (A.L. Gavin, P.K.V. Prasanna, C.V. Vedha and A. Sinduja)
Financial Time Series Analysis and Forecasting with Statistical Inference and Machine Learning (S. Vishnu and M. Uma)
Stock Market Portfolio Prediction Using Machine Learning (S. Venkatesan, S. Sanjay Kumar and G.S. Kutty)
Stock Market Ontology-Based Knowledge Management for Forecasting Stock Trading (M.U. Devi, P. Akilandeswari and M. Eliazer)
Chapter 4: Machine Learning on Other Types of Datasets
Train Track Crack Classification Using CNN (M. Kumar and P. Visalakshi)
Crime Prediction Using Machine Learning Algorithms (M.S.S. Ganesh, M.B.R.P. Sujith, K.V. Aravindh and P. Durgadevi)
Prediction of Eligibility for Covid-19 Vaccine Using SMLT Technique (P. Bisht, V. Bora, S. Poornima and M. Pushpalatha)
A Novel Analysis of Employee Attrition Rate by Maneuvering Machine Learning (V. Lalitha, R.S. Prithiv and P. Lokesh)
Building a Recommender System Using Collaborative Filtering Algorithms and Analyzing its Performance (A. Jeejoe, V. Harishiv, P. Venkatesh and S.K.B. Sangeetha)
Development of LSTM Model for Fall Prediction Using IMU (V. Chandramouleesvar, M.E. Swetha and P. Visalakshi)
CNN-Based Covid-19 Severity Detection and its Diagnosis (M.A.A. Khan, G. Senthil Kumar and R.R. Varughese)
Classification of Imbalanced Datasets Using Various Techniques along with Variants of SMOTE Oversampling and ANN (M. Shrinidhi, T.K. Kaushik Jegannathan and R. Jeya)
Used Car Price Prediction Using Machine Learning (A.S.J. Alexstan, K.M. Monesh, M. Poonkodi and V. Raj)
Pest Classification with Deep Learning and ReactJS (V.D. Jaya and N. Poornima)
A Prediction Model - Comparative Analysis and Effective Visualization for COVID-19 Dataset (S. Vishal, M. Uma and S.M. Florence)
Music Recommendation System Using Machine Learning (V. Rajput, H. Rajput and P. Padmanabhan)
Crime Analysis and Prediction Based on Machine Learning Algorithm (A. Dhakksinesh, O.R. Katherine and V.S. Pooja)
Electric Vehicle's Charging Stations Allocation System for Metropolitan Cities (N. Das, S. Tiwari and T.Y.J.N. Malleswari)
Farm Track Application Development Using Web Mining and Web Scraping (K. Karthigeiyan and M.D. Devi)
Quality Check of Water for Human Consumption Using Machine Learning (J.M. Nitharshni, R. Nilasruthy, K.R. Shakthi Akshaiya and M. Rajavel)
Clinical Outcome Future Prediction with Decision Tree and Naive Bayes Models (S. Veena, D. Sumanth Reddy, C. Lakshmi Kara and K.A. Uday Kiran)
Software Bugs Detection Using Supervised Machine Learning Techniques (M. Shah, A. Sharma and R. Kumar)
Rock, Paper and Scissor Using AI - Random Forest Algorithm (A. Murukesh and R. Logeshwari)
A Novel K Means Biclustering Fusion Based Collaborative Recommender System (S. Saravanan, A. Britto and S.M. Prabin)
Chapter 5: Blockchain Technology
Decentralized E-Voting System Using Ethereum Blockchain Technology (S.A. Ahad, S. Sangra, J. Saini and R. Deepa)
Preserving Data Integrity in Legal Documents Using On-Chain NFTs (A. Karnatak and G. Suseela)
A Step towards Blockchain Scalability Resolution (A. Goel and G. Suseela)
Secure Online Voting System Using Blockchain (M. Thirumaran, S. Tiroumalmouroughane and M. Sathish Kanna)
Converging Blockchain for Organ Donation (M.S. Kavitha, D. Priyadharshini, K. Priyanka, K.P. Rathina and M. Rithika)
Voter Verification in an Election Using Merkle Tree (B. John, A.K. Singh and C. Sabarinathan)
Chapter 6: Cloud Computing
Optimizing Information Leakage in Multi Cloud Storage Services (R. Baalagi, H. Sindhura, M. Keerthi and G. Dilip)
Security in Cloud Technologies: A Brief Overview (S.V. Rayapati, S. Muttavarapu, N. Nagasuri and S. Singhal)
Shopper's Cart (D. Thiagarajan, R.T.S. Arsath, A.R. Arun and E. Dinesh)
Chapter 7: WEB and Network Security
Enhanced Malware Detection Using Deep Learning with Image Processing Techniques (D.A. Benny King, P. Prabhath and P. Durgadevi)
Phishing Website Detection Using Natural Language Processing and Deep Learning Algorithm (M. Thirumaran, R.P. Karthikeyan and V. Rathaamani)
Secret Sharing Scheme with Fingerprint Authentication (A.P. Sarangaraja, D. Rajagopalan, A. Vignesh and P.M. Fathimal)
Network Intrusion Detection System Based Security System for Cloud Services Using Novel Recurrent Neural Network - Autoencoder (NRNN-AE) and Genetic (S. Priya and R.S. Ponmagal)
Comprehensive Survey on Detecting Security Attacks of IoT Intrusion Detection Systems (M. Ramesh Kumar and P. Sudhakaran)
A Cumulative Tool for Docker Vulnerability Scanner (C.J. Ram, V. Pavithran and V. Nair)
A Flexible and Investigation Approach for Encrypted Features Space Using Neural Network (T. Archana, J.F. Banu, S. Prasad and P.R. Shrivastava)
Regularized CNN Model for Image Forgery Detection (A. Kumar, N. Tiwari and M. Chawla)
Implementation of Intrusion Detection Model for Detecting Cyberattacks Using Support Vector Machine (S. Ashwini, M. Sinha and C. Sabarinathan)
Sensitive Data Transaction Using RDS in AWS (S. Manohar, M. Vignesh and G.M. Prabhu)
Building a New IPv8 Bootstrapper and Network Discovery Strategy for Trusted Chain Identities (A. Shagun, T.V. Hemalikaa, R. Nusair Basha and C.A.S. Deivapreetha)
Chapter 8: Internet of Things and Networking
An Intensive Investigation of Vehicular Adhoc Network Simulators (S. Bharathi and P. Durgadevi)
Fog Offloading and Scheduling in Traffic Monitoring System by Deep Reinforcement (R. Archana and K.K. Pradeep Mohan)
IoT Based Smart Air Purifier (Y.K. Pandey, P.K. Patel, M.S. Yadav, S. Singh and P. Yadav)
Placing Controllers Using Latency Metrics in a Smart Grid Implementing Software-Defined Networking Architecture (R. Jeya, G.R. Venkatakrishnan and V. Nagarajan)
An Intelligent Framework for Energy Efficient Health Monitoring System Using Edge-Body Area Networks (R. Abirami and E. Poovammal)
Transformer: Health Monitoring System Based on IoT (N. Mishra, A. Yadav, A. Yadav, P. Yadav and P. Yadav)
Real-Time Car Sharing and Car Rental Portal (A.K. Singh, V. Naren, R.S. Thillak and G.P. Anand)
A Study of Mobile Edge Computing for IoT (P. Rahul and A.J. Singh)
Human Lives Matter: Intelligent Transport System for Protecting Human Lives (M.S.R. Satya and G.N. Bharathi)
Marking and Traceback Algorithm for IP Traceback with Bitwise Operator (S.S. Kumar, M. Prithiv, S.R. Kumar, K.S. Kumar and K. Vishnu)
Introspection of Availability in Service Based Smart Systems Using Internet of Things (IoT) (H. Ramalingam and V.P. Venkatesan)
Modelling of Boost Converter for PV Cell Using Matrix Topology (P. Upadhyay, R. Yadav, S. Gupta, D. Yadav and S.K. Pal)
CHAPTER 1: Machine Learning to Image Processing and Computer Vision
Advances in Science and Technology ISSN: 1662-0356, Vol. 124, pp 3-10 doi:10.4028/p-4589o3 © 2023 Trans Tech Publications Ltd, Switzerland
Submitted: 2022-09-22 Accepted: 2022-09-23 Online: 2023-02-27
Hand Gesture Recognition Used for Functioning System Using OpenCV
Shubh Patel^(1,a) and R. Deepa^(2,b)
^(1,2) Department of Computer Science and Engineering, SRM Institute of Science and Technology, Chennai, Tamil Nadu, India
^(a) [email protected], ^(b) [email protected]
Keywords: Hand gesture recognition, hand skeleton recognition, virtual calculator, virtual keyboard, virtual volume control
Abstract: Recently, much attention has been paid to the design of intelligent and natural user-computer interfaces. Hand gesture recognition systems have been developed continuously because of their ability to let people interact with machines. Nowadays, the rise of the metaverse ecosystem has increased the number of systems built on gesture recognition. Gestures are used to communicate with PCs in a virtual environment. In this project, hand gestures are used to communicate information non-verbally to carry out a particular task. The hand gestures are recognized through hand skeleton recognition using the MediaPipe library in Python. The PC camera records live video and recognizes hand gestures, based on which a particular function takes place. This project presents a virtual keyboard, a virtual calculator and system volume control driven by hand gesture recognition, coded in Python using the OpenCV library.

1 Introduction
Human-computer interaction, also known as human-machine interaction, is the relationship between a human and a computer or machine. Machines are of little use unless properly operated by humans. The usability of a system is the degree to which the system is proficient at performing a particular task, and its functionality is the set of services or functions the system provides to its users. A system that strikes a balance between these concepts is considered a strong and effective system. Hand gestures are one element of body language that may be communicated through the centre of the palm, the placement of the hands and fingers, and the form the hand creates.
While keyboards, joysticks, mice and touch screens are common input devices for communicating with a system, they do not offer the most natural interface. The proposed system runs on an ordinary desktop or laptop computer, where hand gestures are captured by a web camera snapping images of the hand. "Hand gestures may be divided into static and dynamic." Static recognition identifies a hand by extracting information from its posture, whereas dynamic gesture recognition tracks the trajectory of a hand in space and operates on the acquired trajectory parameters. Static gesture recognition has the advantage of low computational complexity, while dynamic gesture recognition is more complex but well suited for real-time environments. In this system, dynamic hand gestures are recognized using hand skeleton recognition, a 3D hand gesture recognition approach. We use the hand's geometric shape to obtain an effective descriptor from the skeleton joints returned for frames captured by the system's webcam.
2 Problem Statement
With every new technology come changes to the existing systems. As virtual worlds are being developed, there must be ways to interact with them that do not rely on physical interaction devices such as a mouse, keyboard or joystick, because a user wearing a VR headset cannot use those devices the way they did before.
• There are many ways of recognizing gestures, but not all of them are feasible, easy to use and harmless to the user's body.
• Earlier systems performed only a single function, such as controlling a YouTube video [1], a virtual calculator [9] or a virtual keyboard, rather than offering multiple functionalities.
3 Objective
The goal is to provide users with a way to operate their system in a dynamic, efficient and appealing manner.
• Minimize the use of physical interaction devices required as the interface between the user and the computer for a few functions.
• Support basic operation of the device using hand gestures: calculating, controlling volume and typing.
• Keep the system user friendly: the goal is achieved using only three main modes, each with one gesture to trigger an event.
• Track the movement and gestures of the user's hand, and carry out a function whenever a gesture is found.
• Make movement speed, frame rate, mouse inversion and smoothness comfortable for the user.
4 Literature Survey
Chinnam Datta Sai Nikhil, Chukka Uma Someswara Rao, E. Brumancia, K. Indira, T. Anandhi and P. Ajitha [1] proposed how fingers can be recognized by a system and showed how YouTube can be operated using gesture recognition; the functions were pausing, resuming, volume-up and forwarding the video. [2] surveys different methods of recognizing gestures, covering each system's accuracy, the algorithm used, the application area, the type of camera used, the dataset used, and so on. "Skeleton-based Dynamic Hand Gesture Recognition" by Quentin De Smedt, Hazem Wannous and Jean-Philippe Vandeborre of Télécom Lille "suggests the advantages of using 3D hand skeleton data to describe hand gestures and represents a promising direction for performing gesture recognition tasks using skeleton-like information. They present an approach to recognize dynamic hand gestures as time series of 3D hand skeletons returned by the Intel RealSense depth camera. Experimental results, conducted on an enrolled dynamic hand gesture dataset, show the performance of the proposed method. Moreover, their approach achieves an accuracy of 83% on a challenging dataset, which is encouraging" [10]. Fig. 1 below summarizes papers on how gestures can be recognized, each system's accuracy, the algorithm used, the application area, the type of camera and the dataset used [12].
Fig 1. A review of techniques
5 Proposed System
The camera tracks the movement of the hand and fingers, and different combinations of fingers signify different functions to be executed. Of the different recognition techniques, this system uses hand skeleton recognition on the hand images received through a webcam. The hand skeleton joints (Fig 2) "specify model parameters which can improve the detection of complex features" [12]. The orientation of the joints, the distances between them, the positions and angles between the skeletal joints, and the trajectories and curvatures of the joints are the most widely used properties. The hand skeleton joints are marked using the MediaPipe library while video is captured with the OpenCV library in Python; the system then follows the hand movements and captures gestures. For this system the gestures mainly relate to markings [4, 8, 12, 20] in Fig 2. After a gesture is successfully recognized, the corresponding action takes place.
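The joint properties listed above, distances between joints and angles formed at a joint, can be computed directly from 2D landmark coordinates. The helpers below are an illustrative sketch under our own naming and the assumption of (x, y) pixel coordinates; they are not the authors' code.

```python
import math

def joint_distance(a, b):
    """Euclidean distance between two (x, y) landmark points."""
    return math.dist(a, b)

def joint_angle(a, b, c):
    """Angle in degrees at joint b, formed by segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    cos_ang = (v1[0] * v2[0] + v1[1] * v2[1]) / (math.hypot(*v1) * math.hypot(*v2))
    cos_ang = max(-1.0, min(1.0, cos_ang))  # guard against rounding drift
    return math.degrees(math.acos(cos_ang))
```

Descriptors of this kind are what skeleton-based recognizers feed to their gesture classifiers.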
Fig 2. Hand skeleton markings
6 Modules
The system is divided into two parts: front-end and back-end. The front-end consists of one module, the UI module, whereas the back-end consists of three modules: camera, discovery and interface. They are summarized below:
Fig 3. System Architecture Diagram
6.1 UI Module
This module is in charge of selecting the type of functionality the user wants to perform. The user has to select one of three options: virtual calculator, virtual keyboard or volume control. Once an option is selected, a pop-up window opens, recording live video of the user from the webcam and showing visuals based on the option selected.
Fig 4. UI
6.2 Camera Module
This module accepts photographic input and delivers the images to the discovery module to be processed as frames. Cameras, gloves and data gloves are the most prevalent ways of capturing and identifying gestures. In this system we use the built-in web camera to detect static and dynamic signals.
6.3 Discovery Module
This module is in charge of image processing. It takes the output of the camera module and, if a hand is detected, draws the hand skeleton on it using the MediaPipe library. At this stage the system locates the hand and checks which fingers are up, or what the distance between the fingers is, using the markings shown in the diagram. Once the required gesture is found, the next module acts on it while the system keeps capturing new gestures.
6.4 Interface Module
"This module is responsible for mapping the detected hand gestures to their associated actions" [11]. These actions are then routed to the proper subsystem. Once a gesture matches the condition for a function, the function is carried out and its output is shown on the screen. This module includes three sub-modules: the calculator, keyboard and volume modules.
Calculator Module
If this option is selected, a calculator is shown on the pop-up window with buttons from 1 to 9, arithmetic operators (addition, subtraction, division and multiplication), an equals button for the result, and an expression bar. The user can use gestures to write an arithmetic expression; it is evaluated by the usual mathematical rules and the answer is displayed on the expression bar. When done, press "c" on the keyboard to clear the expression or "q" to quit the program.
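The finger-state check performed by the discovery module can be sketched as a pure function over MediaPipe's 21 hand landmarks. This is a hedged illustration, not the project's actual code: it assumes image coordinates with y growing downward and a right hand facing the camera.

```python
TIP_IDS = [4, 8, 12, 16, 20]  # thumb, index, middle, ring, pinky tip indices

def fingers_up(lm):
    """lm: list of 21 (x, y) landmark points; returns five 0/1 flags."""
    states = []
    # Thumb: compare tip x with the joint below it (valid for a right hand
    # facing the camera; a full system would also check handedness).
    states.append(1 if lm[4][0] > lm[3][0] else 0)
    # Other fingers: a tip above its PIP joint (smaller y) counts as "up".
    for tip in TIP_IDS[1:]:
        states.append(1 if lm[tip][1] < lm[tip - 2][1] else 0)
    return states
```

The returned flag pattern is then matched against the gesture conditions that trigger the calculator, keyboard or volume actions.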
Fig 5. Calculator
Fig 6. Writing the expression
Fig 7. Executing the equation
Keyboard Module
If this option is selected, a keyboard is shown on the pop-up window with the letters A-Z laid out in the order of a standard keyboard, some special characters, and a text bar which displays the selected letters. The user can use gestures to type from the set of keys given; each entry appears on the text bar. When done, press "c" on the keyboard to clear the typed letters or "q" to quit the program.
Fig 8. Keyboard
Fig 9. Typing
Volume Module
If this option is selected, a volume bar showing the current system volume is displayed on the pop-up window. The user can use gestures to increase or decrease the volume and to set it when ready; this changes the system volume and updates it on the pop-up window. After setting the volume, press "q" on the keyboard to quit the program.
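A common way to implement this kind of gesture-driven volume control is to map the pixel distance between the thumb tip and index-finger tip onto a 0-100 volume level. The snippet below is an illustrative sketch of that mapping; the calibration range (25-200 px) and the function name are our assumptions, not values from the paper.

```python
import math

def pinch_to_volume(thumb_tip, index_tip, d_min=25.0, d_max=200.0):
    """Map the thumb-index pixel distance onto a 0-100 volume level."""
    d = math.dist(thumb_tip, index_tip)   # Euclidean pixel distance
    d = max(d_min, min(d, d_max))         # clamp to the calibrated range
    return round((d - d_min) / (d_max - d_min) * 100)
```

On the actual desktop, the returned level would then be pushed to the operating system's mixer (for example via a platform audio library), which is outside this sketch.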
Fig 10. Increasing or decreasing the volume
Fig 11. Setting volume
7 Result
Experimental findings were collected for the typing, calculating and volume control functions. The higher the rate of recognition, the more probable it is that the gesture will be acknowledged. For each function, the distance between the user and the camera was fixed: a maximum of 50 cm for the calculator, a maximum of 30 cm for the keyboard, and between 25 cm and 35 cm for volume control. The virtual calculator and volume control functions track movement very well, and the system runs with no lag when moving the cursor, selecting numbers and operators, or setting the volume. The keyboard shows a slight lag when selecting a character: the cursor moves well, but a selection may occasionally fail, be delayed, or register twice.
8 Conclusion & Future Works
Hand gesture recognition addresses a shortcoming in interaction approaches. Controlling things by hand is more natural, easier, more adaptable and less expensive, and there is no need to deal with issues caused by hardware devices because they are not needed. The results show that the system can interact with the computer without any physical interaction devices such as a keyboard or mouse. In the future we would like to increase the precision further, add more gestures to perform more functions so that a system can be fully operated by hand gestures, add more keys such as spacebar, control and shift, and add more operations to the calculator. Better techniques for implementing mouse events will be developed, reducing the lag during cursor movement to almost zero. Additional functions, such as zooming in and out and shutting down, can also be added.
References
[1] C.D.S. Nikhil, C.U.S. Rao, E. Brumancia, K. Indira, T. Anandhi and P. Ajitha, Finger Recognition and Gesture based Virtual Keyboard, 2020.
[2] S.N. Sharma and A. Rengarajan, Hand Gesture Recognition using OpenCV and Python, 2021.
[3] C. Xi, J. Chen, C. Zhao, Q. Pei and L. Liu, Real-time Hand Tracking Using Kinect, in Proceedings of the 2nd International Conference on Digital Signal Processing, Tokyo, Japan, 25–27 February 2018.
[4] G. Devineau, F. Moutarde, W. Xi and J. Yang, Deep learning for hand gesture recognition on skeletal data, in Proceedings of the 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), Xi'an, China, 15–19 May 2018.
[5] F. Jiang, S. Wu, G. Yang, D. Zhao and S.-Y. Kung, Independent hand gesture recognition with Kinect, Signal Image Video Process., 2014.
[6] D. Konstantinidis, K. Dimitropoulos and P. Daras, Sign language recognition based on hand and body skeletal data, in Proceedings of the 2018 3DTV Conference: The True Vision - Capture, Transmission and Display of 3D Video (3DTV-CON), Helsinki, Finland, 3–5 June 2018.
[7] Q. De Smedt, H. Wannous, J.-P. Vandeborre, J. Guerry, B.L. Saux and D. Filliat, 3D hand gesture recognition using a depth and skeletal dataset: SHREC'17 track, in Proceedings of the Workshop on 3D Object Retrieval, Lyon, France, 23–24 April 2017.
[8] Y. Chen, B. Luo, Y.-L. Chen, G. Liang and X. Wu, A real-time dynamic hand gesture recognition system using Kinect sensor, in Proceedings of the 2015 IEEE International Conference on Robotics and Biomimetics (ROBIO), Zhuhai, China, 6–9 December 2015.
[9] S. Kumar, M. Tanwar, A. Kumar, G. Rani and P. Chamoli, Hand Gesture Recognition Based Calculator, 2019.
[10] Q. De Smedt, H. Wannous and J.-P. Vandeborre, Skeleton-based Dynamic Hand Gesture Recognition, Télécom Lille, Univ. Lille, CNRS, UMR 9189 - CRIStAL, F-59000 Lille, France.
[11] A. Haria, A. Subramanian, N. Asokkumar, S. Poddar and J.S. Nayak, Hand Gesture Recognition for Human Computer Interaction, ICACC-2017, 22–24 August 2017, Cochin, India.
[12] M. Oudah, A. Al-Naji and J. Chahl, Hand Gesture Recognition Based on Computer Vision: A Review of Techniques, 2020.
Advances in Science and Technology ISSN: 1662-0356, Vol. 124, pp 11-19 doi:10.4028/p-332sp4 © 2023 Trans Tech Publications Ltd, Switzerland
Submitted: 2022-08-28 Accepted: 2022-09-16 Online: 2023-02-27
Sign Language Detection Application Using CNN
Garimella Mohan Rao^(1,a*), B. Aseesh Reddy^(2,b) and Akaash Jayashankar^(3,c)
^(1,2,3) Computer Science Engineering, SRM Institute of Science and Technology, Chennai, India
^(a) [email protected], ^(b) [email protected], ^(c) [email protected]
Keywords: American Sign Language, Sign to Text, Mobile Application, Sign Language Recognition system.
Abstract. Deaf or mute persons frequently use sign language to communicate, but it takes a lot of practice to learn. Sign language is the principal mode of communication of the Deaf and Hard-of-Hearing community. Autism, apraxia of speech, cerebral palsy and Down syndrome are just a few of the disorders that may benefit from sign language. We are using ASL (American Sign Language) for this project. Although ASL uses the same alphabet as English, it is not a dialect of English: American Sign Language is a separate language with its own linguistic framework, and signs are not expressed in the same order as words are in English, owing to sign language's distinct grammar and visual aspect. In the United States, around half a million people use ASL. In this project we develop and implement a mobile application that serves as a translation system to help people communicate more efficiently using signs. We demonstrate a real-time sign language recognition system that uses existing datasets to transform a video of a user's signs into text.

Introduction
Holding a conversation between a hearing person and a deaf or mute person is still one of the most difficult tasks. Written communication is inconvenient in face-to-face conversations, and this type of communication is impersonal and slow. Within their own community a deaf person can use sign language, but not with others. In sign language, hand movements, facial expressions and body language are used to communicate. Deaf people, and people who can hear but cannot speak, use it the most. Certain hearing people, usually a deaf person's family and relatives, as well as interpreters who help the deaf and the wider community communicate, use it as well. Only a small percentage of the population knows sign language. There are two sorts of strategies for solving sign language recognition problems: sensor-based and vision-based methods.
In sensor-based approaches, signers wear a special glove or sensor that reports the hand's orientation, location, rotation, and movement. Vision-based approaches, on the other hand, use images captured by a camera, without the need for sensors. To read and portray signs, existing vision-based solutions apply a variety of image processing and machine learning techniques to color images. Sign language recognition systems come in three forms: fingerspelling, word, and sentence recognition. For new users, alphabet (fingerspelling) recognition systems are a critical component in learning sign language; they help signers spell cities, names, and other words for which no established signs exist. These systems usually rely on color images to capture the shape and texture characteristics of hand gestures. Depth-based techniques, by contrast, avoid the associated problems by relying on the hand's distance from the camera while discarding unnecessary texture information. In the proposed project, we use Android Studio to build an app that recognizes signs in real time and converts them to text using machine learning algorithms. It is aimed at people who do not know how to sign but wish to communicate with the Deaf and Hard-of-Hearing community.
IoT, Cloud and Data Science
Related Works
Many applications, such as gesture and sign language identification, have sprung up as a result of the recent development of multiple sensor types, particularly those that rely on comprehensive input. The sign language recognition challenge has three aspects: sentence recognition, independent word recognition, and alphabet recognition. This research looks only at the alphabet of American Sign Language (ASL). Most established ASL alphabet recognition systems have five phases: preprocessing, hand segmentation, feature extraction, training, and labelling. W. Aly, S. Aly, and S. Almotairi [1] proposed sign language recognition using PCANet features and depth images. A simple preprocessing algorithm is applied to the depth image to segment the hand region. Instead of traditional hand-crafted feature extraction, features are learned with a convolutional architecture: a simple unsupervised Principal Component Analysis Network (PCANet) learns local characteristics from the segmented hand. Two learning methodologies for the PCANet model are suggested: training a single PCANet model from all users' inputs, or training a separate PCANet model for each user. A linear Support Vector Machine (SVM) classifier then classifies the extracted features. The method is evaluated on a publicly available dataset of real-time depth images recorded by a variety of users. Compared to color images, features extracted from depth images handle cross-user changes and variations in imaging conditions well, yielding promising results. Experimental results show that a single PCANet model outperforms multiple user-specific PCANet models: tested on a public benchmark dataset collected from five different users, the system achieved an average accuracy of 88.7% under a leave-one-out evaluation strategy.
Existing color-based sign language recognition systems suffer from many problems, including complicated backgrounds, hand segmentation, and high inter-class and intra-class variance. Images are first converted to black and white and then to greyscale to make detection easier. Hand segmentation is difficult, so a CNN model is trained for it, using a very small dataset of pictures from five different users. M. Al-Hammadi et al. [2] proposed sign language recognition using deep learning. Several studies on hand gesture recognition have been undertaken during the last three decades, most following one of two approaches: vision-based or non-vision-based. In the non-vision-based approach, gloves and sensors collect hand sign data; however, the hardware for this approach is expensive and unpleasant because it limits the signer's motion. The vision-based approach addresses these drawbacks by gathering information with cameras. Vision-based research can be divided into traditional techniques and deep learning-based techniques. The highest accuracy reported by this system was 87.69%, using an MLP fusion technique.

Proposed Work
The proposed sign language recognition method comprises five stages: preprocessing, hand segmentation, feature extraction, training, and labelling. In data preprocessing, the images are saved in separate folders named A to Z; each folder holds 1000 pictures for training, and an image is included for a class only if it is clear enough for training. In training, all images in the sign dataset are converted from BGR to RGB, and the dataset is split into train and test sets using sklearn.model_selection.train_test_split. Training uses the Keras Sequential model with the EfficientNetB0 application, plus the Keras layers GlobalAveragePooling2D, Dropout, and Dense.
The fit function is used to train on the sign dataset, and the trained model is saved in .tflite format.
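The preprocessing described above (BGR-to-RGB conversion and the train/test split) can be sketched in NumPy. This is an illustrative stand-in for cv2.cvtColor and sklearn's train_test_split, not the authors' code; the array shapes and function names are our assumptions.

```python
import numpy as np

def bgr_to_rgb(images: np.ndarray) -> np.ndarray:
    """Reverse the channel axis: OpenCV loads images as BGR,
    but Keras/EfficientNet expects RGB."""
    return images[..., ::-1]

def split_dataset(images, labels, test_size=0.2, seed=42):
    """Shuffle and split into train/test sets, mirroring the behavior of
    sklearn.model_selection.train_test_split."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(images))
    n_test = int(len(images) * test_size)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return images[train_idx], images[test_idx], labels[train_idx], labels[test_idx]

# Toy batch: 10 "images" of 8x8 pixels, 3 channels, labels 0..9
x = np.zeros((10, 8, 8, 3), dtype=np.uint8)
x[..., 0] = 255            # channel 0 = blue in BGR order
y = np.arange(10)

rgb = bgr_to_rgb(x)
x_train, x_test, y_train, y_test = split_dataset(x, y)
print(rgb[0, 0, 0].tolist())       # [0, 0, 255] -> blue moved to the last slot
print(len(x_train), len(x_test))   # 8 2
```

With test_size=0.2, an 80/20 split of the per-class folders follows directly, matching the paper's description of separate train and test datasets.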
Advances in Science and Technology Vol. 124
Hand Segmentation
• The dataset was obtained via Kaggle. There is only one class in this dataset. All the data are images saved in JPEG format; the set consists of 11,000 hand pictures from different users.
• All images in the hand dataset are converted from BGR to RGB, and the dataset is split into train and test sets during preprocessing. The hand dataset is trained using a pre-trained model, SSD MobileNet v2 320x320, to predict the hand in the video. The fit function is used to train on the hand dataset, and the trained model is saved in .tflite format.
Feature Extraction
• We extract frames from the video in real time.
• We extract features such as the shape and size of the input image and compare them with the trained model.
Labelling
• We extract frames from the real-time video in the application.
• After extracting the hand's features, we compare the input sign with the trained model and predict the gesture.
• After predicting the alphabets and combining them (in the case of multiple alphabets), the application has a feature to read the word aloud.
Text to Sign
• We provide a text-to-sign feature in the application for users who want to learn sign language in their free time.
• The user gives an alphabet or word as input, and the application shows the sign of each alphabet in the order of the word, so the user can learn the signs.
Application Features
• Add: adds the detected letter.
• Space: inserts a space between words when needed.
• Clear: works as backspace; when a captured letter is not needed in the word, Clear removes it.
• Read: after detecting and translating the sign to text, the text can be read aloud using the Read button at the bottom of the application.
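The Add, Space, Clear, and Read buttons amount to a small text buffer driven by the most recently predicted letter. The sketch below is a hypothetical model of that behavior; the class and method names are ours, not from the app, and the Read button stands in for a call to a text-to-speech engine.

```python
class SignTextBuffer:
    """Minimal sketch of the app's Add / Space / Clear / Read buttons:
    predicted letters accumulate into a sentence."""
    def __init__(self):
        self.text = ""
        self.detected = ""          # letter currently predicted from the camera

    def on_detect(self, letter):    # called each time the model predicts a sign
        self.detected = letter
    def add(self):                  # Add button: append the detected letter
        self.text += self.detected
    def space(self):                # Space button: separate words
        self.text += " "
    def clear(self):                # Clear button: acts as backspace
        self.text = self.text[:-1]
    def read(self):                 # Read button: in the app this would call TTS
        return self.text

buf = SignTextBuffer()
for ch in "HI":
    buf.on_detect(ch); buf.add()
buf.space()
buf.on_detect("X"); buf.add()   # a wrong detection...
buf.clear()                     # ...removed with Clear
buf.on_detect("U"); buf.add()
print(buf.read())               # HI U
```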
Fig. 1 System Architecture

Algorithms
For training the hand dataset, SSD MobileNet, a pre-trained TensorFlow model, is used. SSD MobileNet is an object detection model that computes the bounding box and category of an object from input data. This Single Shot Detector (SSD) object detection model offers fast object recognition optimized for smartphones, using MobileNet as a backbone. For training the sign dataset, the Keras Sequential model is used: by building a Sequential class instance and adding layers to it, you can construct deep learning models with the Sequential model API. In addition, a Keras application has been used to increase accuracy; in this project it is EfficientNetB0. EfficientNet-B0 is a convolutional neural network trained on more than a million images from the ImageNet database; it can classify pictures into 1000 class labels, such as keyboard, mouse, pencil, and various animal species. GlobalAveragePooling2D, Dropout, and Dense layers are also added to the model. The GlobalAveragePooling2D block computes, for each input channel, the average over the whole (input width) x (input height) matrix, reducing a tensor of size (input width) x (input height) x (input channels) to one value per channel. Dropout is a method for avoiding overfitting in a model: during each iteration of the training stage, it sets the outgoing edges of a random subset of hidden neurons (the cells that make up hidden layers) to 0. The dense layer takes its name from the fact that every neuron in the layer receives input from every neuron in the preceding layer; Dense layers are used to recognize images from the outputs of the convolution layers.
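The pooling and dropout operations described above can be written out directly in NumPy. This is a sketch of the semantics of Keras's GlobalAveragePooling2D and Dropout layers, not their implementation; the 7x7x1280 feature-map shape is an assumption based on EfficientNetB0's final stage.

```python
import numpy as np

def global_average_pooling_2d(x):
    """Average each channel over the full (height, width) grid:
    (batch, H, W, C) -> (batch, C), as in Keras GlobalAveragePooling2D."""
    return x.mean(axis=(1, 2))

def dropout(x, rate, rng, training=True):
    """During training, zero each activation with probability `rate`
    and rescale the survivors by 1/(1-rate) (inverted dropout)."""
    if not training:
        return x
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

rng = np.random.default_rng(0)
features = np.ones((2, 7, 7, 1280))        # assumed EfficientNetB0 feature map
pooled = global_average_pooling_2d(features)
print(pooled.shape)                        # (2, 1280): one value per channel
dropped = dropout(pooled, rate=0.5, rng=rng)
print(dropped.shape)                       # shape unchanged; ~half the entries zeroed
```

Because the inputs are all ones and the rate is 0.5, every surviving activation becomes 2.0, which makes the rescaling visible.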
Fig. 2 Sequence Model diagram

Module diagram (Fig. 3): Module 1 — Sign Dataset → Data preprocessing → Training → Trained model; Module 2 — Hand Dataset → Data preprocessing → Training; Module 3 — Trained model loaded via OpenCV into Android Studio; Module 4 — Input (sign) → Application → Predict the sign.
Fig. 3 Module Diagram of the system

A. Dataset (sign): The dataset was obtained via Kaggle. It comprises 28 classes: all the alphabets plus additional classes such as space, delete, and blank. All the data are images saved with the .jpeg extension. Each class contains more than 1000 pictures from different users; in total the dataset consists of 87,000 pictures.

B. Data preprocessing (sign): In data preprocessing, the images are saved in separate folders named A to Z; each folder holds 1000 pictures for training, and an image is included for a class only if it is clear enough for training.
C. Training (sign): In training, all images in the sign dataset are converted from BGR to RGB, and the dataset is split into train and test sets using sklearn.model_selection.train_test_split. Training uses the Keras Sequential model with the EfficientNetB0 application and the Keras layers GlobalAveragePooling2D, Dropout, and Dense. The fit function is used to train on the sign dataset, and the trained model is saved in .tflite format.

D. Dataset (hand): The dataset was obtained via Kaggle. There is only one class in this dataset. All the data are images saved in JPEG format; the set consists of 11,000 hand pictures from different users.

E. Data preprocessing (hand): In data preprocessing, the images are saved in a separate hand-dataset folder, divided into train and test sets; an image is included only if it is clear enough for training.

F. Training (hand): In training, all images in the hand dataset are converted from BGR to RGB, and the dataset is split into train and test sets during preprocessing. The hand dataset is trained using the pre-trained SSD MobileNet v2 320x320 model to predict the hand in the video. The fit function is used to train on the hand dataset, and the trained model is saved in .tflite format.

G. Trained model: Both the trained sign model and the trained hand model are in .tflite format. The two files are used in Android Studio for detecting the hand and the sign; the hand model detects the hand first, which increases the accuracy of detecting the sign from the user through the app.

H. OpenCV: OpenCV is installed in Android Studio to load the trained models and take input from the user.
The app accesses the camera; the camera input, in the form of video, goes to OpenCV, where it is split frame by frame, and each frame is compared against the trained model to give an accurate alphabet.

I. Android Studio: Android Studio is used to create the app's framework, store the trained models, and predict the gesture through the app. The app has features such as text-to-speech and text-to-sign. It takes input from the user in the form of a sign, predicts it, and returns the corresponding alphabet; the user can then convert the text to speech, and the text-to-sign feature lets the user learn sign language.

J. Input (sign): The user gives a sign as input to the application.

K. Application: The application takes the input sign, compares it with the trained dataset, and predicts the sign.

L. Predict the sign: The application returns the accurate sign for the given input.
Fig.4 GUI Application-1
Fig.5 GUI Application-2
Fig. 6 GUI Application-3

Summary
In this project we proposed a new mobile application that recognizes sign language using TensorFlow/Keras algorithms. The hand dataset was trained with SSD MobileNet, and the sign dataset was trained with a Sequential model; the Keras application EfficientNetB0 was added to improve the sign dataset's accuracy, along with the Keras layers GlobalAveragePooling2D, Dropout, and Dense. To interact with the user, we created a mobile application in Android Studio. The application includes a text-to-sign tool for individuals who don't know how to sign but want to learn and communicate with the Deaf and Hard-of-Hearing population.

References
[1] W. Aly, S. Aly, and S. Almotairi, "User-Independent American Sign Language Alphabet Recognition Based on Depth Image and PCANet Features," IEEE Access, doi: 10.1109/ACCESS.2019.2938829.
[2] M. Al-Hammadi, G. Muhammad, W. Abdul, M. Alsulaiman, M. A. Bencherif, T. S. Alrayes, H. Mathkour, and M. A. Mekhtiche, "Deep Learning-Based Approach for Sign Language Gesture Recognition With Efficient Hand Gesture Representation," IEEE Access, doi: 10.1109/ACCESS.2020.3032140.
[3] S. Z. Gurbuz, A. C. Gurbuz, E. A. Malaia, D. J. Griffin, C. S. Crawford, M. M. Rahman, E. Kurtoglu, R. Aksu, T. Macks, and R. Mdrafi, "American Sign Language Recognition Using RF Sensing," IEEE Sensors Journal, vol. 21, no. 3, Feb. 2021.
[4] H. Wang, X. Chai, and X. Chen, "A Novel Sign Language Recognition Framework Using Hierarchical Grassmann Covariance Matrix," IEEE Transactions on Multimedia, 2019.
[5] P. M. Ferreira, D. Pernes, A. Rebelo, and J. S. Cardoso, "DeSIRe: Deep Signer-Invariant Representations for Sign Language Recognition," IEEE Transactions on Systems, Man, and Cybernetics: Systems.
[6] M. Al-Hammadi, G. Muhammad, W. Abdul, M. Alsulaiman, M. A. Bencherif, and M. A. Mekhtiche, "Hand Gesture Recognition for Sign Language Using 3DCNN," IEEE Access, doi: 10.1109/ACCESS.2020.2990434.
[7] D. S. Breland, S. B. Skriubakken, A. Dayal, A. Jha, P. K. Yalavarthy, and L. R. Cenkeramaddi, "Deep Learning-Based Sign Language Digits Recognition From Thermal Images With Edge Computing System," IEEE Sensors Journal, vol. 21, no. 9, May 2021.
[8] R. Cui, H. Liu, and C. Zhang, "A Deep Neural Framework for Continuous Sign Language Recognition by Iterative Training," IEEE Transactions on Multimedia, doi: 10.1109/TMM.2018.2889563.
[9] M. Al-Qurishi, T. Khalid, and R. Souissi, "Deep Learning for Sign Language Recognition: Current Techniques, Benchmarks, and Open Issues," IEEE Access, doi: 10.1109/ACCESS.2021.3110912.
[10] Z. Wang, T. Zhao, J. Ma, H. Chen, K. Liu, H. Shao, Q. Wang, and J. Ren, "Hear Sign Language: A Real-Time End-to-End Sign Language Recognition System," IEEE Transactions on Mobile Computing, doi: 10.1109/TMC.2020.3038303.
[11] H. Salih and L. Kulkarni, "Study of Video Based Facial Expression and Emotions Recognition Methods," 2017 International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), 2017, pp. 692-696, doi: 10.1109/I-SMAC.2017.8058267.
[12] M. Kołodziej, A. Majkowski, R. J. Rak, P. Tarnowski, and T. Pielaszkiewicz, "Analysis of Facial Features for the Use of Emotion Recognition," 19th International Conference Computational Problems of Electrical Engineering, 2018, pp. 1-4, doi: 10.1109/CPEE.2018.8507137.
[13] M. Taskiran, M. Killioglu, and N. Kahraman, "A Real-Time System for Recognition of American Sign Language by Using Deep Learning," 2018 41st International Conference on Telecommunications and Signal Processing (TSP), 2018, pp. 1-5, doi: 10.1109/TSP.2018.8441304.
[14] Y. Liao, P. Xiong, W. Min, W. Min, and J. Lu, "Dynamic Sign Language Recognition Based on Video Sequence with BLSTM-3D Residual Networks," IEEE Access, doi: 10.1109/ACCESS.2019.2904749.
[15] J. Joy, K. Balakrishnan, and M. Sreeraj, "SignQuiz: A Quiz Based Tool for Learning Fingerspelled Signs in Indian Sign Language Using ASLR," IEEE Access, doi: 10.1109/ACCESS.2019.2901863.
Advances in Science and Technology ISSN: 1662-0356, Vol. 124, pp 20-27 doi:10.4028/p-i494gi © 2023 Trans Tech Publications Ltd, Switzerland
Submitted: 2022-08-28 Accepted: 2022-09-16 Online: 2023-02-27
SIGN BOT: Extending an Ability to Communicate by Creating an Indian Sign Language

Sampath Kumar S.1,a, Ajay Kumar V.2,b, Arun Nataraj S.3,c, Devasarathy B.4,d, Hariharan B.5,e

1Assistant Professor, 1,2,3,4,5Department of Computer Science and Engineering, Sri Eshwar College of Engineering, Coimbatore

[email protected], [email protected], [email protected], [email protected], [email protected]

Keywords: Artificial Intelligence Markup Language, Indian Sign Language, ReLU, OpenCV, MediaPipe, Softmax, and CNN.
Abstract. There is a communication gap between deaf-mutes and normal people. To overcome it, we are providing information access and services to deaf-mute people in Indian Sign Language (ISL) and developing a flexible project that can be enlarged to capture the entire lexicon of Indian Sign Language via physical gestures such as hand expressions and non-manual signs such as facial expressions, by building a training model using machine learning algorithms. Sign language recognition uses image-based manual and non-manual gestures. Here we used figure recognition to identify manual and non-manual gestures: finding expression gestures and analyzing finger movements to determine what the deaf-mute individual is saying. In Python, MediaPipe recognizes a person's hand signs and facial gestures. These modules were developed to assist people with non-identical motions. This paper presents figure identification of Indian Sign Language via hand and facial gestures, as well as its integration with a chatbot producing transcript output.

I. Introduction
A sign language institute was advocated by the Indian deaf community in the 2000s. The Eleventh Five-Year Plan (2007-2012) acknowledged that the needs of people with hearing disabilities had been relatively neglected and proposed the development of a sign-language institute. Sign language is a language of gestures made with the help of the face, posture, and hands. These gestures have their own way of expressing feelings and conversation; by following these expressions, deaf-mute people can convey messages with feelings and information. Deaf and mute persons can get in touch with the assistance of sign language. Normal people use voice as a search method, and this technology improves people's experience and makes them more comfortable, so our team has proposed an Indian Sign Language recognition model.
The chatbot is a type of chat software used in our system to present the output as audio and text visualization. Our project treats Indian sign language at the output stage: the output obtained is displayed via image processing and integrated with the chatbot and a Flask app, so that deaf-mute people can communicate and share information.

II. Methodology
To achieve the task of our project, datasets first have to be collected or created. In our case, we create the dataset with the help of OpenCV and MediaPipe through the camera: with MediaPipe we captured manual and non-manual gestures and stored them as a dataset. In this project, TensorFlow is used to develop the Indian sign language recognition model, which is trained on the created dataset. Our team will design the chatbot using AI/ML; the text it displays is based on the physical gestures and non-manual signs captured by the camera and validated by the developed Indian sign language recognition model.
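The keypoint-collection step could look like the following sketch, which flattens per-hand landmarks into fixed-length NumPy rows for the dataset. MediaPipe itself is not invoked here; the zero-padding for undetected hands and the function names are our assumptions, with only the 21-landmark-per-hand layout taken from MediaPipe Hands.

```python
import numpy as np

N_HAND_LANDMARKS = 21   # MediaPipe Hands returns 21 (x, y, z) points per hand

def hand_keypoints(landmarks):
    """Flatten one hand's landmarks ([(x, y, z), ...]) into a 63-value
    vector; return zeros if the hand was not detected, so every frame
    yields a fixed-length feature row for the dataset."""
    if landmarks is None:
        return np.zeros(N_HAND_LANDMARKS * 3)
    return np.asarray(landmarks, dtype=float).flatten()

def frame_features(left, right):
    """One training row: both hands concatenated (2 x 21 x 3 = 126 values)."""
    return np.concatenate([hand_keypoints(left), hand_keypoints(right)])

# Hypothetical detection result: left hand found, right hand missing
left = [(0.1 * i, 0.2 * i, 0.0) for i in range(21)]
row = frame_features(left, None)
print(row.shape)   # (126,)
```

Stacking one such row per frame, with the gesture name as the label, yields exactly the kind of NumPy-array dataset the methodology describes.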
Fig. 1. Some basic hand gestures

Fig. 1 above shows the alphabets and their corresponding gestures in ISL. We have set the gesture images alongside this paragraph, explaining some of the signs in Indian Sign Language.

III. Indian Sign Language Recognition Model
To create a dataset, we use OpenCV to capture the image and MediaPipe to detect landmarks and extract key points from the gesture. MediaPipe is an open-source Google library for constructing multimodal apps over visual, audio, and time-series data across platforms such as Android, iOS, and the web, as well as on edge devices, using machine learning pipelines; it is performance-optimized and supports end-to-end on-device inference. OpenCV (Open Source Computer Vision Library) is a framework of programming functions targeted at real-time computer vision, generally used for image analysis, image processing, etc. After detecting landmarks and extracting the key points, they are made into a NumPy array and stored as a dataset; a NumPy array is a grid of values, all of the same type, indexed by a tuple of non-negative integers. After preprocessing the data and creating labels and features, the training process is initiated. We use the sequential model, in which there is only one input tensor and one output tensor, built around Long Short-Term Memory (LSTM), a type of recurrent neural network used to classify, process, and make predictions on sequence data, followed by dense layers. Neural networks, also called Artificial Neural Networks (ANN), are artificial systems modelled on natural neural networks for performing a particular task; they perform their functions by changing the weights of their connections. These are the layers added to the sequential model. In this model, we add activations such as ReLU (Rectified Linear Unit) and softmax to increase the non-linearity of the mapping, since images are highly non-linear, containing many details such as intensity and borders.

Formula for Softmax
The softmax is defined as

S(x_i) = e^{x_i} / Σ_{j=1}^{n} e^{x_j}

where S is the softmax, x is the input tensor, e^{x_i} is the exponential function of the i-th input element, n is the number of classes in the multi-class classifier, and e^{x_j} are the exponentials summed over all outputs. After the softmax, we use the sequential model in our project; it is a plain stack of layers in which every layer has one input tensor and one output tensor. After training is complete, predictions are made and the weights of the model are saved in Hierarchical Data Format.

Formula for ReLU
Mathematically, ReLU is defined as

y = max(0, x)
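As a check of the two formulas, both activations can be computed directly in NumPy (a generic sketch, not the project's code; the four-class score vector is made up for illustration):

```python
import numpy as np

def softmax(x):
    """S(x_i) = e^{x_i} / sum_j e^{x_j}; subtracting the max first is the
    usual trick for numerical stability and does not change the result."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def relu(x):
    """y = max(0, x), applied element-wise."""
    return np.maximum(0, x)

scores = np.array([2.0, 1.0, 0.1, -1.0])   # raw outputs for 4 gesture classes
probs = softmax(scores)
print(probs.sum())            # sums to 1: a multinomial probability distribution
print(relu(np.array([-3.0, 0.5])))   # negatives clipped to 0
```

The class with the largest raw score keeps the largest probability, which is why argmax over the softmax output gives the predicted gesture.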
Fig. 2. Graphical Representation

In our project, the softmax function is also used; it is the activation function in the output level of models that predict a multinomial probability distribution.

IV. Implementation
A. Figure truncation (cropping): From the input figure, metadata are collected which exhibit motionless signals. However, the figure also contains segments of the frame and face in addition to gestures, so it is cropped to the region of interest.
B. Figure accretion (augmentation): This is performed during training by processing the figures with minor changes; it incorporates functions such as truncating and altering figures.
C. Splitting figures: Once the figures are recognized, they are split for training; it is useful to control how many figures qualify for the model. Finally, they are added to the training set.
D. Testing: The process chart below shows that the final step of execution is examining figures on the trained model to find the result based on the image. Image data is read sequentially; in this image-reading process, TensorFlow is used to build a model based on the sequential model, with ReLU (Rectified Linear Unit) and softmax as layers to read the hand-sign images. After testing is done, the results are displayed.
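The truncation (cropping) and accretion (augmentation) steps (A and B above) can be sketched with plain NumPy slicing. The crop margin and flip probability below are illustrative choices, not values from the paper:

```python
import numpy as np

def random_crop(img, out_h, out_w, rng):
    """Crop a random (out_h, out_w) window from the frame - the truncation step."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - out_h + 1)
    left = rng.integers(0, w - out_w + 1)
    return img[top:top + out_h, left:left + out_w]

def augment(img, rng):
    """One augmented copy: a random crop plus an occasional horizontal flip."""
    out = random_crop(img, img.shape[0] - 4, img.shape[1] - 4, rng)
    if rng.random() < 0.5:
        out = out[:, ::-1]    # mirror left-right
    return out

rng = np.random.default_rng(1)
frame = (np.arange(32 * 32 * 3) % 256).astype(np.uint8).reshape(32, 32, 3)
batch = [augment(frame, rng) for _ in range(8)]   # small augmented training set
print(batch[0].shape)    # (28, 28, 3)
```

Each call yields a slightly different view of the same gesture, which is exactly what step B uses to enlarge the training set.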
V. Flowchart
VI. Discussion
The following images show the point mapping and image recognition of a person's face and hand gestures. We then used those captured pictures, labelled with Indian Sign Language, to train the model.
Fig. 3. Extracting Key Points
After this, images are captured for the recognition process. We have also captured the image-collection process in the dataset.
Fig. 4. Input Processing

Images are collected in the form of frames, and these frames are recognized using landmark detection and key-point extraction. After recognition, the result is displayed based on the manual and non-manual gestures shown by deaf-mute people.

VII. Output
Once the user shows "one" as the gesture, the gesture is captured as several frames, and the result is displayed as shown in Fig. 6 below.
Fig. 5. Left-Hand Detection
Fig 6. Output with Accuracy
Fig. 7. Right-Hand Detection

The figure above shows a gesture in which the user indicates the number two; the recognized value is printed on the console. There are more outputs than fit in a document to elaborate on the project we built using the software, libraries, etc. In the case where the user signs that he is happy, the output will be the equivalent text "happy".

VIII. Validation Accuracy
The validation accuracy of our project falls in the range of 90% to 97%, as displayed in the outputs.

Model: sequential_1

Layer (type)     Output Shape      Param #
lstm_3 (LSTM)    (None, 30, 64)    48896
lstm_4 (LSTM)    (None, 30, 128)   98816
lstm_5 (LSTM)    (None, 64)        49408
lstm_3 (LSTM)    (None, 64)        4160
lstm_4 (LSTM)    (None, 32)        2080
lstm_5 (LSTM)    (None, 4)         132
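The parameter counts in the summary can be verified arithmetically: an LSTM layer has 4·units·(input_dim + units + 1) parameters, and a fully connected layer has units·(input_dim + 1). The 48,896 figure implies 126 input features per frame (consistent with 2 hands × 21 MediaPipe landmarks × 3 coordinates, though this is our inference, not stated in the paper), and the last three rows' counts match fully connected (Dense) layers rather than LSTMs:

```python
def lstm_params(input_dim, units):
    # 4 gates, each with an (input_dim + units) weight matrix plus a bias
    return 4 * units * (input_dim + units + 1)

def dense_params(input_dim, units):
    # one weight per input plus a bias, for each unit
    return units * (input_dim + 1)

print(lstm_params(126, 64))    # 48896 -> implies 126 input features per frame
print(lstm_params(64, 128))    # 98816
print(lstm_params(128, 64))    # 49408
print(dense_params(64, 64))    # 4160  -> these three counts match Dense layers
print(dense_params(64, 32))    # 2080
print(dense_params(32, 4))     # 132   -> 4 output classes
```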
The accuracy of the developed model, evaluated on test data:

40/40 - 1s 8ms/step - loss: 0.0400 - categorical accuracy: 0.9750
Test loss: 0.03996208682656288
Test accuracy: 0.9750000238418579

Generate predictions for 30 samples; predictions' shape: (40, 4).

GitHub link for the ISL dataset: https://github.com/ArunNataraj/IndianSignLanguageDataSet

IX. Existing System
Existing systems for American Sign Language / Indian Sign Language detection contain only hand detection; only a few detection systems provide both face and hand detection.
X. Conclusion and Future Work
Through the application of AI/ML, TensorFlow, OpenCV, MediaPipe, softmax, ReLU, and sequential models, we developed a translator for Indian Sign Language with an accuracy of 90% up to a maximum of 97% on a trained dataset. The project is currently implemented for Indian Sign Language and can be expanded to other sign languages and to video input. As further enhancement, a Flask app incorporating a video-call function should be added.
Advances in Science and Technology ISSN: 1662-0356, Vol. 124, pp 28-36 doi:10.4028/p-5s1927 © 2023 Trans Tech Publications Ltd, Switzerland
Submitted: 2022-09-07 Accepted: 2022-09-16 Online: 2023-02-27
RepliCare: Real-Time Human Arms Movement Replication by a Humanoid Torso
Vivek Jagannath1,a*, Shushrut Kumar2,b and Dr. P. Visalakshi3,c
Department of Networking and Communications, SRM Institute of Science and Technology, Kattankulathur, Tamil Nadu, India
a [email protected], b [email protected], c [email protected]
Keywords: Pose Estimation, Robot Telekinesis, Humanoids
Abstract. Robotics is a field that has actively worked to reduce the involvement of humans in dangerous environments by automating tasks. In this paper, we propose a method to remotely control a humanoid torso in a telekinetic manner. The humanoid torso replicates the pose of the person controlling it remotely by detecting the pose of their arms from an RGB camera input in real time, using computer vision techniques based on machine learning algorithms. By detecting the pose, the humanoid's joints (shoulders and elbows) are positioned to replicate the pose of the person controlling it. This is achieved by mapping the positions of the controller's joints to a set of equations using vector algebra. Such a system ensures not only that movements are oriented towards the end-effector reaching the desired location, but also that the position of every part of the robot can be controlled to move in the required manner. This level of control eliminates the complexities of collision detection in teleoperated robotic systems and widens the range of applications in which such a system can be used efficiently.

Introduction
Many tasks handled daily by humans involve harmful environments and activities, such as working in mines, in chemical atmospheres, or around explosives in defence departments. Carrying out such tasks manually causes many casualties, and much research in robotics aims to move humans out of such situations and automate those tasks. When replacing humans with robots that are expected to work seamlessly, many variable factors must be considered. We need accurate, seamless control of the exact position of every part of the robot. This project enables a person to control the robot from a safe location and keeps them away from performing dangerous activities.
At the same time, it offers flexibility in movement range based on how the human directs the robot to move. The applications of such a system are wide-ranging because it is extremely flexible, helping achieve the goal of replacing humans in dangerous environments with robots. Teleoperation has long been a topic of research in robotics, where the goal has always been greater accuracy and efficiency. Multiple technologies are needed to make such a system a reality, including networking, robot design, control parameters and motion planning. In this project, with the added complexity of replicating human movements, we also introduce Computer Vision (CV) to the system.

Related Work
The proposed work uses technologies involving computer vision, robot modelling and simulation. Merging these technologies, we developed a system in which the designed humanoid torso replicates the pose demonstrated by a person standing in front of an RGB camera. The references below show various approaches previously used to develop the state of the art for each of these technologies.
Through the work in [1], we saw an approach to detect the 2D pose of multiple people in an image. The proposed method uses a nonparametric representation called Part Affinity Fields (PAFs). Using these PAFs, they find relations between the different detected key-points and thus analyse the complete pose of the person in an image. The method uses a bottom-up approach and runs with very high accuracy and minimal computational cost. [2] proposed a DeepPose-based pose estimation system that represents pose detection with a dynamic bounding box whose dimensions change for varying images. To achieve this, they linked a person detection system with a pose estimation system, introducing Bounding-box Curriculum Learning (BCL) and Recurrent Pose Estimation (RPE) to link the two. In [3] we found a new implementation of the bottom-up approach presenting a box-free system for estimating the pose of multiple people in an image, using part-based modelling to associate the different parts of each person. One resource that taught us much more about the methods available for the pose estimation module was [4], a survey of implemented pose estimation techniques that compares the methods used in various implementations and evaluates the computing cost, accuracy and detection speed of each. The authors also elucidate the theory and notation behind the implementations. The authors of [5] proposed a different method for human pose estimation based on Deep Neural Networks (DNNs); they achieved highly accurate pose detection by approaching it as a DNN-based regression problem. Along the same line of research, in [6], the authors presented a residual learning framework to assist in training neural networks that consist of many layers.
They showed that residual networks can be optimized easily and can gain accuracy when their depth is increased. Similarly, in [7], the authors explored the influence of convolutional network depth on accuracy in the context of large-scale image recognition. The key contribution is a detailed examination of networks of increasing depth utilising an architecture with very small convolution filters, which demonstrates that expanding the depth can yield a considerable improvement over prior-art setups. This method also acts as the backbone for many of the human pose estimation methods explored. Further, on convolutional pose machines, the authors of [8] outline a systematic strategy for incorporating convolutional networks into the pose machine framework for learning image characteristics and image-dependent spatial models for pose estimation. This study contributes by implicitly modelling long-range relationships between variables in structured prediction problems such as articulated pose estimation. A different approach to pose estimation is shown in [9], where, to ease pose estimation in the presence of imprecise human bounding boxes, the authors proposed a novel framework called Regional Multi-Person Pose Estimation (RMPE). The framework is made up of three parts: the Symmetric Spatial Transformer Network (SSTN), Parametric Pose Non-Maximum-Suppression (NMS), and the Pose-Guided Proposals Generator (PGPG). The authors of [10] highlight how algorithm analysis and comparison become increasingly challenging as algorithm and system complexity grows. Their study gives simple and effective baseline approaches for motivating and assessing new ideas in the field and discusses evaluation approaches that achieve state-of-the-art results. Among hardware-oriented methods, [11] developed a two-DOF humanoid torso. An "up to down" sequence was used while designing the humanoid robot by analyzing human torso motion.
They analysed human torso motion in order to replicate its features accurately in their model and to improve the feasibility of the design. In [12] we see techniques for simulating robot models and assessing their feasibility: based on Gazebo, this study presents a simulation method for robotic arms in a realistic physical environment. The parallel-axis theorem
is used to calculate the moment of inertia of the linkages. The system's most critical components are the joint controllers and the path planning algorithm. Because of its ease of hardware abstraction, ROS was chosen to arrange the task architecture. The simulation system can analyse and compare the reactions of the robotic arm to different algorithms and target poses. All the work mentioned above constitutes the individual modules used in the proposed system. There has been previous work using all these modules to create systems that exhibit robot telekinesis, as shown in [13], [14], [15] and [16]. In [13], the authors extract 29 key-points of a person's hand using computer vision techniques and derive the angles of all the joints in the detected hand pose; they further attempt to grasp objects using adversarial imitation learning. Along the same line, the authors of [14] develop a metric system to judge the imitation performance of a system, exploring the problems of what to imitate and how to imitate in discriminative imitation systems. A more thorough implementation of robot telekinesis can be found in [15], where the authors develop a system that makes a robotic arm's end-effector imitate the gestures made by a person, captured through four depth cameras. The cameras determine the pose of the person's hand, and the system then maps coordinates to the robotic arm, whose pose is determined using kinematic retargeting. Similarly, in [16] the authors develop a system that reads the pose of each finger of a person's hand using deep convolutional neural networks. They train the system over the various orientations in which the hand pose can be set, and then derive equations that use the traced pose to replicate it with a robotic hand.
Through this work, they were eventually able to make a system where a robotic hand could replicate the pose of a person's hand.

Methodology
Considering the number of modules required to get the desired output in this project, and the challenges that can arise in each of them, it is essential to implement each module in the best possible manner so that the system runs smoothly with as few errors as possible. It is important to explore various implementations for each module in order to find the approach that best complements all the other modules. The methodology of the proposed system can be split into three modules, namely pose estimation, robot modelling and motion planning. All these modules were integrated using the Robot Operating System (ROS). One node is dedicated to the pose estimation module, which publishes the coordinates of the key-points it detects to a ROS topic. The motion planning node subscribes to these coordinates and performs the calculations required to estimate the angles by which each joint of the robot is to be actuated.

Pose estimation. In this module, we focus on retrieving the positions of each of the joints of the person standing in front of the RGB camera. To achieve this, we used Computer Vision (CV) techniques based on machine learning algorithms. Using these techniques, we detect the elbows, wrists, shoulders and the bottom of the neck of the person standing in front of the camera. These detected joints will be referred to as pose key-points. Knowing the positions of these specific key-points allows us to calculate the angles by which each of the robot's actuators should be moved, resulting in the robot replicating the pose demonstrated by the person standing in front of the camera.
This phase is computationally intensive, so optimising it as far as possible is crucial. One important metric to consider is the number of frames the module can process per second, called the frame rate. The frame rate of the feed must be kept high so that a more continuous stream of commands is sent to the robot, resulting in real-time replication.
Fig. 1: Pose estimation output

Multiple methods were tested to find the optimal technique for finding the key-points while attaining the maximum possible frame rate with minimal computing. There are two general approaches to estimating the pose of a person from an image: the top-down and the bottom-up approach. Though the output of both approaches is similar, they differ in computational cost and complexity, and thus in the frame rate they attain. We explored both approaches to see which runs in real time as required for the use case of the proposed system, and found the bottom-up approach ideal, as it was quicker and delivered higher frame rates. We took inspiration from the approach used in [1] for the trained model. The neural network generates two tensors: the heatmap of the detected key-points, and the Part Affinity Fields (PAFs) that capture the relation between two given detected joints. The input images are analysed by a convolutional neural network consisting of the first 10 layers of the VGG-19 architecture, as proposed in [8]. At each stage of the network, confidence maps and PAF sets are formed, and the findings of previous stages are joined to generate the best possible predictions. In the implementation in [1], the network is refined through five stages after an initial stage. Analysing the output achieved in [1] on the validation dataset through each refinement stage, we observe that there is no very significant change in accuracy between the output of the first refinement stage and the fifth.
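As an aside, the network's per-joint confidence heatmaps can be decoded into pixel coordinates with a simple per-channel argmax. The sketch below is a minimal illustration of that decoding step; the tensor shape and joint ordering are assumptions for the example, not the exact ones used in [1]:

```python
import numpy as np

def heatmaps_to_keypoints(heatmaps):
    """heatmaps: (num_joints, H, W) confidence maps -> list of (x, y) peak locations."""
    keypoints = []
    for hm in heatmaps:
        # Index of the highest-confidence pixel in this joint's map.
        y, x = np.unravel_index(np.argmax(hm), hm.shape)
        keypoints.append((int(x), int(y)))
    return keypoints

# Toy example: one 4x4 map with a single confident peak at pixel (x=2, y=1).
hm = np.zeros((1, 4, 4))
hm[0, 1, 2] = 1.0
print(heatmaps_to_keypoints(hm))  # [(2, 1)]
```

In practice the peak would be refined (e.g. sub-pixel interpolation) and candidate key-points would still need to be grouped into people via the PAFs, but the argmax step above is the core of turning a confidence map into a coordinate.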
Thus, to optimize the output for a higher frame rate, we included only the initial stage and the first refinement stage in our implementation. This gave us a much higher frame rate with minimal compromise on the output. Having a system with minimal frame drops is essential in the proposed system to ensure that the most recent command is sent without delay. This enables quick movement transitions and gives the remotely operated robot the ability to dodge dangerous scenarios. It also provides the flexibility to work efficiently in dynamic environments.

Robot model. In this segment, we gave careful consideration to how the designed robot model would be fabricated, and thus which types of actuators would be best suited and what angle of rotation each joint would have. Since we were dealing with two types of joints in this project, we experimented with different actuators to finalise which one would make both types work.
Fig. 2: Robot model

We implemented revolute joints for each of the actuators. This means the joints will be substituted with servo motors when fabricated, as servos provide the flexibility of controlling the angle of each joint's rotational axis. As mentioned earlier, we are dealing with two types of joints in this project: the shoulder and the elbow. To model the shoulder, we placed three revolute joints, one per axis of rotation, ensuring the full range of motion of a ball-and-socket joint, which the shoulder is. With this design, we can manipulate the arm about all rotational axes: x (roll), y (pitch) and z (yaw). To model the elbow, we used a single revolute joint rotating about one axis, allowing the arm to bend and extend; the elbow is a hinge joint with only one axis of movement, so this was the ideal way to implement it. Designing the robot in this manner helped us replicate the joints as faithfully as possible and ensures the pose of the robot matches exactly what the human demonstrates, giving the person controlling the robot much more control.

Motion planning. In this segment, we collect the coordinates of the key-points determined by the pose estimation segment and calculate the angles for each of the actuators in the robot model. The eventual output of the CV segment would ideally be what is shown in Figure 3. Each joint of the person demonstrating the pose will be read in terms of its coordinates relative to the entire picture.
Fig. 3: Pose coordinates

To find the angle between the vectors we plotted, we first need the values of each vector in terms of its coordinates, x and y, relative to the origin. This can be done simply by finding the difference between the coordinate marking the end of the vector and the coordinate acting as the origin of the vector, as shown below, where we calculate the new values (x_a, y_a) from (x_0, y_0) and (x_1, y_1):

x_a = x_0 - x_1   (1)
y_a = y_0 - y_1   (2)

In a similar manner, (x_b, y_b) can be calculated by taking the difference between (x_2, y_2) and (x_1, y_1). We can then calculate the magnitude of each of these vectors from these values. The magnitudes are essential, as they can be inserted into the dot-product equation of two vectors, which in turn gives the value of θ. Calculating the magnitude of a vector once its coordinates relative to the origin are known is straightforward. Equation (3) gives the magnitude of the vector between (x_0, y_0) and (x_1, y_1), which we call \vec{a}:

|\vec{a}| = \sqrt{x_a^2 + y_a^2}   (3)

Similarly, we can calculate the magnitude of \vec{b} by substituting x_b and y_b in place of x_a and y_a in the equation above. It is important to recall the definition of the dot product of two vectors at this point. For the vectors \vec{a} and \vec{b} calculated above, the dot product is given by Equation (4):

\vec{a} \cdot \vec{b} = x_a x_b + y_a y_b   (4)

But this equation alone does not give the angle between the vectors. To find that, we substitute the values we already have into a different equation. Given two vectors \vec{i} and \vec{j}, their dot product can also be written as in Equation (5):

\vec{i} \cdot \vec{j} = |\vec{i}| \, |\vec{j}| \cos\theta   (5)

Substituting \vec{a} and \vec{b} into Equation (5) gives Equation (6):

\vec{a} \cdot \vec{b} = |\vec{a}| \, |\vec{b}| \cos\theta   (6)

From Equation (6), the dot product of \vec{a} and \vec{b} and both magnitudes are known, so we can rearrange to obtain \cos\theta, as Equation (7) shows:

\cos\theta = (\vec{a} \cdot \vec{b}) / (|\vec{a}| \, |\vec{b}|)   (7)

Thus, from Equation (7), we can easily calculate the angle θ, as shown in Equation (8):

\theta = \cos^{-1}\big((\vec{a} \cdot \vec{b}) / (|\vec{a}| \, |\vec{b}|)\big)   (8)
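The derivation in Equations (1)–(8) translates directly into code. The sketch below computes the angle at a joint from three 2D key-points; the coordinate values in the example are made up for illustration:

```python
import math

def joint_angle(p0, p1, p2):
    """Angle at p1 formed by vectors a = p0 - p1 and b = p2 - p1."""
    xa, ya = p0[0] - p1[0], p0[1] - p1[1]      # Eqs. (1), (2)
    xb, yb = p2[0] - p1[0], p2[1] - p1[1]
    mag_a = math.sqrt(xa ** 2 + ya ** 2)       # Eq. (3)
    mag_b = math.sqrt(xb ** 2 + yb ** 2)
    dot = xa * xb + ya * yb                    # Eq. (4)
    return math.acos(dot / (mag_a * mag_b))    # Eq. (8)

# Shoulder at (0, 0), elbow at (1, 0), wrist at (1, 1): a right angle at the elbow.
print(joint_angle((0, 0), (1, 0), (1, 1)))  # ≈ 1.5708 (π/2 radians)
```

For robustness one would also guard against zero-length vectors (coincident key-points) and clamp the cosine into [-1, 1] before calling acos, since floating-point error can push it slightly outside that range.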
Using the above equation, we can easily determine the angle by which the person demonstrating a pose to the robot has moved each of their joints.

Simulation
Fig. 4: Simulation nodes

This is the phase where all the modules of the project are linked together using the ROS network stack and the outputs are checked for correctness. The CV module reads the pose from a live video feed and forwards the coordinates of the wrists, elbows and shoulders of the person standing in front of the RGB camera, along with the bottom of the neck, i.e. the centre of the collar bones. The centre of the collar bones is noted in order to calculate the angle by which the shoulder moves. All these steps are executed by the CV module very quickly, and the outputs are expected to be released to the angle-calculating node at a rate of at least 23 frames per second. Once the coordinates of all the required key-points are sent, the angle-calculating module retrieves them from the ROS master and substitutes the required values into the derived equations. The output obtained from the equations is then sent to the appropriate ROS topic to control each of the revolute joints. Each revolute joint subscribes to a different ROS topic, ensuring that communication is seamless and there are no clashes. This also ensures that the system keeps running even when one actuator malfunctions, which is important because it allows the robot to be controlled even with limited functionality. Figure 5 shows the simulation output along with the pose detected by the CV module, with the robot replicating the pose accurately.
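The node graph described above (a key-point topic feeding an angle-calculating node, and one topic per revolute joint) can be sketched without ROS as a tiny in-process publish/subscribe dispatcher. This is a toy stand-in, not ROS itself, and the topic names are illustrative assumptions, not the actual names used in the system:

```python
from collections import defaultdict

class MiniBus:
    """Toy in-process stand-in for a ROS-style pub/sub graph (illustrative only)."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        for cb in self.subscribers[topic]:
            cb(message)

bus = MiniBus()
received = {}

# One topic per revolute joint, as in the proposed system.
for joint in ["l_shoulder_roll", "l_shoulder_pitch", "l_shoulder_yaw", "l_elbow"]:
    bus.subscribe(f"/replicare/{joint}",
                  lambda angle, j=joint: received.__setitem__(j, angle))

def on_keypoints(kp):
    # Angle-calculating node: here it simply forwards a precomputed elbow angle.
    bus.publish("/replicare/l_elbow", kp["elbow_angle"])

bus.subscribe("/replicare/keypoints", on_keypoints)
bus.publish("/replicare/keypoints", {"elbow_angle": 1.2})
print(received["l_elbow"])  # 1.2
```

Because each joint listens on its own topic, a dead subscriber on one topic leaves the others unaffected, which mirrors the fault-isolation property described above.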
Fig. 5: Simulation of robot replicating the demonstrated pose

Conclusion
We present RepliCare, a system that can assist humans by acting as a substitute for their presence in dangerous environments. With the help of the modules used in the methodology of this system, namely the pose estimation, robot modelling and motion planning modules, we were able to create a system that makes a humanoid robot with the specifications described in this paper accurately replicate the poses demonstrated by a person through a video feed. In the proposed system, an input stream from a camera is fed to the pose estimation module to calculate the angles by which each of the joints of the humanoid must be moved to replicate accurately the pose demonstrated by the person standing in front of the camera. The flexibility of having the separate modules in this system communicate efficiently with each other using the Robot Operating System (ROS) also opens the possibility of controlling the humanoid from remote locations. This helps achieve the goal of the system, which is to eliminate the physical involvement of humans in dangerous environments and substitute them with robots. The fact that the detection and computation of the pose happen in real time, with a continuous stream of commands being sent, makes the system extremely efficient and also makes quick changes and adaptations in movement feasible.

References
[1] Z. Cao, G. Hidalgo, T. Simon, S.-E. Wei and Y. Sheikh (2021). "OpenPose: Realtime multi-person 2D pose estimation using part affinity fields." IEEE Transactions on Pattern Analysis and Machine Intelligence.
[2] R. Go and Y. Aoki (2016). "Flexible top-view human pose estimation for detection system via CNN." IEEE 5th Global Conference on Consumer Electronics.
[3] G. Papandreou, T. Zhu, L.-C. Chen, S. Gidaris, J. Tompson and K. Murphy (2018). "PersonLab: Person pose estimation and instance segmentation with a bottom-up, part-based, geometric embedding model." 15th European Conference on Computer Vision (ECCV).
[4] T. L. Munea et al. "The progress of human pose estimation: A survey and taxonomy of models applied in 2D human pose estimation." IEEE Access, vol. 8.
[5] A. Toshev and C. Szegedy (2014). "DeepPose: Human pose estimation via deep neural networks." IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[6] K. He, X. Zhang, S. Ren and J. Sun (2016). "Deep residual learning for image recognition." IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[7] K. Simonyan and A. Zisserman (2015). "Very deep convolutional networks for large-scale image recognition." 3rd International Conference on Learning Representations (ICLR).
[8] S.-E. Wei, V. Ramakrishna, T. Kanade and Y. Sheikh (2016). "Convolutional pose machines." IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[9] H.-S. Fang, S. Xie, Y.-W. Tai and C. Lu (2017). "RMPE: Regional multi-person pose estimation." IEEE International Conference on Computer Vision (ICCV).
[10] B. Xiao, H. Wu and Y. Wei (2018). "Simple baselines for human pose estimation and tracking." 15th European Conference on Computer Vision (ECCV).
[11] B. Cao, K. Sun et al. (2016). "Design and development of a two-DOF torso for humanoid robot." International Conference on Advanced Intelligent Mechatronics (AIM).
[12] Z. Huang, F. L. and L. Xu (2020). "Modeling and simulation of 6 DOF robotic arm based on Gazebo." 6th International Conference on Control, Automation and Robotics (ICCAR).
[13] D. Antotsiou, G. Garcia-Hernando and T.-K. Kim (2018). "Task-oriented hand motion retargeting for dexterous manipulation imitation." European Conference on Computer Vision (ECCV).
[14] A. G. Billard, S. Calinon and F. Guenter (2006). "Discriminative and adaptive imitation in uni-manual and bi-manual tasks." Robotics and Autonomous Systems, 54(5):370–384.
[15] A. Handa, K. Van Wyk, W. Yang, J. Liang, Y.-W. Chao, Q. Wan, S. Birchfield, N. Ratliff and D. Fox (2020). "DexPilot: Vision-based teleoperation of dexterous robotic hand-arm system." IEEE International Conference on Robotics and Automation (ICRA), pages 9164–917.
[16] S. Li, X. Ma, H. Liang, M. Gorner, P. Ruppel, B. Fang, F. Sun and J. Zhang (2019). "Vision-based teleoperation of shadow dexterous hand using end-to-end deep neural network." International Conference on Robotics and Automation (ICRA), pages 416–422.
Advances in Science and Technology ISSN: 1662-0356, Vol. 124, pp 37-43 doi:10.4028/p-5d1g8v © 2023 Trans Tech Publications Ltd, Switzerland
Submitted: 2022-09-09 Accepted: 2022-09-16 Online: 2023-02-27
Brain Tumor Detection Using Deep Learning
Jairam S.J.A.a, Lokeshwar D.b, Divya B.c and Dr. P. Mohamed Fathimad
SRM Institute of Science and Technology, Vadapalani Campus, India
a [email protected], b [email protected], c [email protected], d [email protected]
Keywords: Medical image, brain tumor, classification, deep learning-based tumor classification, image segmentation
Abstract. Brain tumors develop as a result of unregulated and rapid cell proliferation. If not treated in the early stages, they may result in death. The imaging technology used to diagnose brain tumors is magnetic resonance imaging (MRI). Early detection of brain tumors is critical in medical practise in order to determine whether the tumor will progress to malignancy. Deep learning is a useful and effective method for image classification. It has been widely used in a variety of sectors, including medical imaging, because its application does not require the expertise of a subject-matter expert, although it does require a large amount and variety of data to produce accurate classification results. The deep learning technique used here for image classification is the convolutional neural network (CNN). In this research work, two different models are used to categorize brain tumors, and their results were evaluated using performance metrics such as accuracy and precision; the results were impressive.

1. Introduction
The central nervous system distributes sensory information and the corresponding actions throughout the body. The brain and spinal cord contribute to this dissemination. A normal human brain weighs approximately 1.2–1.4 kilograms and has a volume of approximately 1260 cm3 (male brain) or 1130 cm3 (female brain). The frontal lobe of the brain contributes to problem-solving, motor control and judgement. The role of the parietal lobe is to maintain body posture. The temporal lobe is responsible for memory and hearing, while the occipital lobe is responsible for visual processing. The cerebral cortex is the grey matter that surrounds the cerebrum and is composed of cortical neurons. The cerebellum is a smaller region of the brain than the cerebral cortex. Its function is motor control, that is, the systematic management of voluntary movements.
In comparison to other species, the human cerebellum is well-structured and developed. The cerebellum has three lobes: anterior, posterior, and flocculonodular. The anterior and posterior lobes are connected by a circular structure known as the vermis. The cerebellum is composed primarily of white matter and a thinner, grey outer cortex than the cerebrum. The anterior and posterior lobes aid in the coordination of complex motor movements. The flocculonodular lobe maintains the body's equilibrium [4, 8]. The brain stem contains cranial and peripheral nerves that aid in eye movement and control, balance and maintenance, as well as some fundamental functions, such as breathing. Nerves originating in the thalamus of the cerebrum travel to the spinal cord via the brain stem. They spread from there to the rest of the body. The three primary portions of the brain stem are the midbrain, pons, and medulla. The pons helps with brain communication, senses, and breathing, whereas the medulla oblongata helps with blood management, sneezing, swallowing, and other functions. Brain tumors can be categorized as either aggressive or slow-growing. A benign or slowly progressing tumor will not affect nearby tissues, whereas an aggressive malignant tumor will spread from one location to another. The World Health Organization groups brain tumors into categories I through IV. Grades I and II tumors are considered slow-growing, whereas grades III and IV tumors are considered aggressive and typically have a poorer prognosis.
In this work, tumors such as glioma, meningioma and pituitary tumors, as well as normal (no-tumor) cells, are classified.

Brain Imaging Techniques: Medical imaging is essential for visualizing internal organs in order to discover anomalies in their anatomy and function. Medical image capturing instruments, such as X-ray, PET, MRI and CT scanners and ultrasound scanners, record the anatomy or function of internal organs and display them as images or films. These images and videos must be understood in order to accurately detect anomalies or diagnose functional impairments. If an abnormality is discovered, its exact location, size and shape should be established. Traditionally, experienced physicians applied these techniques based on their experience and judgment; automated healthcare systems aim to perform these tasks through intelligent medical image comprehension. Medical image detection, segmentation, localization and classification are critical tasks in medical image processing.

2. Literature Study
Javaria Amin et al. [1] provide a quick overview of the many approaches available for detecting and classifying brain tumors. The survey addresses machine learning techniques such as KNN and SVM, as well as deep learning approaches such as CNN and transfer learning. It compares the different approaches, examines the advantages and disadvantages of each, identifies the best approach and briefly reports the differences in the accuracy rates of these models. For the most part, the strategies outlined provide accuracy ranging from 70% to 90%, but the survey does not go into great detail regarding the models' attributes. Emrah Irmak [2] used convolutional neural networks (CNNs) to classify brain cancers into multiple classes for early diagnosis. Three different CNN models are proposed for three distinct objectives.
The hyperparameters in CNN were optimized using a grid search technique. In this study, deep CNN models are widely used for each task separately. The models suggested in this study are not lightweight, and they demand a lot of computational power and training time. G. Hemanth et.al [3] proposes an approach using automated segmentation based on Convolutional Neural Networks, that determines small 3 x 3 kernels. Segmentation and classification are performed by combining this single approach. CNN (a machine learning approach) differs from NN (Neural Networks) in that it is layer-based for result interpretation. Shirwaikar et al [4], looks at the effectiveness of employing 3D CNNs to identify brain tumors. Due to the complexities of 3D CNN production, the advancements may lead to the automated recognition of significant characteristics without supervision. Among the survey studies in this paper, a 3D CNN architecture is constructed to extract tumors, and the retrieved tumors are sent to a pretrained CNN model for feature extraction. As a result, these characteristics are transmitted to the correlation-based selection process, which selects the best features. A. Miglani et al [5], present a detailed guide to Brain Tumor Detection, which focusses especially on its segmentation and classification, by comparing and summarizing the most recent research work on this topic. This study compared 28 peer-reviewed papers and emphasized the various methodologies. Sinha et al [6] pinpoints the erroneous image and tumor area in the brain. It also identifies the density of segmented tumor, which may be approximated using the mask. To detect abnormalities in MRI scans, a deep learning technique is used. To segment the tumor region, multilevel thresholding is used. The density of the affected zone is determined by the number of cancerous pixels. S. 
Rajkumar et al. [7] propose Deep Learning Neural Network (DLNN) algorithms as key methods in medical imaging for predicting the early symptoms of an illness from MRI images. Medical images of a human brain tumor are obtained, and various features are computed from the tumor image, such as energy, contrast, uniformity, dissimilarity, correlation and entropy. The simulation results suggest that the proposed DLNN algorithm detects abnormalities more effectively. In the human brain, tumors appear at lower grayscale intensities than normal tissues. When compared to previous algorithms, the suggested DLNN method diagnoses human brain cancers in seconds.
Advances in Science and Technology Vol. 124
39
3. Existing Methods
The existing methods are categorized as follows:
• Region growing methods
• Watershed method
• Thresholding method
3.1. Region Growing Methods: In region growing (RG) methods, image pixels constitute contiguous regions that are examined via neighbouring pixels and merged on the basis of uniform features according to pre-defined similarity criteria. Due to the partial volume effect, region growing may fail to deliver higher precision; the modified region growing method (MRGM) is preferred to counteract this effect. Region growing is also introduced through BA approaches.
3.2. Watershed Method: As MRI scans have a higher proteinaceous fluid concentration, watershed methods are used to assess the intensity of the image. The watershed approach leads to over-segmentation due to noise; accurate segmentation results can be produced by combining the watershed transform with statistical methods. Image foresting transform (IFT) watershed, topological watershed and marker-based watershed are some of the algorithms.
3.3. Thresholding Method: The thresholding approach is a straightforward and effective method for segmenting the essential objects, although selecting an optimal threshold in low-contrast images can be difficult. Histogram analysis is utilized to pick threshold values based on image intensity. Thresholding methods are divided into two types, local and global, and are useful for images with high intensity and contrast. The thresholding approach is typically employed as the first stage of segmentation, and several distinct regions within greyscale images are segmented. The Gaussian distribution approach can be used to calculate the best threshold value; it is helpful when the threshold value cannot be determined from the complete image histogram or when a single threshold value does not yield an appropriate segmentation result. 3.4.
Existing Deep Learning Methods: One of the regularly utilized methods is a deep convolutional neural network with a sophisticated architecture combining different hidden layers with changing weights at each node. In the existing system, the original image database is collected, and the retrieved images are enhanced using pre-processing and noise-removal techniques. After pre-processing, the tumor region is segmented from the MRI image. The image's characteristics are extracted, and classifiers such as CNN are used to classify it. The challenges with the existing deep learning systems are:
1. Existing approaches are tested and trained on limited and local datasets that do not adequately represent the main tumor classifications.
2. Optimizing and selecting the optimal features is another challenging procedure, and it can lead to erroneous classification of brain tumors.
3. There is a requirement to create a lightweight model that gives higher accuracy in less time.
4. Detecting a small tumor remains challenging, since it can be mistaken for a normal region.
5. Some present approaches can only detect one tumor region and do not work well for additional regions.
4. Methodology
The goal of this project is to create a strong CNN model that can classify brain tumors from MRI scan data. The proposed system, as given in Fig. 1, has two phases: (a) pre-processing and (b) classification of tumors using VGG-16 and ResNet50.
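As a concrete illustration of the histogram-based threshold selection described in Section 3.3, Otsu's method (one common choice; not necessarily the one used in the surveyed works) picks the grey level that maximizes the between-class variance:

```python
import numpy as np

def otsu_threshold(image):
    """Pick the grey level that maximizes between-class variance."""
    hist, _ = np.histogram(image, bins=256, range=(0, 256))
    total = image.size
    sum_all = float(np.dot(np.arange(256), hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0.0
    for t in range(256):
        w0 += hist[t]            # pixels at or below t (class 0)
        if w0 == 0:
            continue
        w1 = total - w0          # pixels above t (class 1)
        if w1 == 0:
            break
        sum0 += t * hist[t]
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# A synthetic bimodal "image": dark and bright pixels only.
img = np.concatenate([np.full(500, 40), np.full(500, 200)])
print(otsu_threshold(img))  # -> 40 (any value in [40, 200) separates the modes)
```

On a real MRI slice, the pixels above the returned threshold would form the candidate tumor mask that later stages refine.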
4.1. Convolutional Neural Network (CNN): Animals' ability to comprehend images is both interesting and fundamental, yet for a machine to analyze an image there are many hidden complexities. What an animal perceives is the image viewed by its eyes, which is processed by neurons and delivered to the brain for interpretation. CNN is a deep learning technique, inspired by the visual cortex, that aims to emulate the visual apparatus of animals. It represents a quantum leap in image comprehension, including image classification, segmentation, localization and detection. CNNs are built from convolutional layers with learnable weights and biases, just like neurons, together with fundamental components such as activation functions and pooling layers, followed by fully connected layers.
4.2. VGG-16: VGG (Visual Geometry Group) is a CNN architecture with numerous layers. The network is distinguished by its simplicity: only 3 × 3 convolution layers are stacked on top of each other, with max-pooling layers handling the reduction in volume size as depth increases. Two fully connected layers, each with 4096 nodes, are followed by a softmax classifier.
4.3. Residual Network (ResNet50): ResNet50 is a 50-layer member of the residual network family; its deepest variant, ResNet-152, is 8 times deeper than VGG-19 yet has lower computational complexity. Residual net ensembles achieved an error rate of 3.57 percent on the ImageNet test set, lower than other models. The residual design solves the challenge of training a truly deep architecture by incorporating identity skip connections, which allow layers to copy their input to the next layer.
Fig. 1. Proposed architecture: the brain image dataset is pre-processed and partitioned into training and testing sets; the deep learning CNN, VGG and ResNet models are trained with hyperparameter tuning, and the resulting model classifies each image as glioma tumor, meningioma tumor, pituitary tumor, or no tumor.
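The identity skip connection described in Section 4.3 can be illustrated in a few lines of NumPy (the weights and sizes here are placeholders, not the trained model):

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def residual_block(x, w1, w2):
    # Two transform layers plus an identity skip connection:
    # the input is added back before the final activation.
    return relu(w2 @ relu(w1 @ x) + x)

x = np.array([1.0, 2.0, 3.0])
zero = np.zeros((3, 3))
# With all-zero weights the block reduces to the identity on
# non-negative inputs: the layers simply copy their input forward.
print(residual_block(x, zero, zero))  # -> [1. 2. 3.]
```

This is why very deep residual stacks remain trainable: a block can always fall back to passing its input through unchanged.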
5. Experimental Results
5.1. Dataset: In any image classification task using deep learning, the first step is to prepare the dataset in a proper format, which makes it easier to interpret and analyze. The brain MRI scan images used in this project for brain tumor detection and classification came from kaggle.com and consist of greyscale MRI scans. The dataset contains 3264 images in total; the training portion comprises 2870 images, divided into four categories: 826 brain images with glioma tumors, 822 with meningioma tumors, 827 with pituitary tumors, and 395 with no tumors. Once this is done, pre-processing of images and further exploratory data analysis can be performed. The proposed method uses the concept of transfer learning with convolutional neural networks to train the model for accurate results. Optimization techniques are used for feature extraction, and hyperparameter tuning is used to choose a set of optimal hyperparameters for the model. The project provides better classification accuracy in less computational time. Neural network models, namely VGG-16 (Visual Geometry Group) and ResNet50, are merged using ensemble methods to provide better predictions.
5.2. Image Pre-processing: Processing an image is challenging, and it is critical to eliminate any extraneous components before processing. Because images include distinct variations in intensity, contrast, and size, pre-processing is used to enable smooth training. Converting an image to grayscale, removing noise, and reconstructing an image are all examples of pre-processing. Resizing the images, which originally came in different sizes, to a standard shape, followed by normalization and rescaling of pixel values to a common range, facilitates the training phase of the model. 5.3.
Feature Extraction and Classification: Using feature extraction, we can turn raw data into numerical features that computers can handle more efficiently while maintaining the information in the original data set; this outperforms applying machine learning to raw data. Feature extraction detects the most distinguishing properties in signals, making them easier for a machine learning or deep learning algorithm to ingest. In convolutional neural networks, the feature extractor is learned during training and does not need to be built manually: feature extraction is handled by the convolution and pooling layers.
6. Performance Metrics
Classification assigns each item in a data collection to one of several predefined classes or groupings. The performance of the model can be calculated from the following formulae, where TP, TN, FP and FN denote true positives, true negatives, false positives and false negatives:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1 Score = 2 × Precision × Recall / (Precision + Recall)
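These metrics can be computed directly from confusion-matrix counts; a small sketch (the counts below are hypothetical, not taken from the paper):

```python
def classification_metrics(tp, fp, fn, tn):
    # Standard confusion-matrix metrics for one class.
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts, for illustration only
print(classification_metrics(90, 6, 6, 98))  # -> (0.94, 0.9375, 0.9375, 0.9375)
```

For the four-class problem here, precision, recall and F1 would be computed per class and then averaged.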
Classification is a useful tool for distinguishing between normal and malignant brain images. After the model has been trained on the dataset and the training and validation accuracies have increased, the model is used to predict whether a given image indicates a glioma tumor, meningioma tumor, pituitary tumor, or no tumor at all.
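Concretely, the four-way decision is an argmax over the model's per-class scores (the label order here is an assumption, not taken from the paper):

```python
import numpy as np

LABELS = ["glioma_tumor", "meningioma_tumor", "pituitary_tumor", "no_tumor"]

def predict_label(scores):
    # scores: one softmax/score value per class for a single image
    return LABELS[int(np.argmax(scores))]

print(predict_label([0.05, 0.72, 0.13, 0.10]))  # -> meningioma_tumor
```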
Table 1. Performance metrics of the model

Model          Accuracy  Precision  Recall  F1 Score
CNN + ResNet   96.3      94         94      94
Fig. 2. Confusion Matrix
Fig. 3. Graph – Comparison of Training and Validation - Accuracy and Loss
7. Conclusion
In this paper, we have presented a method for automated classification of brain MRI images into four classes: glioma tumor, meningioma tumor, pituitary tumor, and no tumor. Our neural network model yielded a training accuracy of 97.8% and a validation accuracy of 94.4%, which indicates that the model did not overfit during training. The model classified the brain tumor classes with high accuracy and precision. Its efficiency can be further improved by including more brain MRI scans with different weightings and by applying various image augmentation techniques to increase the dataset size, allowing the model to serve as a more generalized and robust application for larger image databases.
References
[1] Javaria Amin, Muhammad Sharif, Anandakumar Haldorai, Mussarat Yasmin, Ramesh Sundar Nayak, "Brain tumor detection and classification using machine learning: a comprehensive survey," Complex & Intelligent Systems, October 2021
[2]
Emrah Irmak - “Multi-Classification of Brain Tumor MRI Images Using Deep Convolutional Neural Network with Fully Optimized Framework”, Iranian Journal of Science and Technology, Transactions of Electrical Engineering, 10 April 2021
[3]
G. Hemanth, M. Janardhan and L. Sujihelen, "Design and Implementing Brain Tumor Detection Using Machine Learning Approach," 2019 3rd International Conference on Trends in Electronics and Informatics (ICOEI), 2019
[4]
R. D. Shirwaikar, K. Ramesh and A. Hiremath, "A survey on Brain Tumor Detection using Machine Learning,” International Conference on Forensics, Analytics, Big Data, Security (FABS), 2021
[5]
A. Miglani, H. Madan, S. Kumar and S. Kumar, "A Literature Review on Brain Tumor Detection and Segmentation,", 5th International Conference on Intelligent Computing and Control Systems (ICICCS), 2021
[6]
A. Sinha, A. R P, M. Suresh, N. Mohan R, A. D and A. G. Singerji, "Brain Tumor Detection Using Deep Learning," 2021 Seventh International conference on Bio Signals, Images, and Instrumentation (ICBSII), 2021.
[7]
S. Rajkumar, K. Karthick, N. Selvanathan, U. K. B. Saravanan, M. Murali and B. Dhiyanesh, "Brain Tumor Detection Using Deep Learning Neural Network for Medical Internet of Things Applications," 2021 6th International Conference on Communication and Electronics Systems (ICCES), 2021.
[8]
N. M. Dipu, S. A. Shohan and K. M. A Salam, "Brain Tumor Detection Using Various Deep Learning Algorithms,” 2021 International Conference on Science & Contemporary Technologies (ICSCT), 2021.
[9]
G. Hemanth, M. Janardhan and L. Sujihelen, "Design and Implementing Brain Tumor Detection Using Machine Learning Approach," 2019 3rd International Conference on Trends in Electronics and Informatics (ICOEI), 2019
[10] G. Raut, A. Raut, J. Bhagade, J. Bhagade and S. Gavhane, "Deep Learning Approach for Brain Tumor Detection and Segmentation," 2020 International Conference on Convergence to Digital World - Quo Vadis (ICCDW), 2020
[11] S. Irsheidat and R. Duwairi, "Brain Tumor Detection Using Artificial Convolutional Neural Networks," 2020 11th International Conference on Information and Communication Systems (ICICS), 2020
Advances in Science and Technology ISSN: 1662-0356, Vol. 124, pp 44-52 doi:10.4028/p-4s4w34 © 2023 Trans Tech Publications Ltd, Switzerland
Submitted: 2022-08-30 Accepted: 2022-09-16 Online: 2023-02-27
Music Recommendation System Using Facial Emotions
Sharath P.1, G. Senthil Kumar1,*, Vishnu Boj K.S.1
1Department of Computational Intelligence, SRM Institute of Science and Technology, Kattankulathur, India, 603203
*Corresponding author: [email protected]
Keywords: MLP Classifier, CaffeModel, OpenFace PyTorch Model, OpenCV, Face and Emotion Detection, Flask, PythonAnywhere.
Abstract: Emotions play an important role in human life. Extracting human emotions is important because it conveys nonverbal communication cues that play an important role in interpersonal relations. In recent years, facial emotion detection has received massive attention, and many businesses have already utilized this technology to get real-time analytics and feedback from customers to help their business grow. Currently, we have to manually find playlists according to our mood, which is time-consuming and stressful. Therefore, this process is made automated and simple in this project by proposing a recommendation system for emotion recognition that is capable of detecting the users' emotions and suggesting playlists that can improve their mood. Implementation of the proposed recommender system is performed using Caffemodel to detect faces and the MLP Classifier to detect facial emotions based on the KDEF dataset.
1. Introduction
Emotions play a major role in each individual's life. Emotions are basically feelings, and they essentially control our reactions, ideas, and choices. An emotion is a mental state associated with the nervous system [10]. Human life would be dull and colorless without feelings like joy, excitement, love, sorrow, fear, and disappointment. Emotion adds spice and color to life. Communication involves both verbal and non-verbal methods, and sometimes non-verbal communication conveys more messages than verbal communication. Facial expressions are more explanatory in many situations where words fail, like a shock or a surprise [4]. According to recent research, nonverbal communication accounts for 70 to 93 percent of all communication [17], with facial expressions playing a significant role. Humans are good at reading facial expressions and emotions. Facial expressions are the oldest and most natural way of communicating emotions, moods, and feelings [2].
In recent years, detecting emotions using various technologies has also improved, and it has many business uses as well as healthcare and psychological uses. Businesses use emotions to get real-time analytics and feedback from customers to make their business grow. In this paper, we concentrate on the healthcare and psychological uses of detecting emotions. According to recent research, music can have a great influence on human emotions and can even treat diseases [12]. It can lift your mood and reduce anxiety or stress. Therefore, we propose a music recommendation system that detects emotions from the facial expressions of users and recommends a playlist containing songs that can alter their mood. This project is divided into four sections. The first and second sections are about detecting faces and emotions in the uploaded image using Caffemodel and the MLP Classifier. The third section is about deploying the model into a website using the Flask framework and styling the website using Bootstrap. The fourth section is about setting up the JSON file containing songs, using it in the web app, and hosting the created website on the PythonAnywhere cloud. In the proposed model, we have preferred Caffemodel over the Viola-Jones algorithm because the latter requires parameter tuning in some cases where it gives false positives when detecting faces in an image. Caffemodel uses the Caffe deep learning framework and is very efficient in image classification and image segmentation [18]. An MLP Classifier is utilized in the model for performing the task of emotion recognition. The MLP Classifier uses an underlying neural network to perform
classification, and it is also suitable for our model as the inputs are divided into multiple classes. The proposed model will detect four major emotions: happy, angry, sad, and neutral. From the KDEF dataset, we will be extracting only images containing these emotions. Party songs will be recommended for happy emotions, calming songs for angry people, motivating songs for sad people, and a mixture of these songs for neutral emotions. Flask and Django are the two main web frameworks used for deploying models into a website; we will be using Flask as it supports the usage of multiple types of databases, is easier to use, has higher compatibility with the latest technologies, and is easier to develop and maintain. AWS Lambda, Google Cloud Functions, Google AI Platform, Azure Machine Learning Service, etc. are some other serverless compute and cloud platform frameworks for deploying models [20]. Bootstrap will be used for making the website compatible with all screen sizes and also to style some of the components. Bootstrap makes styling easier and, therefore, reduces development time. We will be using a JSON file to store song links and won't be using any database because we require only a few KBs for storing our playlists. Finally, for hosting the website, we will be using the PythonAnywhere cloud, which provides a free service for a small-scale application like this. It is easy to use, flexible, can deploy the code within minutes, and contains many pre-installed libraries that we can use for our project. It also provides consoles which can be used to install other packages. Firebase, Heroku, Netlify, Elastic Beanstalk, Engine Yard, Azure App Service, etc. are some of the alternatives for hosting websites.
2. Literature Survey
Ma Xiaoxi et al. [1] proposed a system that is able to detect facial emotions by training the model on the FERA 2015 dataset.
The authors used several algorithms, such as SVM, Deep Boltzmann Machine, and a fusion method, to train the model. Among these, the fusion method gave the maximum accuracy of 91%, while SVM gave the least accuracy of 85.7%. Shlok Gilda et al. [2] proposed a system that uses a multilayered neural network to detect emotions and an EMP to create playlists. Gokul Krishnan et al. [3] used the Viola-Jones algorithm for detecting faces and CNN2 for detecting emotions. The authors preferred CNN2 over VGG16 as CNN2 requires 10 times fewer MFLOPs than VGG16, and they achieved 92.99% accuracy. The YouTube API was used here to recommend songs. Balaji Balasubramanian et al. [4] compare the efficiency of Support Vector Machines (SVM) and Convolutional Neural Networks (CNN) in detecting facial emotions. The authors used five different datasets (CK+, JAFFE, FER 2013, AFEW, and EmotionNet) for this process. Priya Dharshini et al. [5] compare the accuracy of SVM and CNN in detecting facial emotions. Imane Lasri et al. [6] determine facial emotions using CNN and detect faces using Haar cascades. The authors used the FER 2013 dataset and achieved an accuracy rate of 70%. Priya et al. [7] proposed a system that uses the Active Appearance Model (AAM) for facial feature extraction and SVM to predict facial emotions. The authors used Django to deploy the model on a website. Ameya Badve et al. [8] proposed a system that determines facial emotions using SVM. The authors used Haar cascades for face detection, and a training accuracy of 69% was achieved on the FER 2013 dataset. Spotify was used by the authors to recommend songs for the detected emotion. Metilda Florence et al. [9] determine facial emotions using the Fisherface algorithm, and face detection was performed using the HAAR and HOG algorithms. The authors used the CK+ and HELEN datasets for this music recommendation system and Emoplayer for recommending songs.
The authors maintained a happy song database and a neutral song database and also collected feedback from the users to improve the recommendation system's accuracy. On the FERC-2013 and JAFFE datasets, Akriti Jaiswal et al. [10] used a CNN-based deep learning architecture for emotion detection. The authors achieved 70.14% accuracy for FERC-2013 and 98.65% for JAFFE. Deny John Samuvel et al. [11] proposed a system that determines facial emotions using an SVM classifier, and the authors recommended music according to the emotions. Mikhail Rumiantcev et al. [12] proposed an emotion-driven recommendation system with respect to personalized preferences.
The authors collected feedback from the users to further improve the model using MuPsych and used Spotify to recommend songs. The authors also developed a mobile application to deploy the model. Madhuri Athavle et al. [13] determine facial emotions using CNN and detect faces using Haar cascades. The authors used the FER 2013 dataset and also tested the model's accuracy with SVM and ELM. The accuracy of the model with CNN, SVM, and ELM was 71%, 66%, and 63%, respectively. Chidambaram et al. [14] proposed a system that uses the VGG16 CNN model to detect emotions based on the FER 2013 dataset. The authors used Spotify to recommend songs to the users. Vinay et al. [15] proposed a system that determines facial emotions using SVM; the authors used React JS to develop the frontend and NodeJS to develop the backend. Some of the limitations identified by the survey are:
1. The cascade classifier/Viola-Jones algorithm becomes less efficient in some cases and requires parameter tuning to remove false positives [8].
2. Using the FER 2013 dataset, the maximum accuracy obtained was 70% [4][6][8].
3. CNN2 performs better than VGG16 because VGG16 has more MFLOPs [3].
4. Due to the low availability of images in the dataset, the accuracy was lower for some emotions [9].
5. A minimum image quality of 320p is required for the classifier to predict the user's emotion accurately [9].
3. Music Recommendation System Using Facial Emotions
The existing music recommendation systems are based on the history of songs we have listened to, without considering our current feelings or emotions [15]. Therefore, we have to manually find songs according to our emotions. This process is time-consuming and stressful. It also demotivates people from listening to music to alter moods like sadness or anger.
We aim to make this process simple and automated through this project by proposing a recommendation system that is capable of detecting users' current emotions and suggesting playlists to them to alter their emotions like sadness and anger or enjoy their current emotions like happiness. Implementation of the proposed recommendation system is performed using Caffemodel to detect faces and the MLP Classifier to detect facial emotions. The model is then deployed into a website using Flask and hosted using PythonAnywhere. The playlists will be stored in a JSON file. This project was divided into four modules, and each module will be explained in the upcoming sub sections. Figure 1, given below, shows the proposed system architecture.
Figure 1 - Architecture diagram
3.1 Dataset: The Karolinska Directed Emotional Faces (KDEF) dataset contains 4900 pictures of human facial expressions [19]. For our recommendation system, we only used pictures containing the emotions angry, happy, sad, and neutral. Figure 2 shows some of the sample images in the KDEF dataset.
Figure 2 - Dataset Sample Images
Figure 3 - Face detection using Caffemodel
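Restricting KDEF to the four emotions of Section 3.1 can be done from the file names, which encode the emotion as a two-letter code (e.g. AN, HA, SA, NE in a name like AF01ANS.JPG); a sketch assuming that naming convention:

```python
# Two-letter emotion codes used in KDEF file names (assumed convention)
WANTED = {"AN": "angry", "HA": "happy", "SA": "sad", "NE": "neutral"}

def filter_kdef(filenames):
    # Keep only images whose emotion code is one of the four we train on.
    selected = {}
    for name in filenames:
        code = name[4:6]  # e.g. "AF01ANS.JPG" -> "AN"
        if code in WANTED:
            selected[name] = WANTED[code]
    return selected

files = ["AF01ANS.JPG", "AF01DIS.JPG", "BM11HAS.JPG", "AF02NES.JPG"]
print(filter_kdef(files))
# -> {'AF01ANS.JPG': 'angry', 'BM11HAS.JPG': 'happy', 'AF02NES.JPG': 'neutral'}
```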
3.2 Face Detection using Caffemodel and OpenCV: In this module, faces are detected in the uploaded image and a rectangle is drawn around each face using OpenCV. This is performed using Caffemodel and the DNN module of OpenCV. From the uploaded image, we first create a blob using the DNN module of OpenCV and set it as the input to the Caffemodel. If a face is present, the model returns the coordinates of the face in the image as detections. These coordinates can be used to draw a rectangle around the face, indicating that it has been detected. Figure 3 shows an example of face detection using Caffemodel. Caffemodel: A Caffemodel file is an image classification or detection model trained with Caffe [18]. Caffe (Convolutional Architecture for Fast Feature Embedding) is a deep learning framework that allows users to create image classification and segmentation models. The face detector returns the coordinates of each face as output, which can be used for various purposes. The model used is res10_300x300_ssd_iter_140000_fp16.caffemodel. 3.3 Emotion Detection using MLP Classifier on the KDEF dataset: In this module, features are first extracted from the detected face with the help of the OpenFace PyTorch model. With the help of a pre-trained MLP Classifier model built on the KDEF dataset, the emotion is then predicted from the extracted features. For extracting facial features, we again create a face blob and feed it into the OpenFace PyTorch model as input. This model returns facial descriptors or features as vectors, which are fed into the MLP Classifier model to predict the emotion. MLP Classifier: A Multi-Layer Perceptron classifier (MLP) is a type of artificial neural network. It contains an input layer, a hidden layer and an output layer.
It relies on an underlying Neural Network to perform the task of classification. MLP is a feedforward ANN and uses backpropagation for training the network. Figure 4 shows the internal architecture of MLP Classifier.
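The feedforward pass of such a network can be sketched in a few lines of NumPy (the weights below are random placeholders; the real classifier learns them with backpropagation):

```python
import numpy as np

EMOTIONS = ["angry", "happy", "sad", "neutral"]

def mlp_forward(x, w1, b1, w2, b2):
    # One hidden layer with ReLU activation, then a linear output layer;
    # the predicted class is the index of the largest output score.
    hidden = np.maximum(0, w1 @ x + b1)
    return w2 @ hidden + b2

rng = np.random.default_rng(0)
x = rng.normal(size=128)                   # stand-in for a face-descriptor vector
w1, b1 = rng.normal(size=(32, 128)), np.zeros(32)
w2, b2 = rng.normal(size=(4, 32)), np.zeros(4)
scores = mlp_forward(x, w1, b1, w2, b2)
print(EMOTIONS[int(np.argmax(scores))])    # one of the four emotions
```

The 128-dimensional input mirrors the OpenFace facial descriptor; the four outputs correspond to the four emotions the system recognizes.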
Figure 4 - Architecture of MLP Classifier
Figure 5 - Features extracted by OpenFace
OpenFace PyTorch model: It is a PyTorch model which uses openface.nn4.small2.v1.t7 to detect faces and extract features such as eyes, mouths, etc. from them. These extracted features can later be used to train the machine learning model. This model requires the input image to be cropped and aligned in the same way as the original OpenFace. Figure 5 shows an example of feature extraction by the OpenFace PyTorch model. 3.4 Deploying the model into a website using the Flask framework: In this module, the ML model is deployed into a website using Flask and the frontend is styled using Bootstrap. Flask is a micro web framework written in Python that can be used for developing web applications. It does not require any particular tools or libraries and is free to use. Bootstrap is a free and open-source CSS framework that can be used to create responsive websites. It contains HTML, CSS, and JS-based templates which can be used for buttons, typography, navigation, and other components. 3.5 Setting up the JSON file for storing song links and hosting the website using PythonAnywhere: In this module, we set up the JSON file and store in it the playlists containing song links for different emotions. We then read this file in the Python code, filter the songs according to the predicted emotion, and display the filtered list of songs to the user on the web page. This JSON file acts as a database and is stored in the PythonAnywhere cloud. JSON is an open-standard file format and data-interchange format that uses human-readable text to store and transmit data objects consisting of attribute-value pairs and arrays. JSON is mainly used for transmitting data in web applications containing servers. Next, the final website is hosted on the PythonAnywhere cloud. We have to manually add files to the PythonAnywhere cloud, and it will automatically host the website.
It contains almost all the libraries required, and the missing ones can be installed using their bash console. It also provides the function of editing the uploaded code through their file editor. PythonAnywhere simplifies the deployment and execution of Python programs in the cloud.
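The JSON playlist lookup described in this module needs only the standard library; a sketch with a made-up playlist structure (the real file's schema may differ):

```python
import json

# A made-up playlists file, inlined here for illustration
PLAYLISTS_JSON = """
{
  "happy":   ["party_song_1", "party_song_2"],
  "angry":   ["calming_song_1"],
  "sad":     ["motivating_song_1"],
  "neutral": ["party_song_1", "calming_song_1", "motivating_song_1"]
}
"""

def songs_for(emotion):
    # Load the playlists and fall back to the neutral mix
    # when the emotion has no dedicated playlist.
    playlists = json.loads(PLAYLISTS_JSON)
    return playlists.get(emotion, playlists["neutral"])

print(songs_for("angry"))  # -> ['calming_song_1']
```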
3.6 Stepwise Implementation:
1. All the features from faces present in the KDEF dataset are extracted using the OpenFace PyTorch model and stored in a pickle file.
2. This data is then split into train and test datasets and fed into the MLP Classifier model.
3. GridSearchCV is then used for parameter tuning, and the final model is stored with the help of pickle.
4. Finally, a pipeline of Caffemodel, the OpenFace PyTorch model, and the MLP Classifier model is created; this pipelined model can take in any image and identify the face and emotion in it.
5. The pipelined model is then deployed into a website using the Flask framework, and the frontend is styled using Bootstrap.
6. Songs are stored in a JSON file, read in the Python code, and displayed on the webpage. Finally, the website is hosted on the PythonAnywhere cloud.
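Steps 5 and 6 above can be sketched as a minimal Flask app (the route name, response shape and prediction stub are assumptions, not the project's actual code):

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Stand-in for the pipelined Caffemodel -> OpenFace -> MLP model
def predict_emotion(image_bytes):
    return "happy"  # placeholder prediction

SONGS = {"happy": ["party_song_1"], "sad": ["motivating_song_1"],
         "angry": ["calming_song_1"], "neutral": ["mixed_song_1"]}

@app.route("/recommend", methods=["POST"])
def recommend():
    # In the real app the uploaded image would be read from the request
    # and passed through the pipelined model.
    emotion = predict_emotion(b"")
    return jsonify(emotion=emotion, playlist=SONGS[emotion])
```

Uploading the same files to PythonAnywhere and pointing its WSGI configuration at `app` is then enough to serve the site.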
Figure 6 - Website UI - Index Page
3.7 Algorithm:
1) Start.
2) Capture a photo from the webcam or through the file upload option.
3) Create an image blob using OpenCV and feed it into Caffemodel.
4) Check for detections.
5) If detections > 0:
   a. Find the detection with maximum confidence.
   b. If confidence > 0.5:
      i) Create an ROI using the coordinates of the face returned by Caffemodel.
      ii) Create a face blob and feed it into the OpenFace PyTorch model.
      iii) Pass the output into the MLP Classifier model to predict the emotion.
      iv) If emotion == 'angry', select songs["angry"].
      v) Else if emotion == 'happy', select songs["happy"].
      vi) Else if emotion == 'sad', select songs["sad"].
      vii) Else, select songs["neutral"].
      viii) Display the filtered playlist from the JSON file and the predicted emotion to the user.
   c. Else, display "face not detected" and hence no recommendations available.
6) Else, display "face not detected" and hence no recommendations available.
7) Stop.
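Steps 4 and 5 of the algorithm reduce to filtering the detector's output array. OpenCV's SSD-based Caffe face detector returns an array of shape (1, 1, N, 7) whose rows hold a confidence at index 2 followed by normalized box corners; a NumPy sketch with a synthetic detections array:

```python
import numpy as np

def best_face(detections, width, height, threshold=0.5):
    # detections: (1, 1, N, 7) array; columns 2..6 hold the
    # confidence and the normalized box corners x1, y1, x2, y2.
    rows = detections[0, 0]
    best = rows[int(np.argmax(rows[:, 2]))]
    if best[2] <= threshold:
        return None  # no face detected with enough confidence
    box = (best[3:7] * [width, height, width, height]).astype(int)
    return tuple(int(v) for v in box)

# Synthetic output: one weak and one strong detection
fake = np.zeros((1, 1, 2, 7))
fake[0, 0, 0, 2:7] = [0.30, 0.1, 0.1, 0.2, 0.2]
fake[0, 0, 1, 2:7] = [0.95, 0.25, 0.25, 0.75, 0.75]
print(best_face(fake, 300, 300))  # -> (75, 75, 225, 225)
```

The returned pixel coordinates define the ROI that is cropped and passed to the OpenFace model in step 5(b)(i).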
4. Results and Analysis
Figure 7 - Classification Report
Figure 8 - Model Accuracy in detecting emotions
The above classification report (Figure 7) and bar graph (Figure 8) clearly show that the happy emotion was identified with the highest accuracy. The model has the least accuracy for the sad emotion and sometimes misjudges sad as another emotion. These accuracies were recorded before parameter tuning; after parameter tuning, an overall accuracy of 80.48% was recorded. Figures 9, 10, 11, and 12 below show the playlists (output) recommended for the emotions neutral, happy, angry, and sad, respectively.
Figure 9 - Playlist for neutral emotion
Figure 10 - Playlist for happy emotion
Figure 11 - Playlist for angry emotion
Figure 12 - Playlist for sad emotion
5. Conclusion and Future Scope The model successfully identified faces and emotions for the images uploaded with an accuracy of 80.48%. The deployed website was able to predict emotions and suggest songs from both uploaded photos and selfies captured on webcam. Emotions were identified with higher accuracy when the image was uploaded using the file upload option, whereas some emotions were misjudged when the image was captured from the live camera. The model was able to identify emotions even when the picture quality was only 144p. The accuracy of this model can be further improved by increasing the dataset with people of all age groups and with selfies. The JSON file containing playlists can also be improved by adding songs from more languages, and the model can also be trained with new emotions as well. References [1] Ma Xiaoxi, Lin Weisi, Huang Dongyan, Dong Minghui, Haizhou Li , "Facial Emotion Recognition", IEEE 2nd International Conference on Signal and Image Processing, 978-1-53860969-9/17/$31.00 ©2017 IEEE. [2] Shlok Gilda, Husain Zafar, Chintan Soni and Kshitija Waghurdekar, "Smart Music Player Integrating Facial Emotion Recognition and Music Mood Recommendation", IEEE WISPNET 2017 conference, 978-1-5090-4442-9/17/$31.00 c 2017 IEEE. [3] Gokul Krishnan K, Parthasarathy M, Sasidhar D, Venitha E, "EMOTION DETECTION AND MUSIC RECOMMENDATION SYSTEM USING MACHINE LEARNING", International Journal of Pure and Applied Mathematics. [4] Balaji Balasubramanian, Rajeshwar Nadar, Pranshu Diwan, Anuradha Bhatia, "Analysis of Facial Emotion Recognition",Proceedings of the Third International Conference on Trends in Electronics and Informatics (ICOEI 2019), IEEE Xplore Part Number: CFP19J32-ART; ISBN: 978-1-53869439-8. [5] P.Priya dharshini, S.Sowmya, J. Gayathri, "EMOTION BASED RECOMMENDATION SYSTEM FOR VARIOUS APPLICATIONS", IJARIIE-ISSN(O)-2395-4396,Vol-5 Issue-2 2019. 
[6] Imane Lasri, Anouar Riad Solh, Mourad El Belkacemi, "Facial Emotion Recognition of Students using Convolutional Neural Network", 978-1-7281-0003-6/19/$31.00 ©2019 IEEE.
IoT, Cloud and Data Science
[7] P. Priya, J. Monisha, V. Chandana, A. M. Balamurugan, "Emotion Based Music Recommendation Using Face Recognition", International Journal of Advanced Research in Basic Engineering Sciences and Technology (IJARBEST), ISSN (Online): 2456-5717, Vol. 5, Issue 7, July 2019.
[8] Ameya Badve, Atharva Deshpande, Hitesh Kadu, Prof. Mrs. B. Mahalakshmi, "Music Recommendation Using Facial Emotion Detection and Classification", Journal of Critical Reviews, ISSN 2394-5125, Vol. 7, Issue 19, 2020.
[9] S. Metilda Florence and M. Uma, "Emotional Detection and Music Recommendation System Based on User Facial Expression", 3rd International Conference on Advances in Mechanical Engineering (ICAME 2020), IOP Conf. Series: Materials Science and Engineering 912 (2020) 062007, doi:10.1088/1757-899X/912/6/062007.
[10] Akriti Jaiswal, A. Krishnama Raju, Suman Deb, "Facial Emotion Detection Using Deep Learning", 2020 International Conference for Emerging Technology (INCET), 978-1-7281-6221-8/20/$31.00 ©2020 IEEE.
[11] Deny John Samuvel, B. Perumal, Muthukumaran Elangovan, "Music Recommendation System Based on Facial Emotion Recognition", 3C Tecnología. Glosas de innovación aplicadas a la pyme, ISSN: 2254-4143, Special Issue, March 2020.
[12] Mikhail Rumiantcev, Oleksiy Khriyenko, "Emotion Based Music Recommendation System", Proceedings of the 26th Conference of FRUCT Association, ISSN 2305-7254.
[13] Madhuri Athavle, Deepali Mudale, Upasana Shrivastav, Megha Gupta, "Music Recommendation Based on Face Emotion Recognition", Journal of Informatics Electrical and Electronics Engineering, 2021, Vol. 02, Iss. 02, S. No. 018, pp. 1-11, ISSN (Online): 2582-7006.
[14] G. Chidambaram, A. Dhanush Ram, G. Kiran, P. Shivesh Karthic, Abdul Kaiyum, "Music Recommendation System Using Emotion Recognition", International Research Journal of Engineering and Technology (IRJET), Vol. 08, Issue 07, July 2021, e-ISSN: 2395-0056, p-ISSN: 2395-0072.
[15] Vinay P., Raj Prabhu T., Bhargav Satish Kumar Y., Jayanth P., A. Suneetha, "Facial Expression Based Music Recommendation System", International Journal of Advanced Research in Computer and Communication Engineering, Vol. 10, Issue 6, June 2021, ISSN (Online) 2278-1021, ISSN (Print) 2319-5940, DOI 10.17148/IJARCCE.2021.10682.
[16] Lundqvist, D., Flykt, A., & Öhman, A. (1998). The Karolinska Directed Emotional Faces - KDEF, CD ROM from Department of Clinical Neuroscience, Psychology Section, Karolinska Institutet, ISBN 91-630-7164-9.
[17] https://www.lifesize.com (The Lifesize Blog).
[18] https://fileinfo.com/extension/caffemodel
[19] https://www.kdef.se
[20] https://www.analyticsvidhya.com/blog/2021/02/ml-model-deployment-with-webhosting-frameworks
Advances in Science and Technology ISSN: 1662-0356, Vol. 124, pp 53-59 doi:10.4028/p-2ffx83 © 2023 Trans Tech Publications Ltd, Switzerland
Submitted: 2022-08-31 Accepted: 2022-09-16 Online: 2023-02-27
Face Mask Detection Using OpenCV
Abdul Najeeb1,a, Abhishek Sachan2,b, Ashutosh Tomer3,c, Ayushi Prakash4,d
1-4ABES Engineering College, Ghaziabad, India
[email protected], [email protected], [email protected], [email protected]
Keywords: Face Mask Detection, TensorFlow, Keras, OpenCV, Coronavirus, Data Set, Convolutional Neural Network.
Abstract. As a biosafety precaution, the World Health Organization (WHO) introduced the wearing of face masks after the COVID-19 epidemic began. This posed challenges to existing facial recognition systems, and this study was born out of that need. In this publication, we describe how to create a system that can identify people from images even when they wear a mask. The face detector in OpenCV is used in conjunction with a classification model based on the MobileNetV2 architecture; in this way, it is possible to determine whether a face is wearing a mask and where it is situated. To conduct face recognition, a FaceNet model is used as a feature extractor and a multilayer feedforward perceptron is used for training; the facial recognition models are trained on a collection of about 4000+ photographs, of which 52.9 percent show a face mask and 47.1 percent are without a mask. The outcomes of the tests demonstrate that determining whether or not someone is wearing a mask is 99.65% accurate. Face recognition accuracy for ten people wearing masks is 99.52 percent, whereas face recognition accuracy without masks is 99.96 percent.
Introduction
The COVID-19 pandemic is the biggest life-changing event that has stunned the world since the year started. COVID-19, which has impacted the health and lives of many people, has demanded severe procedures to prevent the spread of illness. Individuals do everything they can for their personal and, hence, society's safety, from the most basic hygienic standards to medical treatments; face masks are one of the personal protective instruments. Face masks are worn when individuals leave their homes, and officials strictly enforce the wearing of face masks in groups and public areas. The procedure consists of two parts: face detection and mask detection. There are many applications of object detection, and one of them is the detection of faces and masks.
It can be used in a number of scenarios such as law enforcement, biometrics, and security. A number of detection systems have been developed and are currently deployed across the world. All of this work, however, strives for efficiency: a far better, more efficient way of doing things. Many accurate detectors are needed, since the world can no longer afford a growth in corona cases. In this project, we create a mask detector that can tell the difference between people wearing masks and people who are not. We have designed a system that uses an SSD for face detection and a neural network to recognize the presence of a mask. The software is applied to pictures, videos, and live video feeds.
System Analysis
Background History. Face recognition has become a hot topic of discussion throughout the world in recent decades. Moreover, with the expansion of technology and, as a result, the fast evolution of computers, extremely important advancements have been made. As a result, governmental and private organizations employ facial recognition systems to detect and regulate access of persons in airports, schools, businesses, and other locations. Government agencies, meanwhile, have adopted several safety standards to restrict infections as the COVID-19 epidemic has unfolded. One of these is the requirement that face masks be worn in public places, and this requirement must be verified.
COVID-19 is mostly transferred by droplets created when infected people cough or sneeze. This spreads the virus to everyone who comes into direct contact (within one meter) with the coronavirus-infected individual. As a result, the virus spreads rapidly across an area. Even with countrywide lockdowns, tracing and managing the infection has become much more difficult. Face masks are an excellent approach to stop the illness from spreading; wearing them has been proven to be the most efficient technique to prevent the virus from spreading. Governments all over the globe have strict laws that require everyone to wear masks when they leave the house. However, some people may not wear masks, making it impossible to manually verify whether everyone is wearing one. Computer vision is useful in such instances. There are few affordable mask detection programs that can tell whether someone is wearing a mask. This boosts the demand for a cost-effective method of detecting face masks on people in transit, safeguarding the safety of highly populated areas, residential neighborhoods, large-scale factories, and other companies. This project uses machine learning with OpenCV and TensorFlow to identify face masks on humans.
Literature Review
Table 1.
Existing System. The COVID-19 pandemic is the biggest life-changing event that has stunned the world since the year started. COVID-19, which has impacted the health and lives of many people, has demanded severe procedures to prevent the spread of illness. Individuals do everything they can for their personal and, hence, society's safety, from the most basic hygienic standards to medical treatments; face masks are one of the personal protective instruments. Face masks are worn when individuals leave their homes, and officials strictly enforce the wearing of face masks in groups and public areas.
Proposed Methodology. To overcome the drawbacks of the existing system, the proposed system has been developed. This project's aim is to monitor whether people are following the basic safety principles. This is done by developing a face mask detector system.
Software Description
Code Editor
• PyCharm is the most widely used IDE for Python coding. This chapter provides an overview of PyCharm and describes its functionality.
• PyCharm provides its customers and developers with some of the top features in the following areas: inspection and completion of code, debugging at a higher level, and support for web programming and frameworks such as Django and Flask.
Features
• Code Completion: PyCharm supports smooth code completion for both built-in and external packages.
• Git Visualization in the Editor: Queries are routine for a developer when committing code in Python. PyCharm highlights in blue the difference between the last commit and the current one, so you can verify changes against the last commit easily.
• Editor Code Coverage: Outside of the PyCharm editor, you may run .py files and view their code-coverage information elsewhere in the project tree, in the summary section, and so on.
• Management of Packages: All packages are displayed with accurate visual representations. This provides a list of installed packages, as well as the option to search for and install new packages.
Development Tools and Technologies
Python. Python is a high-level, interpreted, object-oriented programming language with dynamic semantics; its built-in data structures, together with dynamic typing and dynamic binding, contribute to its popularity for rapid application development. Modularity and reusability are encouraged by Python through its support for modules and packages. The Python interpreter and the extensive standard library are free of charge and can be freely distributed on all main platforms. Guido van Rossum developed Python, naming it after the BBC comedy series "Monty Python's Flying Circus", which was broadcast during the 1970s.
OpenCV. OpenCV is an open-source library largely used for computer vision applications. This library includes gesture tracking, face recognition, object detection, segmentation and recognition, and a variety of other functions and algorithms.
Using this toolkit, images and real-time video streams can be processed to suit a variety of purposes.
TensorFlow. TensorFlow is an open-source machine learning framework for building and training neural networks. It is a collection of tools, frameworks, and community resources that make creating and training ML-powered apps simple. Google developed it, open-sourced it in 2015, and continues to maintain it.
Requirements
Python-based computer vision and deep learning packages will be utilized for the project's development and testing. Tools like Anaconda Python, and libraries like NumPy, OpenCV and Keras/TensorFlow, are going to be used for this method. Training will be conducted on the dataset.
Hardware
• Operating System: Windows 7, 32 bits.
• Hardware: 4 GB RAM, webcam.
• Programming Language: Python.
• Computer Vision Library: OpenCV.
Skin Detection: Neural Network
• Stochastic backpropagation.
• Training patterns pre-whitened.
• Learning rate, η, decreased with each training epoch.
• Train on an equal number of skin and non-skin pixels.
Colour Space Options and Network Topologies
• Choose the number of hidden units you want.
• Color may be expressed in a variety of color spaces.
• RGB, XYD, and HSV are three different color schemes.
• RGB had the least number of false positives.
Fig 1. Network Output
Experiment Analysis
Data Set. We utilized a dataset with over 4500 photos: over 2200 are masked faces and over 2300 are unmasked faces. All of the photos are genuine images retrieved from the Bing Search API, Kaggle datasets, and a GitHub dataset. The proportion of images from each of the three sources is equal. The images depict a diverse range of ethnicities, including Asians, Caucasians, and others. The ratio of masked to unmasked faces ensures that the dataset is evenly distributed. In our method, we designated eighty percent of the dataset as training data and the remaining twenty percent as testing data, a split ratio of 0.8:0.2. Twenty percent of the training data was then used as a validation set, so in all, 64% of the dataset is used for training, 16% for validation, and 20% for testing.
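The 80/20 split with a further validation carve-out can be reproduced with a few lines of plain Python. This is a sketch, not the authors' code; the shuffle seed and the use of `range(4500)` as stand-in items are illustrative assumptions:

```python
import random

def split_dataset(items, test_frac=0.2, val_frac_of_train=0.2, seed=42):
    """Split items into train/validation/test sets.

    First hold out `test_frac` of the data for testing, then carve
    `val_frac_of_train` of the remaining training data out as a
    validation set (0.8 * 0.2 = 0.16 of the whole dataset).
    """
    items = list(items)
    random.Random(seed).shuffle(items)
    n_test = int(len(items) * test_frac)
    test, rest = items[:n_test], items[n_test:]
    n_val = int(len(rest) * val_frac_of_train)
    val, train = rest[:n_val], rest[n_val:]
    return train, val, test

# 4500 images -> 64% / 16% / 20%, i.e. 2880 / 720 / 900
train, val, test = split_dataset(range(4500))
```

In practice the same proportions are obtained by calling a library splitter twice (once for the test set, once for validation), which is why the paper reports 64/16/20 despite quoting a single 0.8:0.2 ratio.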
Fig 2. Face Mask Dataset
Fig 3. Without Face Mask dataset
Fig 4. Flow Process Diagram
Architecture
The first stage is face detection. In the second stage, we look for the presence or absence of a mask on each face, since numerous faces may appear in a single photo or video stream. To detect faces, we utilized the OpenCV library. The current version of OpenCV has a Deep Neural Network (DNN) module for face detection that includes a pre-trained convolutional neural network (CNN). The new algorithm outperforms earlier models when it comes to face detection. When a new test image is provided, it is first turned into a blob (Binary Large Object: a collection of connected pixels in a binary image) before being passed to the pre-trained model, which returns the detected faces. By measuring the bounding box around each face and transmitting it to the second component of the model, we can check whether it has a mask.
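The blob conversion is what OpenCV's `cv2.dnn.blobFromImage` does: rescale the pixel values, subtract a per-channel mean, and reorder the data from HWC to NCHW layout. The arithmetic can be sketched in plain Python; the mean values below are the ones commonly used with OpenCV's pre-trained face detector, an assumption rather than a figure from this paper:

```python
def blob_from_image(image, scale=1.0, mean=(104.0, 177.0, 123.0)):
    """Toy version of cv2.dnn.blobFromImage.

    `image` is an H x W x 3 nested list in BGR order. Returns a
    1 x 3 x H x W nested list: mean-subtracted, scaled, and
    rearranged from HWC (height, width, channel) to NCHW layout,
    which is what the DNN module's forward pass expects.
    """
    h, w = len(image), len(image[0])
    blob = [[[[scale * (image[y][x][c] - mean[c]) for x in range(w)]
              for y in range(h)]
             for c in range(3)]]
    return blob

# A single all-white 2x2 "image" in BGR order
img = [[[255, 255, 255], [255, 255, 255]],
       [[255, 255, 255], [255, 255, 255]]]
blob = blob_from_image(img)
```

The real function also resizes the image to the network's input size (e.g. 300x300 for the SSD face detector); that step is omitted here for brevity.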
Fig. 5 Data Flow Diagram (pipeline: load face mask dataset; train the face mask classifier with Keras/TensorFlow; load the face mask classifier from disk; detect faces in the image/video stream; extract each face ROI; apply the face mask classifier to each ROI to detect "mask" or "no mask"; show results)
Fig 6. Block Diagram (pipeline: test image or input query; face detection and cropping; feature extraction; comparison and identification; face identity; verification)
The developed device will keep track of live video streams but will not keep a record of them. This footage cannot be rewound, played, or paused, unlike CCTV camera footage, which the admin may rewind, play, or pause. People are less motivated to break the rules if the suggested system is implemented strictly; top-level management of the organization is typically notified through a letter when somebody is identified without a mask. To maintain a record of who entered without a mask, the suggested system usually combines databases from other organizations. A screenshot of the person's face may also be linked, along with other features, in order to serve as evidence.
Real Time Testing
Fig 7. Face Detect
Fig 8. Face Not Detect
Fig 9. Face Mask Detect
Fig 10. Face Mask Not Detect
Conclusion
With COVID cases on the rise throughout the world, a system that replaces human checkers in examining masks on people's faces is critical. This method fulfils that requirement. It may be employed in public venues such as train stations and shopping malls, and it will be useful in companies and large organizations with a large number of employees. This method is beneficial because it is simple to obtain and store information about the employees working in a company, it is simple to identify those who are not wearing masks, and a message can be issued to each such individual asking them to take precautions. Face masks have lately been made mandatory in more than fifty nations throughout the world. In public places such as supermarkets, public transportation, workplaces, and businesses, people must cover their faces. Software is frequently used by retailers to count the number of individuals who enter their businesses; they may also want to keep track of impressions on advertising screens and digital displays. Our face mask detection technology will be updated and made available as open-source software. Our program can detect persons without a mask using any existing USB, IP, or CCTV camera. The live video stream for detection may be integrated into web and desktop apps so that the operator can view notification messages. If someone is not wearing a mask, software operators can also receive an image. Additionally, if someone enters the area without wearing a mask, an alarm system can be set up to produce a beep. This program, when connected to the entry gates, admits only people wearing face masks.
References
[1] M. S. Ejaz, M. R. Islam, M. Sifat Ullah and A. Sarker, "Implementation of Principal Component Analysis on Masked and Non-masked Face Recognition", 2019 1st International Conference on Advances in Science, Engineering and Robotics Technology (ICASERT), 2019.
[2] Bosheng Qin and Dongxiao Li, "Identifying Facemask-wearing Condition Using Image Super-Resolution with Classification Network to Prevent COVID-19", May 2020.
[3] E. Learned-Miller, G. B. Huang, A. RoyChowdhury, H. Li and G. Hua, "Labeled Faces in the Wild: A Survey", Advances in Face Detection and Facial Image Analysis, 2016.
[4] Kaihan Lin, Huimin Zhao, Jujian Lv, Jin Zhan, Xiaoyong Liu, Rongjun Chen, Canyao Li, Zhihui Huang et al., "Face Detection and Segmentation with Generalized Intersection over Union Based on Mask R-CNN", Advances in Brain Inspired Cognitive Systems, 2020.
[5] A. Nieto-Rodríguez, M. Mucientes and V. M. Brea, "System for Medical Mask Detection in the Operating Room through Facial Attributes", Pattern Recogn. Image Anal., Cham, 2015.
[6] S. A. Hussain and A. S. A. A. Baluchi, "A Real Time Face Emotion Classification and Recognition Using Deep Learning Model", J. Phys.: Conf. Ser., 2020.
[7] M. Jogin, Mohana, M. S. Madhulika, G. D. Divya, R. K. Meghana and S. Apoorva, "Feature Extraction using Convolution Neural Networks (CNN) and Deep Learning", 2018 3rd IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology (RTEICT), May 2018.
[8] Kanakaraja P., Sourabh Upadhyay, Jahnavi K., Srinithya N., Harshini V., "Gate Access Control Using Face Mask Detection and Temperature", 2022 International Conference on Electronics and Renewable Systems (ICEARS), 2022.
[9] G. Lakshmi Durga, Haritha Potluri, Amuktha Vinnakota, Naga Pavan Prativada, Kalyan Chakravarti Yelavarti, "Face Mask Detection Using MobileNetV2", 2022 Second International Conference on Artificial Intelligence and Smart Energy (ICAIS), 2022.
[10] Harish Adusumalli, D. Kalyani, R. Krishna Sri, M. Pratap Teja, P. V. R. D. Prasad Rao, "Face Mask Detection Using OpenCV", 2021 Third International Conference on Intelligent Communication Technologies and Virtual Mobile Networks (ICICV), 2021.
[11] G. Nalini Priya, M. Shobana, C. Siva, B. Kanisha, J. K. Monica, V. Siva Vadivu Ragavi, "Dynamic Face Mask Detection Using Machine Learning", 2022 International Conference on Smart Technologies and Systems for Next Generation Computing (ICSTSN), 2022.
[12] Mohamed Loey, Gunasekaran Manogaran, Mohamed Hamed N. Taha, Nour Eldeen M. Khalifa, "A Hybrid Deep Transfer Learning Model with Machine Learning Methods for Face Mask Detection in the Era of the COVID-19 Pandemic", Measurement, 2020.
[13] Jonathan S. Talahua, Jorge Buele, P. Calvopiña, José Varela-Aldás, "Facial Recognition System for People with and without Face Mask in Times of the COVID-19 Pandemic", Sustainability, 2021.
[14] Sohaib Asif, Yi Wenhui, Yi Tao, Si Jinhai, Kamran Amjad, "Real Time Face Mask Detection System Using Transfer Learning with Machine Learning Method in the Era of Covid-19 Pandemic", 2021 4th International Conference on Artificial Intelligence and Big Data (ICAIBD), 2021.
[15] Gagandeep Kaur, Ritesh Sinha, Puneet Kumar Tiwari, Srijan Kumar Yadav, Prabhash Pandey, Rohit Raj, Anshu Vashisht, Manik Rakhra, "Face Mask Recognition System Using CNN Model", Neuroscience Informatics, 2022.
Advances in Science and Technology ISSN: 1662-0356, Vol. 124, pp 60-71 doi:10.4028/p-23rx59 © 2023 Trans Tech Publications Ltd, Switzerland
Submitted: 2022-08-31 Accepted: 2022-09-16 Online: 2023-02-27
Crack Detection of Pharmaceutical Vials Using Agglomerative Clustering Technique
Vishwanatha C.R.1,a*, V. Asha2,b*
1Department of MCA, New Horizon College of Engineering, Bengaluru 560103; Research Scholar at Visvesvaraya Technological University, Belagavi, Karnataka, India
2Department of MCA, New Horizon College of Engineering, Bengaluru 560103, Karnataka, India
[email protected], [email protected]
*Corresponding author
Keywords: Pharmaceutical Vials, cracks, Agglomerative Clustering, machine learning, quality
Abstract. Pharmaceutical industries remain very profitable, but defects in medicine vials are causing losses and adding extra overhead in quality management. In order to minimize these losses and overheads, companies need new ways of performing quality management for every vial produced. This paper presents a method for finding cracks on vials using the Agglomerative Clustering technique. The technique successfully detects all types of cracks on the vials. The algorithm has achieved 100% accuracy in the detection of cracks on pharmaceutical vials and has potential application in quality control in pharmaceutical industries.
Introduction
Defect detection is a part of quality control in various industries [1-9]. It is difficult to control and track the quality of pharmaceutical vials during the production process, which is why there has been an increase in cases of contamination of medicine fluids [1]. This contamination of the medicine fluid brings many difficulties to the people who consume it. Cracks on medicine vials are a serious and difficult problem. They directly affect people's health and the reputation of the company that produces the vials. This issue has led to a significant loss of revenue for many companies in this industry. Hence, a quality monitoring and control procedure needs to be developed in order to prevent any potential problems caused by cracks on vials. Fortunately, researchers have found many ways to prevent this problem from happening. They have done extensive research and have come out with various methods and techniques. The method proposed by Li Fu et al. [10] includes techniques such as median filtering, thresholding and Canny edge detection. It has various steps like image acquisition, preprocessing, flaw detection and locating.
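Median filtering, the preprocessing step in pipelines like [10], replaces each pixel by the median of its neighbourhood, which suppresses salt-and-pepper noise while keeping crack edges sharper than a mean filter would. A minimal sketch for a 2-D grayscale image with a 3x3 window (border handling is simplified here, an assumption for brevity):

```python
from statistics import median

def median_filter(image):
    """3x3 median filter on a 2-D grayscale image (list of lists).

    Border pixels are copied through unchanged for simplicity;
    real implementations pad or reflect the border instead.
    """
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [image[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = median(window)
    return out

# A single bright speck of noise in a flat region is removed:
noisy = [[10, 10, 10],
         [10, 255, 10],
         [10, 10, 10]]
clean = median_filter(noisy)
```

A genuine crack, being a connected line of dark or bright pixels rather than an isolated speck, survives this filter, which is what makes it a good first step before thresholding and edge detection.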
In this technique, the rate of defect detection is 100%, 91.6%, and 94.4% respectively for the types of samples selected, which include vials with cracks, vials with missing edges and dirty vials. The technique proposed by Jaina George et al. [11] uses Fuzzy C-Means clustering for detecting glass bottle flaws. It involves unsupervised model building and data analysis, and various filters are used for noise removal. K. P. Sinaga et al. [12] have proposed a new approach in which the k-means clustering technique is adopted. Extra clusters are discarded by the algorithm; then, according to the structure of the data, the optimal number of clusters is found automatically. It is an unsupervised technique. Huanjun Liu et al. [13] have come up with an intelligent inspection system for glass bottles. The technique uses machine vision: the watershed transform method is used along with feature extraction, and a fuzzy SVM ensemble classifier is also used. The conducted tests show that the proposed method can reach an accuracy above 97.5%. Amin Noroozi et al. [14] have proposed a method in which any given arbitrary crack depth profile is estimated using ACFM signals. For this, an inversion approach is employed. Affine transformation is used to map the ACFM signals to the respective crack depths. To find the underlying mapping efficiently, a general fuzzy alignment algorithm is used. Then, for various crack depth profiles, they used two algorithms, GFAA and EFAA, with simulated ACFM signals. The experimental data is evaluated to examine the performance of the
algorithm in a practical scenario. The results show that, already in the training stage, the proposed methods efficiently eliminate the effect of irrelevant data. In the presence of noise, the technique exhibits more stability than neural networks. The best results are obtained when EFAA is used [15]. Z. Yang and J. Bai [16] have proposed a flaw detection technique for the vial mouth. It uses HALCON software based on machine vision. In this technique, the vial mouth noise is removed using a filtering mechanism. In the next stage, threshold segmentation separates the target and background. Then an edge detection method extracts the edges, and finally detection of flaws on the vial mouth is carried out. Experimental results show that the technique is efficient and rapid in identifying flaws on the vial mouth. The technique also has the advantages of stability and high precision. The technique proposed by Xianen Zhou et al. [20] uses saliency detection and template matching for flaws present at the bottom of the glass bottle. The method also employs multiscale mean filtering to initially filter the acquired image, but it has a lower precision level. Xiaoyu Liang et al. [21] have proposed a method for flaw detection on the surface of tube-type bottles. The technique uses machine learning to detect the flaws. Location segmentation followed by contour extraction is done to detect flaws on the preprocessed image. Wittaya Koodtalang et al. [22] have proposed that the Hough circle transform can be used along with a deep convolutional neural network (CNN) to detect flaws in the bottom region of the bottle. The methodology is based on deep learning, and the method uses median and high-pass filtering for image processing. In this method, the computation time is somewhat higher due to the absence of a GPU. Current technologies used for detecting these cracks have their own limitations.
For example, they are slow, expensive and require too much maintenance as well as calibration. The problem of defective or flawed medicine vials has been prevalent for a long time. The reasons can be attributed to strict government regulation, unprofessional manufacturers or just plain human error. Either way, the problems caused by these flaws are a serious matter of concern, even though good supervision is deployed by pharma industries. Since supervision is done on a collective number of vials or selected units, but not on every production unit, flawed vials slip through. So, companies require a better solution to supervise every production unit. This paper deals with the detection of all sorts of cracks on pharmaceutical vials using image clustering techniques, for quality management and analysis in pharmaceutical industries, using the power of machine learning. This method can be successfully employed for defect inspection of vials using the model proposed in [19].
Problem Definition
There are different types of cracks. These cracks may arise on various areas of vials, like the surface, top, base or bottom, and neck region. The best way to stop the problem of cracking vials is to identify the flaws in them and take the necessary measures before they start affecting quality. This can be done using a machine learning algorithm that identifies any cracks within the vial and alerts production managers. The objective of this paper is to detect cracks on all regions of the vials, which can be caused by various factors like assembly, environment, manufacturing procedure, component packaging, etc. This paper deals with a machine learning technique to determine the cracks using clustering techniques. We will also cover how machine learning in conjunction with k-means clustering can be used to find cracks within these containers as well as cracks on other objects manufactured from glass or any transparent material.
Methodology
To detect cracks in vials, agglomerative clustering is used. This approach can also be applied to other problems related to quality management and analysis. The system has been trained using the k-means clustering algorithm with data from 115 images, and the model is tested using 16 images with different types of flaws.
The process has two steps: in the first step, the four corners of each image are detected, and in the second step, the similarity between all images is calculated by comparing their corners. Finally, a decision tree model is used to create a classification model based on these calculations. The cracks in the images were classified into two categories: crack defects with distinct borders, and airline-gap defects without sharp borders. Based on these categories, the system achieved an accuracy of 100 percent for detecting cracks on vials with large visible borders and 100 percent for air gaps in vials. The main objective of this paper is to suggest a viable solution for the identification of cracks on vials in the pharmaceutical industry, for applications in quality supervision and analysis. We propose a machine learning agglomerative clustering approach for the identification of cracks in a medicine vial's glass, which can provide alerts about cracks before any serious damage occurs, with great precision.
Algorithm Description
The agglomerative clustering algorithm is often used for image classification, such as detecting and classifying objects in images. The main parameter of the algorithm, which controls the number of clusters, can be adjusted by taking into consideration the number of features used in the dataset and other parameters that control the behaviour of a given clustering method. The agglomerative clustering technique builds a hierarchy of multiple levels (clusters) by finding the k-th nearest neighbour of each element in descending order. This method generates a set of (k-1) trees starting with a single leaf node, where each tree has as many levels as there are clusters found at its root. Each level is represented by an individual cluster which contains all elements within its radius. Agglomerative clustering is a process that is used to group similar pieces of data or items together.
This can be used when we want to group the different types of cracks found in our supply chain, in order to see how many are related and how many are independent. To use this model, four steps need to be accomplished:
Step 1: Choose a similarity measure.
Step 2: Create clusters by grouping pieces of data with the same similarity measure.
Step 3: Make sure the clusters have been created with no overlaps between them.
Step 4: Name and save clusters based on their content and purpose.
Proposed System
Clusters are formed by combining similar objects, and they can be classified as either static or dynamic. Static clusters are groups of vials that have the same defect categories, and dynamic clusters are groups of vials that have the same defect types but different defect categories.
Fig. 1 System Architecture
Advances in Science and Technology Vol. 124
63
The mathematical model of agglomerative clustering is used to detect defective vials with cracks, for quality management and analysis. Machines use this algorithm to categorize the cracks in each vial, which increases the efficiency of sorting defective from non-defective parts. Clusters are groups of similar-looking elements, and there are established methodologies for measuring the similarity between clusters: Min, Max, and Group Average. Consider two clusters C1 and C2, and let Pi and Pj be points such that Pi is a point in C1 and Pj is a point in C2. The similarity measures can then be written as [17]:

MIN: Sim(C1, C2) = min Sim(Pi, Pj), where Pi ∈ C1 and Pj ∈ C2

MAX: Sim(C1, C2) = max Sim(Pi, Pj), where Pi ∈ C1 and Pj ∈ C2

Group Average: Sim(C1, C2) = Σ Sim(Pi, Pj) / (|C1| × |C2|), where Pi ∈ C1 and Pj ∈ C2

This k-cluster mathematical representation of agglomerative clustering provides accurate results for this type of data. It leverages machine learning and its mathematical representation, which yields more accurate results than traditional filters such as the median or mean [18].
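The three cluster-similarity rules can be computed directly from the pairwise point distances. A small sketch (our own helper; Euclidean distance is used as the pairwise measure, so Min/Max here correspond to single/complete linkage):

```python
import numpy as np

def linkage_scores(c1, c2):
    """Pairwise distances between every Pi in C1 and Pj in C2,
    reduced with the Min, Max, and Group Average rules above."""
    d = np.linalg.norm(c1[:, None, :] - c2[None, :, :], axis=-1)
    return {
        "min": d.min(),                        # MIN over all (Pi, Pj) pairs
        "max": d.max(),                        # MAX over all (Pi, Pj) pairs
        "avg": d.sum() / (len(c1) * len(c2)),  # sum / (|C1| * |C2|)
    }

c1 = np.array([[0.0, 0.0], [1.0, 0.0]])
c2 = np.array([[3.0, 0.0], [4.0, 0.0]])
print(linkage_scores(c1, c2))  # → {'min': 2.0, 'max': 4.0, 'avg': 3.0}
```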
We can use clustering algorithms on the image data of the vial: the image is converted from RGB to grayscale based on the filters supplied to the machine learning algorithm, and the final clustered image, with its edges highlighted, can then be used to identify the cracks on the vial.

Results and Discussion
The study uses noisy images, mostly caused by dust particles on the vial: images with substantial unwanted noise and some missing information. It was difficult to interpret such images due to the noise and missing information. The noised image and the denoised image are compared in the figure below.
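The grayscale-conversion and edge-highlighting pipeline described above can be sketched as follows. This is a pure-NumPy illustration under stated assumptions: standard BT.601 luminance weights for the RGB-to-gray step, a simple k-level intensity grouping standing in for the clustering step, and a synthetic 8×8 "vial" image with a dark crack row.

```python
import numpy as np

def rgb_to_gray(img):
    # assumed ITU-R BT.601 luminance weights for RGB -> grayscale
    return img @ np.array([0.299, 0.587, 0.114])

def segment(gray, k=4, iters=10):
    """Toy k-level intensity clustering (stand-in for the k=4 step)."""
    centers = np.linspace(gray.min(), gray.max(), k)
    for _ in range(iters):
        labels = np.abs(gray[..., None] - centers).argmin(-1)
        centers = np.array([gray[labels == i].mean() if (labels == i).any() else centers[i]
                            for i in range(k)])
    return labels

def label_edges(labels):
    """Mark pixels where the cluster label changes - the highlighted edges."""
    e = np.zeros_like(labels, dtype=bool)
    e[:-1, :] |= labels[:-1, :] != labels[1:, :]
    e[:, :-1] |= labels[:, :-1] != labels[:, 1:]
    return e

# synthetic image: dark crack line (row 4) on a bright background
img = np.full((8, 8, 3), 200.0)
img[4, :] = 20.0
edges = label_edges(segment(rgb_to_gray(img), k=4))
print(edges[3].any(), edges[4].any())  # edge pixels border the crack row
```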
Fig. 2.1 Noisy image and denoised image
We used a real image dataset of vials with defective properties to highlight the clusters in the image, clustering with k=4 to achieve optimum results. The main idea is to take a random sample from the whole dataset, apply agglomerative clustering, and then perform k-means clustering on the same set of data. Finally, each result is compared by measuring how similar each cluster is to the ground-truth cluster labels, as shown in the image below.
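The comparison of the agglomerative and k-means results against ground-truth labels can be sketched with a simple Rand-index-style agreement score. The label vectors below are hypothetical; in the real system the score would be computed on the cluster assignments of the sampled image data.

```python
import numpy as np

def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two clusterings agree
    (same-cluster vs different-cluster) - a simple way to score a
    clustering result against ground-truth labels."""
    a, b = np.asarray(labels_a), np.asarray(labels_b)
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    agree = same_a == same_b
    iu = np.triu_indices(len(a), k=1)      # each unordered pair once
    return agree[iu].mean()

truth      = [0, 0, 0, 1, 1, 1]
agglo_out  = [1, 1, 1, 0, 0, 0]   # same partition, relabelled -> perfect score
kmeans_out = [0, 0, 1, 1, 1, 1]   # one point misplaced
print(rand_index(truth, agglo_out), rand_index(truth, kmeans_out))
```

Scores are invariant to how the clusters are numbered, which matters because the two algorithms label their clusters independently.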
Fig. 2.2 Original dataset filtered and segmented with k=4
Fig. 2.3 Clusters edges detected and identified
The image processing machine learning tool is used to identify cracks in industrial parts. This can be done by plotting all the points from the part around clusters of similar points.
Fig. 2.4 Original image compared to the final result
Machine learning algorithms like agglomerative clustering have the power to identify levels of cracking in images that humans could not otherwise notice. The algorithm enhances the image by visually highlighting damage, making it more visible so that people can make better decisions about it. This power was used effectively in our system for quality analysis and for identifying minute damage that can go unnoticed by human vision. The figure below shows a vial image with a mouth crack; the methodology detects this defect as well. The outputs of the defect detection are shown below.
Fig. 3.1 Original vial image with a mouth crack
For this image we again used k=4 as the segmentation value to achieve optimum results, and then applied agglomerative clustering along with k-means clustering. The final result, where the mouth crack is detected, is shown below.
Fig. 3.2 Image filtered and segmented with k=4
Fig. 3.3 Final result where the mouth crack is detected
The same methodology can also detect a hairline crack on the vial. Even though hairline cracks rarely occur during manufacturing, the algorithm detects them successfully. The initial image and its final result are shown below.
Fig. 4.1 Original vial image with a hairline crack
The image is segmented with k=4, and the obtained result is then compared to measure how similar each cluster is to the other clusters, as shown in the image below.
Fig. 4.2 Image filtered and segmented with k=4
Fig. 4.3 Final result where hairline crack is detected
The image below has a crack in the neck region. The initial image and its output are shown below; the methodology successfully detects the neck-region defect.
Fig 5.1 Original vial image with a neck crack
Fig. 5.2 Image filtered and segmented with k=4
Fig. 5.3 Final result where the neck crack is detected
The following figures contain an image with a crack on the surface of the vial. This type of defect is the most common during production, and the agglomerative clustering algorithm is able to detect it. The results are shown below.
Fig 6.1 Original vial image with a surface crack
Fig. 6.2 Image filtered and segmented with k=4
Fig. 6.3 Final result where the surface crack is detected
The defect identification problem itself is a challenging one, with fusions and breakages, especially when the weldments are heavily deformed. Machine learning algorithms can identify where these cracks are with a higher rate of accuracy than human observation, which helps reduce human error and improve quality control.
Fig 7.1 Original vial image with a surface crack
Fig. 7.2 Image filtered and segmented with k=4
Fig. 7.3 Final result where the surface crack is detected
The above image highlights the cracks by drawing an outline around the different clusters. The lines drawn around each individual cluster represent a density of points: the higher the density, the more cracks are present in that cluster.

Conclusions
The approach succeeded in designing a machine learning technique capable of detecting the edges of cracks in vials using agglomerative clustering. This paper presents a technique for the automated detection of cracks on vials using agglomerative clustering, a machine learning algorithm that generates groups of similar values which can be used to identify edges in a clustered dataset. Since the inception of data science and machine learning, there have been many advancements in clustering algorithms, which are used in many fields; agglomerative clustering is one of the most popular and has been widely implemented. Our paper concludes that it performs well on the given datasets and can therefore be used for real-time defect inspection in the pharmaceutical industry. The research concluded successfully: this clustering approach proved a reliable and effective means of identifying the edges of cracks on vials. The method used 115 images, and the model was trained using 16 images with different types of flaws. The method achieved 100 percent accuracy in detecting all types of cracks on the vials. It could be used for both detection and rejection in order to achieve quality control in the pharmaceutical industry, but future work remains on the performance limitations of this approach and on how agglomerative clustering can be used to identify other types of defects on vials, such as wrinkles, bubbles, black spots, and scratches.
References [1] James A. Melchore, ‘Sound Practices for Consistent Human Visual Inspection’ AAPS PharmSciTech. Mar; 12(1) (2011), 215–221. [2] Asha, V., Nagabhushan, P., and Bhajantri, N.U., Unsupervised Detection of Texture Defects using Texture-Periodicity and Universal Quality Index. In: Proceedings of the 5th Indian International conference on Artificial Intelligence (IICAI-2011), Tumkur, India, 14-16 December 2011, pp. 206-217. [3] Asha, V., Bhajantri, N.U., and Nagabhushan, P., Automatic Detection of Texture Defects using Texture-Periodicity and Chi-square Histogram Distance. In: Proceedings of the 5th Indian International conference on Artificial Intelligence (IICAI-2011), Tumkur, India, 14-16 December 2011, pp. 91-104.
[4] V. Asha, N.U. Bhajantri and P. Nagabhushan. Automatic Detection of Defects on Periodically Patterned Textures, Journal of Intelligent Systems, vol. 20 (3), (2011), pp. 279-303. [5] V. Asha, N.U. Bhajantri and P. Nagabhushan. GLCM-based chi-square histogram distance for automatic detection of defects on patterned textures. International Journal of Computational Vision and Robotics (IJCVR), vol. 2 (4), (2011), pp. 302-313. [6] Asha, V., Bhajantri, N.U., and Nagabhushan, P., Automatic Detection of Texture Defects using Texture-Periodicity and Gabor Wavelets, in: K. R. Venugopal and L. M. Patnaik (Eds.): International Conference on Information Processing 2011, Communication in Computer and Information Science (CCIS) 157, pp. 548-553, Springer-Verlag, Berlin Heidelberg, 2011. [7] V. Asha, N.U. Bhajantri and P. Nagabhushan. Similarity measures for automatic defect detection on patterned textures. International Journal of Information and Communication Technology, vol. 4 (2/3/4), (2012), pp. 118-131. [8] V. Asha, N.U. Bhajantri and P. Nagabhushan. Automatic Detection of Texture-defects using Texture-periodicity and Jensen-Shannon Divergence. Journal of Information Processing Systems, vol. 8 (2), (2012), pp. 359-374. [9] V. Asha. Texture Defect Detection using Human Vision Perception based Contrast. International Journal of Tomography and Simulation (IJTS), vol. 32, Issue 3, (2019), pp. 86-97. [10] Li Fu, Shuai Zhang, Yu Gong, Quanjun Huang, "Medicine Glass Bottle Defect Detection Based on Machine Vision", IEEE, pp. 5681-5685 (2019). [11] Jaina George, S. Janardhana, Dr. J. Jaya, K.J. Sabareesaan, "Automatic Defect Detection in Spectacles and Glass Bottles Based on Fuzzy C Means Clustering", International Conference on Current Trends in Engineering and Technology, ICCTET'13, pp. 8-12 (2013). [12] K. P. Sinaga and M. Yang, "Unsupervised K-Means Clustering Algorithm," IEEE Access, vol. 8, pp. 80716-80727, (2020).
[13] Huanjun Liu, Yaonan Wang, Feng Duan, "Glass Bottle Inspector Based on Machine Vision", International Journal of Computer, Electrical, Automation, Control and Information Engineering, Vol. 2, No. 8, pp. 2682-2687, (2008). [14] Noroozi, R. P. R. Hasanzadeh and M. Ravan, "A Fuzzy Learning Approach for Identification of Arbitrary Crack Profiles Using ACFM Technique," IEEE Transactions on Magnetics, vol. 49, no. 9, pp. 5016-5027, Sept. (2013). [15] Heena Gupta, B., Asha V., Impact of Encoding of High Cardinality Categorical Data to Solve Prediction Problems, Journal of Computational and Theoretical Nanoscience, vol. 17, No. 9-10, pp. 4197-4201, Sep/Oct (2020). ISSN: 1546-1955 (Print), EISSN: 1546-1963 (Online). [16] Z. Yang and J. Bai, "Vial mouth defect detection based on machine vision," 2015 IEEE International Conference on Information and Automation, 2015, pp. 2638-2642, doi: 10.1109/ICInfA.2015.7279730. [17] Chaitanya Reddy Patlolla, "Understanding the concept of Hierarchical clustering Technique", Medium, published December 10, 2018, accessed April 26, 2022. [18] Asha, V., Undithering using Linear Filtering and Non-linear Diffusion Techniques, International Journal of Artificial Intelligence (IJAI), vol. 2 (S09), pp. 66-76, 2009. ISSN: 0974-0635. [19] C.R. Vishwanatha and V. Asha, "Prototype Model for Defect Inspection of Vials", International Journal of Psychosocial Rehabilitation, Vol. 24, Issue 05, pp. 6981-6986, 2020.
[20] Xianen Zhou, Yaonan Wang, Changyan Xiao, Qing Zhu, Xiao Lu, Hui Zhang, Ji Ge and Huihuang Zhao, "Automated Visual Inspection of Glass Bottle Bottom With Saliency Detection and Template Matching", IEEE, pp. 1-15 (2019). [21] Xiaoyu Liang, Liangyan Dong, Youyu Wu, "Research on Surface Defect Detection Algorithm of Tube-type Bottle Based on Machine Vision", 10th International Conference on Intelligent Computation Technology and Automation, pp. 114-117, (2017). [22] Wittaya Koodtalang, Thaksin Sangsuwan and Surat Sukanna, "Glass Bottle Bottom Inspection Based on Image Processing and Deep Learning", Research, Invention, and Innovation Congress (RI2C 2019), (2019).
Advances in Science and Technology ISSN: 1662-0356, Vol. 124, pp 72-79 doi:10.4028/p-y9m106 © 2023 Trans Tech Publications Ltd, Switzerland
Submitted: 2022-08-31 Accepted: 2022-09-16 Online: 2023-02-27
Obstacle Detection and Text Recognition for Assisting Visually Impaired People
Aishwarya M.1,a*, Shivani S.2,b, Deepthi K.3,c, Dr. Golda Dilip4,d
1-4Department of Computer Science and Engineering, SRM Institute of Science and Technology, Vadapalani, Chennai, India
[email protected], [email protected], [email protected], [email protected]
Keywords: Visually Impaired, Object Detection, Text Recognition, Text-to-Speech, Distance Estimation, Find Objects, Convolutional Neural Network, Computer Vision.
Abstract. As a potential medium for informing blind people, this project remodels the visual world into the audio world. Obstacle detection is one of the most widely used fields in computer vision, with remarkable achievements. Obstacles and texts can be conveyed to visually impaired people: obstacles detected in the frame are identified by name and converted to speech. The image-to-text framework is an advanced innovation used to extract the message in an image, which is extremely helpful for reading any content. The voice output for the input text is produced by the system by evaluating the adjacency and frequency of occurrence of the words. The system can also find the distance to an object that the user asks for. This application is compact, helpful to an economical society, and an efficient assistant for visually impaired people. Ultimately, this could increase the confidence level of users and make them feel secure.

Introduction
Dealing with sight loss is already a challenge in itself, but the biggest challenge for a visually impaired person is navigating around places. To understand the environment, visually impaired people rely on their other senses, such as touch and auditory signals. It is very difficult for visually impaired people to know what object is in front of them without touching it. This project aims at assisting visually impaired people to be aware of surrounding objects and to learn about them through speech generated from textual content. It is always challenging for a blind person to cope with the external environment. The idea is to identify and track dynamic obstacles by recognizing the data in the current frame. The "Vision" application helps the person detect the objects and texts around them. Visually impaired people are guided in a better way by estimating their distance from an object, and for every action, the user is assisted via voice.
Vision helps blind people explore the world around them and visualize it in a new way. It is easily accessible and offers several features that the visually impaired can use for everyday activities without requiring others' help.

Motivation
To make use of facilities such as food or travel, visually impaired people need support, so our project helps them by interacting with them and conveying information about their surroundings and environment. This project focuses on assisting users in detecting objects with the innovation and technology available to us, which our engineering careers inspire us to carry out.

Related Works
In 1986, simple object detection on images was performed with algorithms like the Histogram of Oriented Gradients and the Support Vector Machine, which produced decent accuracy. [2] Now there are many modern architectures such as YOLO, RetinaNet, Faster R-CNN, and Mask R-CNN. The existing system uses slower algorithms for object detection (such as R-CNN, Fast R-CNN, Faster R-CNN, R-FCN, and Libra R-CNN), which typically follow a two-stage approach. Conventional object detection models are primarily divided into three stages: informative region selection, feature extraction, and classification. All of these regions are passed to classification, which is a tedious activity; hence, one-stage detection performs faster than two-stage detection. Before the existence of smart blind sticks, visually impaired people used the normal cane, but a normal cane can get stuck in cracks and on uneven surfaces, which also makes navigation difficult. The smart blind stick cannot differentiate between a person and an object; it simply senses that there is an obstacle. A few disadvantages of wearable sensors are technical difficulties, poor designs, or the unfashionable design of the device. There is often a problem with waterproofing: sweat and bad weather, such as heat and precipitation, cause damage to the technology. Assistive devices such as the smart blind stick, the Kinect goggles system, and wearable technologies have hardware components like sensors and wired circuits, which cause difficulty and discomfort for the person. [1] Generally, wearable technology consists of a Raspberry Pi, sensor-based obstacle avoidance, a camera, and advanced algorithms for obstacle detection like MobileNet-SSD. [1] To determine the distance between objects, ultrasonic sensors or other high-frequency gadgets that produce sound waves were traditionally used. [5] Previously, the Flutter framework was used to build such applications, utilizing the Dart SDK and the flutter_tts library to convert text into speech.

System Design and Architecture
Fig. 1. System Architecture
In Fig. 1, the Vision application first analyses the current frame according to the user's preference. For obstacle detection, the objects in the frame are identified with the help of the YOLOv4 algorithm, and the names of the detected obstacles are passed to the voice output. To read text from a document or an image, the input frame is processed using Optical Character Recognition (pytesseract), and the recognized text is passed to the voice output. To find a specific object, Vision takes input from the user; if the object is present in the frame, the distance of the obstacle from the user is calculated and passed to the voice output. The respective output from each module is converted into speech using a text-to-speech synthesizer (pyttsx3) and conveyed to the user.
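The routing in Fig. 1 can be sketched as a small dispatcher. The three module functions below are hypothetical stand-ins for the real YOLOv4, pytesseract, and distance-estimation modules; only the control flow mirrors the architecture.

```python
# Hypothetical stand-ins for the three modules described above.
def detect_obstacles(frame):
    return ["chair", "door"]           # YOLOv4 detections in the real system

def recognize_text(frame):
    return "exit"                      # pytesseract output in the real system

def find_object(frame, target):
    return f"{target} at 2 m"          # bounding-box distance estimate

def vision_pipeline(frame, mode, target=None):
    """Route the current frame to the module chosen by the user (Fig. 1),
    then hand the resulting message to the text-to-speech stage."""
    if mode == "obstacles":
        message = ", ".join(detect_obstacles(frame))
    elif mode == "read":
        message = recognize_text(frame)
    elif mode == "find":
        message = find_object(frame, target)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return message  # in the app this string goes to pyttsx3 for speech output

print(vision_pipeline(frame=None, mode="obstacles"))  # → chair, door
```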
Methodology
A. Proposed Work
Fig. 2. Work Flow
This project is implemented using YOLOv4 and OCR (pytesseract). In terms of both speed and accuracy, YOLOv4 compares favourably with the quickest and most accurate detectors, and this version has established its high performance across a broad range of detection tasks. In YOLOv4, transfer learning is a strategy to reuse the weights of at least one layer from a pre-trained network model in another model, by keeping the weights, fine-tuning them, and adjusting them while training the new model. To let a visually impaired person listen to messages, OCR is the most effective way for a machine to read any text. We prefer not to make the embedded device heavier by including unnecessary hardware modules, instead keeping a compact design that still provides functionalities like obstacle detection. By using the depth information from the camera already incorporated for obstacle detection, we can calculate the distance between a specific obstacle and the camera. Unlike previously created applications, ours does not require the person to capture a picture; it simply scans the object and informs them. The visually impaired person therefore does not have to periodically erase pictures from their phone's storage, and since the application runs in real time, it is very useful.

B. Algorithms Used
a) YOLOv4 (You Only Look Once, Version 4)
YOLOv4 is an extremely fast multi-object detection algorithm that utilizes a convolutional neural network (CNN) to distinguish and recognize objects. It is a one-stage object detection model that enhances YOLOv3 with several bags of tricks and modules. Detection is done by partitioning the frame into a grid and predicting bounding boxes and class probabilities for every cell of the grid.
YOLOv4 recommends using an IoU-based loss for bounding-box regression, such as the Distance-IoU or Complete-IoU loss function, leading to faster convergence and better performance. The model learns a generalized object representation, meaning its predictions on real-world scenes and artwork are fairly precise.
b) OCR (Optical Character Recognition)
Optical character recognition is the conversion of text in an image, whether handwritten or printed, into machine-encoded text from a photo of a document or a scene. Our goal is to determine the bounding boxes for each word of a sentence in a frame; once we have those regions, we can then OCR them. The best part of pytesseract is that it supports a wide variety of languages, and it is through wrappers that Tesseract can be made compatible with many different programming languages and frameworks.

C. Obstacle Detection
Obstacle detection is an important computer vision task used to detect instances of visual objects of certain classes (for example, humans, animals, cars, or buildings) in digital images such as photos or video frames. Object detection techniques deal with the classification and localization of multiple objects. In this module, we use the YOLOv4 algorithm. The weights and configuration files for YOLOv4 are downloaded and fine-tuned according to the specifications of the project. The COCO names file contains the names of the 80 objects that the pre-trained model can classify. Using the cv2 functions, the weights and configuration files are loaded into the network. After loading the network and the object names, the input is taken from the camera frame, pre-processed, and fed to the trained model. We get the names of the output layers and pass them to the forward function, which performs a forward pass through the trained model. We loop through the layers to get the bounding boxes and confidences of the detected objects and eventually make the prediction. For detected objects, a voice output is generated and communicated to the user.

D. Text Recognition
Text recognition is the process of localizing where text is in an image. One can think of text recognition as a specialized form of obstacle detection.
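The loop that extracts bounding boxes and confidences from a detector's output layers can be sketched in NumPy. The raw array below is a hypothetical stand-in for what a forward pass returns; the row layout [cx, cy, w, h, objectness, class scores...] follows the usual YOLO convention, and the threshold is an assumed value.

```python
import numpy as np

def parse_detections(layer_output, conf_threshold=0.5):
    """Filter one output layer: keep detections whose best class score
    clears the threshold, returning (box, class_id, confidence) tuples.
    Rows: [cx, cy, w, h, objectness, class scores...] (normalised)."""
    results = []
    for row in layer_output:
        scores = row[5:]
        class_id = int(np.argmax(scores))
        confidence = float(scores[class_id])
        if confidence > conf_threshold:
            results.append((row[:4].tolist(), class_id, confidence))
    return results

# hypothetical raw output: two candidate detections, one below threshold
raw = np.array([
    [0.5, 0.5, 0.2, 0.3, 0.9, 0.1, 0.8],   # best class 1, conf 0.8 -> kept
    [0.2, 0.2, 0.1, 0.1, 0.3, 0.2, 0.1],   # best class conf 0.2  -> dropped
])
print(parse_detections(raw))
```

In the real module, non-maximum suppression would additionally be applied to remove duplicate boxes before generating the voice output.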
In this module, we use the Python wrapper named pytesseract. It takes input from the camera frame, which can be used to recognize text from a large document or from an image of a single text line; the result is converted into an array. We loop through the array to get the bounding boxes and confidences of the recognized text. A voice output is generated for the recognized text and communicated to the user.

E. Find Objects
Speech recognition combines technology and linguistics to let computers understand human language. Using speech recognition in Python, spoken words are converted into text to make a query or give a reply. With the assistance of a microphone, speech recognition starts by capturing the sound energy produced by the person speaking. It then converts the electrical energy from analog to digital, breaks the audio information down into sounds, and analyses the sounds using algorithms to find the most probable word matching that audio. To recognize the voice, we first initialize the recognizer from the speech recognition package. Speech-to-text takes input from the microphone as the source; using adjust_for_ambient_noise, the energy threshold is adjusted based on the surrounding noise level. We get input from the user using the listen function, and the input is converted into text using recognize_google. The converted text is compared with the obstacles detected in the frame to find the particular object. The distance to the object is then estimated as:
distance = (2 × 3.14 × 180) ÷ (w + h × 360) × 1000 + 3 (1)

where w is the bounding-box width and h is the bounding-box height.
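Eq. (1) transcribes directly into a small helper. The function name and example values are illustrative; the constants are taken as given in the paper, whose calibration determines the output units.

```python
def estimate_distance(w, h):
    """Distance heuristic from Eq. (1): w and h are the bounding-box
    width and height reported by the detector."""
    return (2 * 3.14 * 180) / (w + h * 360) * 1000 + 3

# illustrative bounding box of 100 x 50 pixels
print(round(estimate_distance(100, 50), 2))  # → 65.45
```

Because w and h appear in the denominator, larger boxes (objects closer to the camera) yield smaller distance estimates, as the text describes.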
The width and height are used for estimating the obstacle and describing the detail of the detected obstacle, retrieved from the bounding-box coordinates. The distance of the obstacle from the camera varies depending on both variables.

F. Voice Assistant
Text-to-speech is the process of converting text into synthesized speech. It is used to communicate with users when reading a screen is impracticable or inconvenient. In this project, we use the pyttsx3 package, which has a built-in say() function that takes a string value and speaks it out. The runAndWait() call monitors when the engine starts converting text to speech, waits for that much time, and does not permit the engine to close; if the engine is not initialized properly, it probably will not work as expected because the processes are not synchronized. The engine is shut down by calling the stop() function when all the processes are over. Voice output is generated for the detected objects, the recognized text, and the estimated distance to an object.

Results
Fig. 3. Home Screen
Fig. 4. Obstacle Detection
Fig. 5. Text Recognition
Fig. 6. Distance Estimation
Conclusion
For all individuals, vision is the most important means of receiving and decoding information from the world; individuals with vision impairment, however, experience the world through entirely different mechanisms. This paper deals with an application called "VISION", created mainly for visually impaired people. We have built this project with various options that can make their lives easier. Using our application, the user is assisted by knowing what is around them: it provides information such as the objects near them and the textual content, and the results are recited so the person can hear them through a custom-built voice-driven interface. By leveraging these technologies, this app benefits visually impaired people in their everyday tasks.

Acknowledgement
We deeply express our sincere thanks to SRM Institute of Science and Technology, Vadapalani, for encouraging and allowing us to present the project successfully.
References [1] Muiz Ahmed Khan, Pias Paul, Mahmudur Rashid, Mainul Hossain, and Md Atiqur Rahman Ahad, “An AI-Based Visual Aid with Integrated Reading Assistant for the Completely Blind”, IEEE Transactions on Human-machine Systems, Vol. 50, No. 6, December 2020. [2] Zhong-Qiu Zhao, Peng Zheng, Shou-Tao Xu, and Xindong Wu, “Object Detection with Deep Learning: A Review”, IEEE Transactions on Neural Networks and Learning Systems, Vol. 30, No. 11, November 2019. [3] V. Padmapriya,R. Archna,V. Lavanya,CH. VeeralakshmiKrishnaveni Sri, “A Study on text recognition and obstacle detection techniques”, IEEE 2020 International Conference on System, Computation, Automation and Networking (ICSCAN). [4] Md. Amanat Khan Shishir, Shahariar Rashid Fahim, Fairuz Maesha Habib, Tanjila Farah, “Eye Assistant using mobile application to help the visually impaired”, IEEE 1st International Conference on Advances in Science, Engineering and Robotics Technology 2019 (ICASERT 2019). [5] Rajesh Kannan Megalingam; Palnati Teja Krishna Sai, Aditya Ashvin, Pochareddy Nishith Reddy, Bhargav Ram Gamini, “Trinetra App: A Companion for the Blind”, IEEE 2021 Bombay Section Signature Conference (IBSSC). [6] G Chandan, Ayush Jain, Harsh Jain, Mohana, “Real Time Object Detection and Tracking Using Deep Learning and OpenCV”, IEEE 2018 International Conference on Inventive Research in Computing Applications (ICIRCA). [7] Bojan Strbac, Marko Gostovic, Zeljko Lukac, Dragan Samardzija, “YOLO Multi-Camera Object Detection and Distance Estimation”, IEEE 2020 Zooming Innovation in Consumer Technologies Conference (ZINC). [8] Nouran Khaled, Shehab Mohsen, Kareem Emad El-Din, Sherif Akram, Haytham Metawie, Ammar Mohamed, “In-Door Assistant Mobile Application Using CNN and TensorFlow”, IEEE 2020 International Conference on Electrical, Communication, and Computer Engineering (ICECCE). 
[9] Roshan Rajwani, Dinesh Purswani, Paresh Kalinani, Deesha Ramchandani, Indu Dokare, “Proposed System on Object Detection for Visually Impaired People”, International Journal of Information Technology (IJIT) – Volume 4 Issue 1, Mar - Apr 2018. [10] Miss Rajeshvaree Ravindra Karmarkar, Prof.V.N. Honmane, “Object Detection System for the Blind with Voice guidance”, International Journal of Engineering Applied Sciences and Technology, 2021, Vol. 6, Issue 2. [11] Rakesh Chandra Joshi, Saumya Yadav, Malay Kishore Dutta, Carlos M. Travieso-Gonzalez, “Efficient Multi-Object Detection and Smart Navigation Using Artificial Intelligence for Visually Impaired People”, Entropy 2020, 22, 941. [12] Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi, “You Only Look Once: Unified, Real-Time Object Detection”, IEEE 2016 Conference on Computer Vision and Pattern Recognition (CVPR). [13] J. Nasreen, W. Arif, A. A. Shaikh, Y. Muhammad and M. Abdullah, "Object Detection and Narrator for Visually Impaired People", IEEE 2019 6th International Conference on Engineering Technologies and Applied Sciences (ICETAS). [14] Siva Ganga, Hema, “A Review on Text Recognition for Visually Blind People”, International Research Journal of Engineering and Technology (IRJET), 2020, Volume 7, Issue 2. [15] Hao Jiang, Thomas Gonnot, Won-Jae Yi, Jafar Saniie, “Computer Vision and Text Recognition for Assisting Visually Impaired People using Android Smartphone”, IEEE 2017 International Conference on Electro Information Technology (EIT).
[16] Samruddhi Deshpande, Ms. Revati Shriram, “Real Time Text Detection and Recognition on Hand Held Objects to Assist Blind People”, International Conference on Automatic Control and Dynamic Optimization Techniques (ICACDOT) 2016. [17] M. Murali, Shreya Sharma and Neel Nagansure, “Reader and Object Detector for Blind”, International Conference on Communication and Signal Processing (ICCSP) 2020. [18] S. A. Jakhete, P. Bagmar, A. Dorle, A. Rajurkar, P. Pimplikar, "Object Recognition App for Visually Impaired," IEEE 2019 Pune Section International Conference (PuneCon). [19] A. Badave, R. Jagtap, R. Kaovasia, S. Rahatwad, S. Kulkarni, "Android Based Object Detection System for Visually Impaired," 2020 International Conference on Industry 4.0 Technology (I4Tech). [20] Venkata Naresh Mandhala, Debnath Bhattacharyya, Vamsi B, Thirupathi Rao N, “Object Detection Using Machine Learning for Visually Impaired People”, 2020 International Journal of Current Research and Review (IJCRR). [21] Sanghyeon Lee, Moonsik Kang, “Object Detection System for the Blind with Voice Command and Guidance”, IEIE Transactions on Smart Processing and Computing, vol. 8, issue 5, 2019. [22] Trupti Shah, Sangeeta Parshionikar, “Efficient Portable Camera Based Text to Speech Converter for Blind Person”, International Conference on Intelligent Sustainable Systems (ICISS 2019). [23] Eliganti Ramalakshmi, Dixitha Kasturi, Gouthami V, “Object Detector for Visually Impaired with Distance Calculation for Humans”, International Journal of Engineering and Advanced Technology (IJEAT), Vol 9 Issue 4, April, 2020. [24] Fatima ZahraeAitHamou Aadi, Abdelalim Sadiq, “Proposed real-time obstacle detection system for visually impaired assistance based on deep learning”, International Journal of Advanced Trends in Computer Science and Engineering, Vol 9, Issue 4, 2020. [25] P. Zhihao and C. 
Ying, "Object Detection Algorithm based on Dense Connection," 2019 IEEE 4th Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), 2019
Advances in Science and Technology ISSN: 1662-0356, Vol. 124, pp 80-87 doi:10.4028/p-238mcg © 2023 Trans Tech Publications Ltd, Switzerland
Submitted: 2022-09-01 Accepted: 2022-09-16 Online: 2023-02-27
Recognition of American Sign Language with Study of Facial Expression for Emotion Analysis

Akanksha Chakraborty1,a*, R.S. Sri Dharshini2,b, K. Shruthi3,c, R. Logeshwari4,d

1,2,3,4Department of Computer Science and Engineering, SRM Institute of Science and Technology, Vadapalani Campus, No. 1 Jawaharlal Nehru Road, Vadapalani, Chennai, Tamil Nadu, India

a[email protected], [email protected], [email protected], [email protected]
Keywords: Sign Language, ASL, Emotion Analysis, LSTM, CNN, MediaPipe.
Abstract. Sign language is a medium of communication for many disabled people. This real-time Sign Language Recognition (SLR) system is developed to identify words of American Sign Language (ASL) in English and translate them into five spoken languages (Mandarin, Spanish, French, Italian, and Indonesian). Combining the study of facial expression with the recognition of sign language is an attempt to understand the emotions of the signer. MediaPipe and an LSTM with a dense network are used to extract the features and to classify the signs, respectively. The FER2013 data set was used to train a Convolutional Neural Network (CNN) to identify emotions. The system was able to recognize 10 words of ASL with an accuracy of 86.33% and translate them into five different languages; four emotions were also recognized with an accuracy of 73.62%.

Introduction

A. Sign Language Recognition

Speech is the most common form of communication among people, but there exists a section of the world's population that cannot use vocal communication. Such people are a minority compared to others, which makes communication an everyday struggle. Sign language is a visual language of hand gestures and movements used by speech- and hearing-impaired people. Most historical sign languages were limited to fingerspelling, in which each letter of a word is spelled out with finger signs, until gestures were mapped to whole words. Sign language has now evolved to over 300 variations, which accounts for the different dialects of the commonly used sign languages. These languages are not limited to one per country and vary significantly, adding to the need for a translation medium. Communication between speech- and hearing-impaired people and others is difficult due to the lack of knowledge of sign language. To bridge this communication gap, a medium to recognize the signs is needed.
Sign language recognition is categorized into vision-based and sensor-based approaches. Sensor-based SLR requires an external device, which makes the method not only expensive but also uncomfortable for users. Vision-based sign language recognition is an alternative approach in which images and videos are given as input to the system. The first vision-based work in sign language recognition came into existence in 1988, carried out by Tamura et al. [1] on Japanese Sign Language. This approach is easier to work with and comparatively cost-efficient. Among the various sign languages, this paper focuses on American Sign Language (ASL). Since ASL has its own rules for pronunciation, word formation, and word order, it functions like any spoken language. Phonemic components of ASL include facial movements, torso movements, and hand movements. Many lexicographers believe that ASL follows a subject-verb-object (SVO) syntax in most cases. ASL is impacted by different variables including age, place, pace, and native language.
In this work, we have used MediaPipe, Long Short-Term Memory (LSTM), and a dense layer to recognize words of ASL.

B. Emotion Analysis

Human existence relies heavily on emotions. Besides spoken language, interpersonal communication includes gestures, body language, and facial expressions to convey feelings. In the second part of this paper, we examine how facial expressions can be used to recognize emotions. People with speech or hearing impairments are often misunderstood to be emotionless. In truth, the conveyed emotions go unnoticed because observers are focused on deciphering the signs. Their style of communication is characterized by facial movements and the pace of signing, which substitutes for tone. This project centres on identifying emotions through facial expressions. An Emotion Recognition System is developed by training it to assess a particular facial expression and identify the conveyed emotion. Facial recognition uses biometric measures of the human face to identify the emotions of the person. The primary requirement for face detection is a device with a digital camera to capture the required video or photographic data. A Convolutional Neural Network is trained on the FER2013 data set to recognize different emotions in real time. Emotion recognition from facial expressions is a large domain of its own; in this project it is combined with Sign Language Recognition to produce an innovative outcome.

Related Work

This section briefly introduces the algorithms used by existing SLR and emotion recognition systems.

1. Hidden Markov Model (HMM)

A probabilistic model used in machine learning to reveal the indirectly observable factors that determine the evolution of observable events. It was first used in the field of SLR in 1996 by [2] to recognize Taiwanese Sign Language.
A combination of HMM with Principal Component Analysis (PCA) was employed by [3] in 2011 to detect the key features of hand signs, achieving a recognition error rate of 10.90%. An EM-based algorithm was deployed with HMM by [4] in 2016 to resolve challenges related to video processing and weak supervision. HMM is one of the most widely used temporal models in the field of Sign Language Recognition. [5] studied the importance of HMM in continuous sign language recognition and made use of Recurrent Neural Networks (RNNs) in sequential learning; the results showed that the RNN outperformed HMM.

2. Support Vector Machine (SVM)

A supervised ML algorithm that analyses data and identifies patterns in order to perform classification and regression. It is used for both linear and nonlinear classification. [6] used SVM to recognize the alphabets of ASL. The dataset contained RGB images of the 24 static alphabets, excluding 'J' and 'Z', which are dynamic in nature. The recognition rate was found to be 75%. [7] proposes a new method combining a spatial attention-based 3D CNN with a temporal attention-based classifier. This method outperforms the combination of a 3D CNN with SVM. The paper concludes that RNN-based methods provide better results for continuous recognition.
82
IoT, Cloud and Data Science
SVM has proven to be an effective classifier in facial expression recognition. Even with limited training data, SVMs are capable of providing good classification accuracy, which makes them ideal for use in dynamic, interactive methods of expression recognition [8]. [9] used the JAFFE dataset to recognize six emotions based on facial expression. Vital points on the face were marked manually and trained using SVM. This experiment is in its initial stage, and further tests are yet to be carried out to establish its accuracy. [10] combined PCA, CNN, and SVM to perform emotion recognition on the JAFFE dataset, where PCA was utilized for feature extraction while CNN and SVM were used for classification. The paper concluded that CNN outperforms SVM when large data is involved.

3. Principal Component Analysis (PCA)

An unsupervised statistical method used for dimensionality reduction. An important function of PCA is to highlight the strong patterns and variations in any given data. This makes data exploration and visualisation easier by simplifying the complexity of high-dimensional data without disturbing its trends and patterns. A new method for frame-based classification using weakly labelled sequence data was suggested by [11]. The proposed method combined a CNN with an iterative EM algorithm to train on one million hand images. PCA was used to lower the dimensionality of the feature maps from 1024 to 200. [12] combined PCA with a sparse auto-encoder (SAE) to recognise the 24 static alphabets of ASL, excluding 'J' and 'Z', and compared the approach with a Deep Belief Network (DBN). The accuracy achieved by the DBN was much lower, as the proposed system achieved an accuracy of 99.05%.

4. Long Short-Term Memory (LSTM)

An RNN sequential network algorithm that can remember patterns selectively over an extended period of time. It is an effective method for modeling sequential data since it does not rely on pre-clustered data per frame.
This makes it suitable for learning the complex dynamics of human activity. [13] implemented LSTM to recognize words of Indonesian Sign Language. It was able to overcome the limitations of HMM, as an LSTM can determine whether to store, ignore, or forget given information; LSTM outperformed HMM with an accuracy of 77.4%. [14] developed a Chinese Sign Language Recognition system in which an LSTM was used to train 100 CSL sentences consisting of 4 to 8 words each. The proposed system uses a hierarchical adaptive recurrent neural network with variable-length key clip mining, temporal pooling, and attention-based weighing mechanisms. In situations with prolonged time gaps of unknown duration between important events, an LSTM network can classify, process, and predict time series based on previous experience. This enables LSTM to outperform RNNs, HMMs, and other sequence learning methods.

5. Convolutional Neural Network (CNN)

CNN is a deep learning algorithm that can learn the importance of various aspects of an image and distinguish it from others. [15] attempts to predict human emotions using a deep CNN and studies how emotional intensity changes from low to high levels on a face. Two datasets were used, namely FER2013 and Extended Cohn-Kanade (CK+); the accuracy of the system is above 90%. [16] focuses on constructing a Deep Convolutional Neural Network (DCNN) model to classify human facial emotions. A 48-megapixel camera was used to collect data for identifying five facial emotions: angry, happy, neutral, sad, and surprised. The system achieved its highest accuracy of 98.82% for classifying sad and its lowest accuracy of 82.75% for classifying angry.
CNN and a Deep Neural Network were combined by [17] to create a facial emotion recognition model that can identify seven emotions. The paper further compares the performance of the Venturi architecture with two existing deep neural network architectures. It deployed the Karolinska Directed Emotional Faces dataset and achieved its highest accuracy of 86.78% with the Venturi architecture. CNNs work well on huge amounts of data, as they reduce its dimensionality significantly, decreasing complexity and the chance of overfitting.

System Overview

A. American Sign Language Recognition

1. Data Collection

Data collection includes gathering information formatted in a particular way and analysing it. Ten basic words were chosen from ASL and 60 videos were recorded per word, contributing to a dataset of 600 videos. Videos of 3-4 signers performing the signs were recorded. It was important to perform these signs in a particular way to overcome certain challenges and limitations in Sign Language Recognition. To make the SLR system work in everyday life, the videos were recorded against a dynamic background in which a secondary subject was introduced into the frame while the primary subject was signing.

2. Image Processing

OpenCV is used to record the videos for the dataset, which are stored as 30 frames each. The key points of each frame are extracted using MediaPipe. OpenCV works in BGR colour space, whereas MediaPipe works only in RGB. The recorded videos are therefore converted from BGR to RGB during keypoint extraction and converted back to BGR afterwards.

3. Keypoint Extraction

One of the most crucial parts of image processing is choosing and extracting the important features of an image. The amount of storage required to handle data increases with its size.
This problem can be solved by storing just the extracted keypoints, thereby reducing complexity while maintaining the accuracy of the classifier. Keypoint extraction is done for the pose, face, and left and right hands of the signer. These key points are stored in four different arrays, which are later concatenated to form the dataset of all extracted features.

4. Methodology

A 3-layered sequential LSTM network was used to train the dataset to recognise the different signs. The first layer of the network had 64 LSTM units, the second layer 128 LSTM units, and the final layer 64 LSTM units; all layers used ReLU as their activation function. The LSTM network was connected to a 3-layer dense network. The first two dense layers had 64 and 32 units and used ReLU as their activation function. The final layer has 10 units and uses a softmax function. 95% of the data is used for training and 5% for testing. The network is trained until the categorical cross-entropy loss is at a minimum.
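The per-frame concatenation of the four keypoint arrays can be sketched as below. The landmark counts are an assumption based on MediaPipe Holistic's standard layout (33 pose points with x, y, z and visibility; 468 face points and 21 points per hand with x, y, z), and the function name is illustrative; parts that are not detected in a frame are zero-filled so every frame yields a fixed-length vector:

```python
import numpy as np

# Assumed MediaPipe Holistic landmark counts:
# pose 33 x (x, y, z, visibility); face 468 x (x, y, z); each hand 21 x (x, y, z)
POSE, FACE, HAND = 33 * 4, 468 * 3, 21 * 3

def frame_features(pose=None, face=None, lh=None, rh=None):
    """Concatenate per-frame keypoints, zero-filling any part that
    was not detected, so every frame has the same feature length."""
    parts = [
        np.zeros(POSE) if pose is None else np.asarray(pose).flatten(),
        np.zeros(FACE) if face is None else np.asarray(face).flatten(),
        np.zeros(HAND) if lh is None else np.asarray(lh).flatten(),
        np.zeros(HAND) if rh is None else np.asarray(rh).flatten(),
    ]
    return np.concatenate(parts)

vec = frame_features()      # a frame where nothing was detected
print(vec.shape)            # (1662,) -> 132 + 1404 + 63 + 63
```

Stacking 30 such vectors gives one video sample for the classifier.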
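The 3-layer LSTM network and 3-layer dense classifier described in the Methodology can be sketched in Keras as follows. The 30-frame input length, layer sizes, ReLU/softmax activations, and categorical cross-entropy loss come from the text; the 1662-dimensional per-frame feature size (MediaPipe Holistic keypoints) and the Adam optimizer are assumptions:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# 30 frames per video; 1662 features per frame is an assumed
# MediaPipe Holistic keypoint layout, not stated in the paper.
model = Sequential([
    LSTM(64, return_sequences=True, activation='relu',
         input_shape=(30, 1662)),
    LSTM(128, return_sequences=True, activation='relu'),
    LSTM(64, return_sequences=False, activation='relu'),
    Dense(64, activation='relu'),
    Dense(32, activation='relu'),
    Dense(10, activation='softmax'),   # one unit per ASL word
])
model.compile(optimizer='adam',        # assumed optimizer
              loss='categorical_crossentropy',
              metrics=['categorical_accuracy'])
```

Training would then call `model.fit` on the 95% training split until the loss stops improving.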
5. Real-Time Recognition

Using OpenCV, the live feed is taken and fed into the trained neural network to produce a result. The sign is initially recognized in English and is translated into five other languages to widen the communication range.

B. Emotion Analysis

1. Data Pre-Processing

In line with the requirement to recognise four basic emotions (angry, happy, sad, and neutral), the FER2013 dataset, which originally includes images for seven emotions, was reduced to images for these four emotions. The dataset contains two columns: one for the emotion label and one containing a string of pixels depicting an image of size 48x48.
Figure 1. Various emotions

The next essential step in this phase is normalisation, carried out to give equal importance to each variable; the mean normalisation method is used here.

2. Method

A sequential 2D CNN of three layers is built and trained with the preprocessed dataset. The first two convolutional layers have 64 units each and use ReLU as their activation function. A regularisation function called Dropout is used to drop random output values of some layers during training to avoid overfitting. The third layer has 128 units and also uses ReLU. Max pooling is used in each layer to reduce the spatial dimension of the convolved features. The Haar Cascade object detection algorithm is used to detect faces in real time.

3. Real-Time Recognition

The Haar Cascade algorithm is used along with OpenCV to detect the face in real time. The face is then fed into the trained CNN model to predict and display the emotion of the signer.
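The pre-processing step above can be sketched as follows: FER2013 stores each 48x48 image as a single string of 2304 space-separated grey values, which is reshaped and then mean-normalised. The exact normalisation variant used by the paper is not stated, so a common form (subtract the mean, divide by the value range) is assumed:

```python
import numpy as np

def parse_fer_row(pixel_string):
    """FER2013 stores each 48x48 image as one string of 2304
    space-separated grey values; reshape it into an image."""
    return np.array(pixel_string.split(), dtype=np.float32).reshape(48, 48)

def mean_normalise(img):
    """Assumed mean-normalisation variant: subtract the mean and
    divide by the value range, so every image is on a comparable scale."""
    rng = img.max() - img.min()
    return (img - img.mean()) / (rng if rng else 1.0)

row = " ".join(str(v % 256) for v in range(48 * 48))   # toy pixel string
img = mean_normalise(parse_fer_row(row))
print(img.shape, abs(float(img.mean())) < 1e-3)        # (48, 48) True
```

After this step each image has roughly zero mean, which keeps the CNN's inputs on an equal footing.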
Figure 2. Architecture diagram

Results

The performance of the proposed system was checked via a real-time feed. The system performed successfully despite the dynamic background, achieving an overall accuracy of 86% for sign recognition and 73.62% for emotion recognition. The translation feature was able to translate the output in real time.

Table 1: Accuracy of each emotion recognized by the system

Emotion    Accuracy
Angry      42
Happy      59
Sad        36
Neutral    33
Compared with all other emotional and neutral expressions, facial expressions of happiness were judged the most pleasant. The most unpleasant reactions tended to derive from neutral facial expressions, followed by sadness and anger, and neutral expressions also elicited the least arousing reactions, again followed by sadness and anger. The discrepancy between these results suggests that, in general, emotion recognition accuracy improves as a person transitions from late childhood to preadolescence.
Conclusion

A real-time SLR system to identify words of ASL was developed; identified words are translated into five languages, and facial expression recognition was combined with the system to identify the emotions of the signer. The system was effective in identifying 10 words along with four emotions. A future goal for the system is to extend its vocabulary to identify more words and to upgrade the system to identify sentences without signal segmentation.

Acknowledgement

We greatly appreciate the assistance provided by Mrs R. Logeshwari and would like to extend our thanks to her.

References

[1] Shinichi Tamura and Shingo Kawasaki, "Recognition of Sign Language Motion Images," Pattern Recognition, vol. 21, no. 4, pp. 343-353, 1988.

[2] R.-H. Liang and M. Ouhyoung, "A sign language recognition system using hidden Markov model and context-sensitive search," in Proc. ACM Symp. Virtual Reality Softw. Technol. (VRST), 1996, pp. 59-66.

[3] M. M. Zaki and S. I. Shaheen, "Sign language recognition using a combination of new vision-based features," Pattern Recognit. Lett., vol. 32, no. 4, pp. 572-577, 2011.

[4] O. Koller, H. Ney, and R. Bowden, "Deep hand: How to train a CNN on 1 million hand images when your data is continuous and weakly labelled," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 3793-3802.

[5] R. Cui, H. Liu, and C. Zhang, "A deep neural framework for continuous sign language recognition by iterative training," IEEE Trans. Multimedia, vol. 21, no. 7, pp. 1880-1891, Jul. 2019.

[6] S.-Z. Li, B. Yu, W. Wu, S.-Z. Su, and R.-R. Ji, "Feature learning based on SAE-PCA network for human gesture recognition in RGBD images," Neurocomputing, vol. 151, pp. 565-573, 2015. doi:10.1016/j.neucom.2014.06.086

[7] J. Huang, W. Zhou, H. Li, and W. Li, "Attention-based 3D-CNNs for large-vocabulary sign language recognition," IEEE Transactions on Circuits and Systems for Video Technology, 2018. doi:10.1109/TCSVT.2018.2870740

[8] P. Michel and R. El Kaliouby, "Facial expression recognition using support vector machines," Computer Laboratory, University of Cambridge, Cambridge CB3 0FD, U.K.

[9] A. Bajpai, "Real-time facial emotion detection using support vector machines," International Journal of Advanced Computer Science and Applications, vol. 1, 2010. doi:10.14569/IJACSA.2010.010207

[10] T. H. Seifu, "Automated facial expression recognition using SVM and CNN," International Journal of Engineering Research & Technology (IJERT), vol. 11, no. 3, Mar. 2022.

[11] O. Koller, H. Ney, and R. Bowden, "Deep hand: How to train a CNN on 1 million hand images when your data is continuous and weakly labelled," 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 3793-3802. doi:10.1109/CVPR.2016.412

[12] M. Ma, X. Xu, J. Wu, and M. Guo, "Design and analyze the structure based on deep belief network for gesture recognition," 2018 Tenth International Conference on Advanced Computational Intelligence (ICACI), 2018, pp. 40-44. doi:10.1109/ICACI.2018.8377544
[13] E. Rakun, A. M. Arymurthy, L. Y. Stefanus, A. F. Wicaksono, and I W. W. Wisesa, "Recognition of sign language system for Indonesian language using Long Short-Term Memory neural networks." doi:10.1166/asl.2018.10675

[14] D. Guo, W. Zhou, A. Li, H. Li, and M. Wang, "Hierarchical recurrent deep fusion using adaptive clip summarization for sign language translation," IEEE Transactions on Image Processing, vol. 29, pp. 1575-1590, 2020. doi:10.1109/TIP.2019.2941267

[15] G. A. R. Kumar, R. K. Kumar, and G. Sanyal, "Facial emotion analysis using deep convolution neural network," 2017 International Conference on Signal Processing and Communication (ICSPC), 2017, pp. 369-374. doi:10.1109/CSPC.2017.8305872

[16] E. Pranav, S. Kamal, C. Satheesh Chandran, and M. H. Supriya, "Facial emotion recognition using deep convolutional neural network," 2020 6th International Conference on Advanced Computing and Communication Systems (ICACCS), 2020, pp. 317-320. doi:10.1109/ICACCS48705.2020.9074302

[17] Verma, P. Singh, and J. S. Rani Alex, "Modified convolutional neural network architecture analysis for facial emotion recognition," 2019 International Conference on Systems, Signals and Image Processing (IWSSIP), 2019, pp. 169-173. doi:10.1109/IWSSIP.2019.8787215
Advances in Science and Technology ISSN: 1662-0356, Vol. 124, pp 88-95 doi:10.4028/p-0pa0r8 © 2023 Trans Tech Publications Ltd, Switzerland
Submitted: 2022-09-01 Accepted: 2022-09-16 Online: 2023-02-27
Recognition of Plant Leaf Diseases Using CNN

Vishnupriya Gurumurthi1,a, S. Manohar2,b

Department of Computer Science and Engineering, SRM Institute of Science and Technology, Vadapalani, Chennai

Email: [email protected], [email protected]

Keywords: Image processing, Keras, Leaf Diseases, Classification, CNN.
Abstract. Improving agriculture is a global economic issue and an ongoing challenge during the COVID-19 pandemic, and it depends heavily on effectiveness. Recognition of plant leaf diseases plays a major role in the agriculture industry and in city-side greenhouse farms. For early analysis of leaf disease, this article integrates image processing techniques with a convolutional neural network, one of the deep learning approaches, to classify and detect plant leaf diseases from publicly available plant data, helping to treat the leaf as early as possible and thus control economic loss. The dataset used consists of ten classes of disease and three classes of plant leaf, and this research offers an effective method for detecting different diseases across plant leaf variations. The model was created to detect and recognize healthy and diseased leaves of tomato, potato, and pepper; these three leaf types are classified by a convolutional neural network. By modifying the parameters and changing the pooling combinations, models can be created to train and test these types of leaf sample images. Leaf disease recognition was based on these ten classes across the three species (tomato, potato, and pepper), and the classification of sample images reached good disease identification accuracy.

I. Introduction
As countries continue to execute steps aimed at halting the spreading COVID-19 pandemic, every country must take efforts to maintain food security, according to a joint statement by the Directors-General of the FAO, WHO and WTO released in March 2020. Many countries seeking to enhance food production are increasingly concerned about food security. How to better understand and control leaf diseases is an important research issue, as leaf diseases have a significant impact on crop cultivation, and researchers have put in much effort to identify these disorders successfully. With the advancement of machine learning and agricultural technology, plant leaf diseases may now be detected automatically. Deep learning approaches have made considerable progress in the study of plants, and disease recognition has improved in recent years. Plant disease identification using deep learning technologies has gained considerable interest because of these properties and has become a hot research issue. Farmers and plant pathologists have long relied on their eyes to diagnose illnesses and make decisions based on past experience, which is often erroneous and biased because many diseases appear similar in their early stages. In addition, their experience must be passed down through the generations. This practice leads to the usage of pesticides that are not necessary, resulting in greater production costs. Based on this research, a reliable disease detector in conjunction with a reliable database is required to assist farmers, particularly young and inexperienced farmers and city-side people. With state-of-the-art machine learning (ML) and pattern recognition algorithms, advances in computer vision pave the way for this. The classification model is a supervised machine learning algorithm that provides output in the form of a multi-class value. Feature extraction is a vital phase of the project that adds further benefit to the system. To tackle the issue of massive data, data augmentation is helpful. The CNN algorithm is capable of handling large datasets as well as streaming datasets.
This algorithm works on various layers of probability, which detect the probability of a leaf being diseased or healthy.

II. Related Work
The base paper (1) predicts plant diseases from a large set of sample images and reports the model accuracy. Visual plant disease detection as a foundation for plant disease diagnosis is extensively investigated in this work. It computes the discriminative level of each patch, and the weights of all the patches separated from each image are determined from the cluster distribution of these patches. An LSTM (Long Short-Term Memory) network is used to encode the weighted patches. The large-scale benchmark plant disease dataset covers 271 plant disease categories across the species and 220,592 images. The drawbacks of this work are the lack of systematic investigation, the lack of a large-scale dataset, and limited efficiency.

In base paper (2), a Deep Learning (DL) model is trained to predict fourteen crop species and twenty-six crop illnesses. The K-means clustering algorithm is used to split the regions and merge the colour and texture features, and the authors compare different deep learning algorithms by model accuracy. Plant disease identification research using classic image processing approaches has yielded some promising results, including high disease diagnosis accuracy; however, there are still flaws and limitations. When evaluating the sample photos, it relies largely on spot segmentation, and it is challenging to test the algorithm's disease recognition ability in more complicated contexts.

Based on paper (3), automatic detection and diagnosis of maize leaf illnesses is greatly desired. To improve maize leaf identification accuracy while lowering the number of network parameters, it is possible to change the settings, change the pooling combinations, add dropout operations, adjust rectified linear unit functions, and reduce the number of classifiers. This results in two superior models for training and evaluating nine types of maize leaf images. These methods enhanced maize leaf disease accuracy while reducing convergence iterations, which could aid model training and recognition efficiency and help farmers make rapid and informed decisions on crop disease. The classification system can be integrated with mobile devices in a variety of ways.

Base paper (4) is useful for diagnosing plant pathogens with an automated technique, since it reduces widespread monitoring of farm sites and detects disease indicators extremely early, that is, when they first appear on the plant's leaves. The authors offer a segmentation system for automatically detecting and classifying plant leaf diseases, and the article examines the many approaches designed to check and classify plant leaf diseases. The image was segmented using a genetic technique, which is critical for identifying sickness in leaves.

In basis study (5), artificial intelligence (AI) plays a vital part in crop security in the realm of modern agriculture. Plant leaf disease has long been a cause for concern for plant cultivation and crop production across the world. Day-to-day activities can have an impact on plants, and these diseases have major ramifications not just for plant health but also for human health, as they transfer viruses, bacteria, and fungi that cause infections. Deep learning techniques have been made possible by advancements in computer vision and increased smartphone adoption; deep learning is utilized in image processing techniques and applied to enormous amounts of data. Using a convolutional neural network, the authors present an additional method for classifying damaged leaves.

In foundation paper (6), the technology for categorizing plant leaves using characteristics and ML algorithms has steadily advanced in recent years. The training of a leaf classifier, on the other hand, is frequently based on supervised training approaches employing a collection of samples. This paper proposes an optimization-based segmentation and Laws' mask system for the problem of leaf disease classification with a high number of sample classes and sizes. Finally, a Support Vector Machine is used as the classifier to categorize the leaves in the learned metric space. The average classification accuracy is used as the performance statistic; accuracy, recall, and precision are all important factors in classification. The crucial categorization technique for potato, tomato, and red pepper has been enhanced.
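The accuracy, recall, and precision mentioned above reduce to simple ratios of confusion-matrix counts; a minimal sketch with hypothetical counts:

```python
def precision_recall_accuracy(tp, fp, fn, tn):
    """Per-class metrics from confusion-matrix counts:
    precision = TP/(TP+FP), recall = TP/(TP+FN),
    accuracy  = (TP+TN)/all predictions."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, accuracy

# e.g. a classifier that found 45 of 50 diseased leaves,
# with 5 false alarms among 50 healthy ones (made-up numbers)
p, r, a = precision_recall_accuracy(tp=45, fp=5, fn=5, tn=45)
print(p, r, a)   # 0.9 0.9 0.9
```

Averaging these per-class values over all disease classes gives the average classification accuracy used as the performance statistic.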
III. System Design
Fig. 1. The input and decision structure. Diseased or healthy leaf photos are used as input, and the predicted disease class name is obtained as the result; in between, feature extraction and image classification are performed.
Fig. 2. The model's development process. The figure depicts the major steps in the creation of the suggested model: the output is displayed after a series of processes such as data pre-processing, data cleaning, feature extraction, training and testing, running the neural network, detecting disease names, and checking model correctness.

IV. The Proposed Work
We have seen various approaches to plant leaf disease recognition using K-means clustering, machine learning techniques, etc., but there is still no solution with a high level of recognition accuracy. The focus of this project is therefore to improve model accuracy and identify the diseases. In this project, we take an image as input and produce a disease name as output using a convolutional neural network; we then train our model and measure its accuracy. The main aim of this project is to recognize the diseases with the highest possible recognition accuracy. We have planned to design the module so that even a person with no knowledge of agriculture can check and get information about plant leaf diseases. The proposed system predicts leaf diseases.
Fig. 3. The design of the technical modules of the proposed model.
It explains the experimental analysis of our methodology. Sample images are collected from the Kaggle dataset, comprising different plant diseases of tomato, potato, and pepper along with healthy leaves. The proposed system targets ten classes of leaf diseases and three classes of plant leaf. The classes are listed below.

Fig. 4. The selected classes: ten disease classes and three leaf classes from the dataset. We trained 2,500 example photos in total, with 250 images in each class, and evaluated the results.

A. Dataset Collection

The selection of an adequate dataset is critical because it is a necessity at every step of the process, from training to testing. Sample images are gathered from a variety of sources and locations and organized into ten separate classes of leaf diseases, each representing a particular disease kind. To reliably recognize an infected plant, we need a huge, verified image dataset of healthy and diseased plants. We used pictures classified over ten classes with three different species (tomato, potato, and pepper) to ensure the convolutional neural network model has enough data to train itself. We measure the success of our models by their ability to reliably predict the correct names of leaf diseases and leaf species.
Fig.5. Example photos from the PlantVillage dataset, downloaded from the Kaggle website.

B. Data Augmentation
Training CNNs requires a large amount of data: the more sample images the CNN learns from, the more features it can extract. Because the initially gathered plant leaf disease image dataset was insufficient for this study, alternative methods were used to enlarge it so the various illness groups could be discerned. Additional versions of each sample image are made by rotating it by 90° and 180°, mirroring each rotated image, cropping and trimming the center of the image to the same size, flipping it, and converting all processed photos to grayscale. These strategies expand the dataset and help reduce overfitting during the training stage, improving efficiency while training and testing the data.

C. Feature Extraction
Characteristics are extracted using the color of the plant leaf. The images are converted to grayscale and then binarized into a binary sequence, with the input image described by its height, width, and depth. The grayscale value is obtained by averaging the three color channels (dividing their sum by three).
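The augmentation steps above (90°/180° rotations, mirroring of each rotation, a center crop, a flip, and grayscale conversion) can be sketched with NumPy alone. This is a minimal stand-in for the paper's pipeline, not its exact implementation:

```python
import numpy as np

def augment(img):
    """Return augmented variants of one H×W×3 image array, approximating
    the rotation, mirror, center-crop, flip and grayscale steps above."""
    out = [np.rot90(img, 1), np.rot90(img, 2)]        # 90 deg and 180 deg rotations
    out += [np.fliplr(v) for v in out]                # mirror each rotated image
    h, w = img.shape[:2]
    ch, cw = h // 2, w // 2                           # trim the center, half size per side
    out.append(img[h//4:h//4 + ch, w//4:w//4 + cw])
    out.append(np.flipud(img))                        # vertical flip
    gray = img.mean(axis=2, keepdims=True)            # grayscale = mean of 3 channels
    out.append(np.repeat(gray, 3, axis=2).astype(img.dtype))
    return out
```

Applying this to each of the 2,500 originals would multiply the dataset roughly eightfold before training.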
92
IoT, Cloud and Data Science
D. Image Pre-processing and Labelling Images
To improve feature extraction and consistency, the sample images for the deep CNN classifier are pre-processed before the model is trained. One of the most significant jobs is to standardize image size and format. All of the pictures in this study were automatically resized to 96×96 pixels at 32×32 dots per inch using Python routines based on the OpenCV framework. Agricultural specialists evaluated leaf photographs sorted by keyword search and labeled all of the shots with the appropriate diseases to certify the validity of the classes in the collection. As is generally known, precise photo recognition is required for dataset training and validation; only then can a satisfactory and efficient solution be found.

E. CNN (Convolutional Neural Network) Training
A CNN is a deep learning algorithm that uses convolution operations instead of standard matrix multiplication. When it comes to dealing with images, CNNs outperform other deep learning algorithms: they collect texture and other features from the input photographs, both crucial for image categorization. A standard CNN contains one input layer, one output layer, and between them a stack of convolutional layers (each with its activation function), pooling layers, fully connected (dense) layers, dropout, and a flatten layer.
Fig.6. Architecture diagram of the CNN mechanism.

1. Conv2D: the layer that convolves the image into several feature maps, with an activation function applied to the result. Each set of features in a CNN is convolved from many input feature maps. Given the input xi of layer i, the convolutional layer computes

xi+1 = f(Wi * xi) ……(1)

where * denotes the convolution operation, Wi = [W1i, W2i, ..., WKi] denotes the convolution kernels of the layer (K being the number of convolution kernels), and f is the activation function. Each kernel WKi is an M×M×N weight matrix, where M is the window size and N is the number of input channels.
2. MaxPooling2D: used to max-pool the values from the matrix of the supplied size, as in the next two layers.
3. Flatten: used to reduce the image's dimensions once it has been convolved.
4. Dense: the hidden layer used to construct a fully connected model.
5. Dropout: used to prevent overfitting on the sample images of the dataset.
6. ImageDataGenerator: resizes the image, applies shear to a portion of it, zooms the image, and flips it horizontally.
7. The loss function: measures the disparity between the predicted output and the input label, as defined in (2).
E(W) = −(1/n) Σi Σk yik log P(xi = k) ……(2)
Advances in Science and Technology Vol. 124
93
In (2), W denotes the weight matrices of the convolutional and fully connected layers, n the number of training samples, i the index of training samples, and k the index of classes. If sample i belongs to the kth class, yik = 1; otherwise yik = 0. P(xi = k), a function of the parameters W, is the probability that the model predicts input xi belongs to the kth class; hence the loss function is parameterized by W. The main goal of network training is to find the value of W that minimizes the loss function E. We employ a stochastic gradient descent (SGD) approach in this study, where W is updated iteratively:

W ← W − α · ∂E/∂W ……(3)

where α is the learning rate, a critical component in determining the learning step size, whose value should be chosen carefully.
8. The training methodology: the model is trained by splitting the data (images) into training and testing sets; during training the model accumulates knowledge about the dataset over several epochs, and steps per epoch indicates how many batches the model executes per pass over the training data.
9. Epochs: this variable indicates how many times the model is trained in forward and backward passes.
10. The validation process: the test data, held out by the split performed before training, is fed into the model; the evaluation step is the main output stage where leaf diseases are recognized.
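Equations (2) and (3) can be checked numerically. The sketch below uses a plain linear softmax model as a stand-in for the CNN (an assumption for illustration only), computes the cross-entropy loss of (2), and applies the gradient update of (3):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def loss_E(W, X, Y):
    """Equation (2): E = -(1/n) sum_i sum_k y_ik log P(x_i = k),
    with P = softmax(X @ W) standing in for the CNN's output."""
    P = softmax(X @ W)
    return -np.sum(Y * np.log(P + 1e-12)) / X.shape[0]

def sgd_step(W, X, Y, alpha):
    """Equation (3): W <- W - alpha * dE/dW, using the analytic gradient."""
    P = softmax(X @ W)
    grad = X.T @ (P - Y) / X.shape[0]
    return W - alpha * grad

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
Y = np.eye(3)[rng.integers(0, 3, size=50)]   # one-hot y_ik over 3 classes
W = np.zeros((4, 3))
losses = [loss_E(W, X, Y)]
for _ in range(100):
    W = sgd_step(W, X, Y, alpha=0.5)
    losses.append(loss_E(W, X, Y))
```

With zero weights the model is uniform over the 3 classes, so the initial loss is log 3, and repeated applications of (3) drive it down.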
Fig.7. Total number of epochs used in training and validation; the proposed model uses 37 epochs and attains 97.20 percent accuracy. The blue line represents the training accuracy per epoch, and the red line the validation accuracy.

V. Result
We evaluated the model with a few example photos of infected and healthy leaves during the evaluation phase. The predicted output shows the disease's name and, in the multi-class setting, which leaf type it is. The result displays the probability of each class and then chooses the most suitable leaf disease using the classification algorithm. A few sample outputs are shown below.
Fig.8. A healthy potato leaf with its predicted probability.
Fig.9. Tomato early blight
Fig.10. Potato late blight

VI. Conclusion and Scope
The classification strategies utilized in this paper allow the model to cover a large number of different sample circumstances while maintaining excellent robustness. Experiments show that diversifying pooling procedures, suitably adding ReLU activations and dropout operations, and tuning the model parameters can increase identification accuracy; in future research, new algorithms and deep learning structures will be used to train and evaluate models. The accuracy achieved by our model is 97.20%. In this paper, we offer a model grounded in recent deep learning research on plant leaf disease recognition as well as basic deep learning expertise. If adequate data is provided for training, deep learning algorithms can recognise plant leaf diseases with surprising accuracy and speed. To increase classification accuracy, large datasets with high variability, data augmentation, and the presentation of CNN activation maps are all discussed, and plant leaf disease detection from small samples, as well as feature extraction for early plant disease diagnosis, has been highlighted. The model's future scope will be expanded to connect it to mobile devices, allowing farmers to make quick and precise decisions based on crop disease data, though certain limitations remain.
Advances in Science and Technology ISSN: 1662-0356, Vol. 124, pp 96-102 doi:10.4028/p-a0t134 © 2023 Trans Tech Publications Ltd, Switzerland
Submitted: 2022-09-04 Accepted: 2022-09-16 Online: 2023-02-27
Face Mask and Social Distancing Detection Using Faster Regional Convolutional Neural Network
Tushar Bansal M1,a*, Mohammed Saqib Rahman2,b, Jaffer Ali S.3,c, Indumathy M.4,d
1,2,3,4Department of Computer Science and Engineering, SRM Institute of Science and Technology, Vadapalani Campus, No.1 Jawaharlal Nehru Road, Vadapalani, Chennai, Tamil Nadu, India
[email protected], [email protected], [email protected], [email protected]
Keywords: Deep Learning (DL), Faster Regional Convolutional Neural Network (FRCNN), Regional Convolutional Neural Network (RCNN), Fast-Regional Convolutional Neural Network (Fast-RCNN), Convolutional Neural Network (CNN), Region Proposal Network (RPN).
Abstract. Face masks and social distancing are essential for containing many infectious diseases that spread through micro-droplets. According to the WHO, social distancing is a key preventive measure for COVID-19. Face detection has become a widespread concern in computer vision and image processing, and many algorithms built on convolutional architectures aim to make detection as accurate as possible. First, each person in the video frames is pinpointed with the aid of Deep Learning (DL). The second step is to calculate the distance between any two individuals through image-processing techniques. We use a binary face classifier to analyze each frame and detect any face present irrespective of its alignment. The proposed technique generates accurate face segmentation masks from an input picture of any size. Starting from an RGB image of any size, the approach uses pre-trained weights for feature extraction, and a Faster Regional Convolutional Neural Network (FRCNN) is trained to semantically segment the faces present in the given image or frame. FRCNN processes the given data faster and with higher accuracy, precision, and decision quality than comparable methods.

Introduction
ML is an Artificial Intelligence (AI) application that gives computers the ability to learn automatically and improve from experience without being explicitly programmed. ML focuses on the development of programs, running mainly on computing devices, that access available data and use it for self-learning. The learning process begins with observations of data, such as direct experience or explicitly provided instruction, in order to find patterns and make better future decisions based on the provided examples.
The main goal is to enable computers to learn automatically without human intervention or external assistance. One of the main subsets of Machine Learning is Deep Learning, which we use to implement our idea in this project. DL combines the workings of artificial neural networks and recurrent neural networks; its algorithms are built like machine learning algorithms, but with many more layers, and the sum of these networks is referred to as an "artificial neural network". In plain terms, deep learning mimics the human brain by connecting many neural networks and employs these algorithms to solve all forms of complicated problems. Face mask and social distancing detection is a challenging task in today's day and age. It has been receiving more and more attention due to the spread of the novel coronavirus disease, which turned into a pandemic in 2019. Hence many countries follow rules like "no entry without a mask" so as to
help prevent the spread of Covid-19. Face mask detection is essential for safety and for the prevention of life-threatening diseases. In the medical field, masks diminish the potential exposure risk from an infected person whether or not they have symptoms, since diseases such as Covid-19 do not always show external features. Detection of face masks is key in public environments such as hospitals, offices, airports, educational institutions, and other crowded places. Hence face mask detection has turned out to be a vital yet challenging problem. Detecting unmasked faces is relatively effortless, as we only have to map human facial features (with object detection), but recognition of masked faces is complicated: feature extraction is harder because the majority of the facial area is covered and hidden from view, and distinct features such as the mouth, nose, and chin are absent when a person wears a mask. Masks have greatly reduced people's exposure risk from anyone who might be infected by a transmittable disease. The first step in this process is face recognition, extracting and detecting the face from the image; the most common issue is detecting several masked and unmasked faces in one image, which can be tackled with a typical object detection method. Traditional face detection methods were based on handcrafted features such as Histograms of Oriented Gradients (HOG), Haar features with the AdaBoost statistical classifier, and the Viola-Jones object detection algorithm. Commonly used object detection methods fall into two categories: multi-stage detectors and single-shot detectors (SSD).
Our project uses a Single Shot Detector (SSD) approach with FRCNN as the algorithm to detect whether or not a person is wearing a face mask.

Related Works
The past few years have seen an increase in face mask detection algorithms, and they have been receiving more attention. These techniques are described in two parts: facial feature detection methods and social distancing detection, along with their related datasets. For training a cascading classifier for masked face detection in any orientation, Rhamadhaningrum and Dewantara [1] made use of the AdaBoost classification algorithm along with Haar, LBP, and HOG features; they report that the Haar-like feature achieves an accuracy of 86.9%. Traditional face detection methods used handcrafted features, the most common being the Haar-like feature combined with the AdaBoost algorithm. Jun Zhang and Wang Chen [2] used a boosted cascade model, lighter and simpler than the Haar-feature cascade proposed by Viola-Jones, and detected faces with an ensemble of decision trees; by comparing pixel intensities between different nodes, it achieved a rapid detection speed. They also presented a unified model combining detection and alignment for face masks, using a context-attention Regional Convolutional Neural Network (RCNN) based on the CNN architecture and trained over the VGG-16 baseline model, to detect masked and unmasked faces with an accuracy of 84.1%. Bingshu Wang and Yong Zhao [3] proposed a method containing two steps: pre-detection and verification.
The first stage involves using a trained Faster RCNN model to discover potential facial masks, followed by a BLS-trained classifier to eliminate background regions. The model has been trained over 26403 images of people wearing masks and covering multiple scenes. Experiments on
the data set show that the suggested strategy outperforms the compared methods, with an overall accuracy of 97.32 percent for simple scenes and 91.13 percent for complicated scenes. Shilpa Sethi, Mamta Kathuria, and Trilok Kaushik [4] proposed combining single-stage and multi-stage detectors to achieve lower inference time with higher accuracy; they use ResNet50 as the base model and apply transfer learning to inject high-level semantic information into multiple feature maps. Working with ResNet50, they achieved an accuracy of around 98.2%, which is 11.07% and 6.04% greater than the figures achieved using AlexNet and MobileNet respectively. The article by Isunuri B Venkateswarlu, Jagadeesh Kakarla, and Shree Prakash [5] explained the use of MobileNet together with a global pooling block for face mask detection. A global pooling layer flattens the feature vector, transforming a multi-dimensional feature into a one-dimensional feature vector; for classification, a fully connected dense layer with 64 neurons is linked to the softmax layer. In terms of critical performance metrics, the proposed model beats existing models on two publicly available face mask datasets containing 3833 and 1650 color images respectively. The paper by Toshanlal Meaanpal, Ashutosh Balakrishnan, and Amit Verma [6] shows how to obtain accurate face segmentation masks from an image of any dimension. Starting from an RGB picture of any size, pre-trained weights of the VGG-16 architecture are used for facial feature extraction, and Fully Convolutional Networks are trained to semantically segment the faces in a given picture.
They employ gradient descent for training and binomial cross-entropy as the loss function. The FCN's output image is then post-processed to remove undesired noise, avoid incorrect predictions, and create a boundary around the faces. The model has demonstrated excellent results in detecting non-frontal faces and can detect several facial masks in a single frame. Xinqi Fan and Mingjie Jiang [7] proposed "RetinaFaceMask", one of the first high-performing one-stage face mask detectors. They created an annotated dataset that solves a problem in previous studies, which could not distinguish between correct and incorrect mask-wearing situations, and developed a context attention module to focus on the discriminating features associated with wearing a face mask. Further work by Xinqi Fan, Mingjie Jiang, and Hong Yan [8] proposed a deep learning-based single-shot light-weight face mask detector that uses few computational resources while meeting the performance requirements of embedded devices. Two techniques compensate for the low feature extraction capability of the light-weight model: a residual context attention module that extracts rich context information and focuses on the masked regions of a face, and an auxiliary task using Gaussian heat-map regression to learn discriminating facial features with or without a mask.

Proposed Method
Since identifying objects by machine has become an everyday task, ML/DL-based projects are becoming popular.
In this paper we implement our idea using FRCNN, an object identification algorithm with a greater rate of accuracy than its predecessors, namely RCNN, Fast-RCNN, and plain CNN. In FRCNN, a Region Proposal Network (RPN) comes into existence for
the first time, proposing the regions necessary for identifying an object in the user's input. This is much faster than the approach RCNN uses, which takes about 2 s to propose regions: RCNN proposes around 2000 regions over a single image, which are then fed into the layers of a CNN model to extract the features needed for detection, making it an approach that consumes much computing time and provides mediocre accuracy. To overcome this drawback of high computing time, FRCNN delegates the task of proposing regions entirely to the RPN. In fact, a unique feature of this algorithm is that it introduced anchor boxes for proposing regions in the image: boxes used to detect objects at specified aspect ratios and sizes/scales. A CNN model is used as a backbone in training the FRCNN algorithm, in which the RPN proposes K anchor boxes per location. For example, with VGG-16 as the backbone the number of anchors is K = 9: three aspect ratios (1:1, 1:2, and 2:1) at three scales of area (128², 256², and 512²). A 3×3 convolution is applied to the VGG-16 feature map to create a 512-d feature vector at every location. Next come two sibling 1×1 convolutional layers: a classification layer with 18 units and a regression layer with 36 units. The 18 units of the classification layer give the probability of each anchor containing an object or not, and the 36 units of the regression layer refine the anchors' coordinates at each location.
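The K = 9 anchor shapes described above can be generated directly from the three scales and three aspect ratios. This is a small illustrative sketch of the anchor geometry, not the paper's implementation; for a ratio r = h/w and target area s², the sides are w = s/√r and h = s·√r.

```python
import numpy as np

def make_anchors(scales=(128, 256, 512), ratios=(1.0, 0.5, 2.0)):
    """Generate the K = 9 RPN anchor shapes (w, h) for one feature-map
    location: three areas (128^2, 256^2, 512^2) at three aspect ratios."""
    anchors = []
    for s in scales:
        for r in ratios:
            anchors.append((s / np.sqrt(r), s * np.sqrt(r)))  # w, h with w*h = s^2
    return np.array(anchors)

anchors = make_anchors()
```

Each of the 9 shapes preserves its target area exactly while its height/width ratio matches the requested aspect ratio, which is why the RPN can cover objects of different sizes and elongations from a single location.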
Fig 1 – A pictorial representation of our proposed system.

The proposed algorithm is trained using ResNet50 as a backbone for detecting facial masks. ResNet50 is a pre-trained model with a 50-layer-deep architecture, trained to identify nearly 1000 object classes with high accuracy and a very low false rate. The reason ResNet50 gives higher accuracy than MobileNet or AlexNet is that it builds deeper layers and finds a way to skip over layers, solving the problem of gradients vanishing during training due to back-propagation. This backbone architecture uses shortcut connections with ReLU layers to perform identity mapping, adding their output to that of the stacked layers. Next we describe how the project works with FRCNN and ResNet50 as the backbone for training the model.

Methodology
This part gives a detailed explanation of the working of our project, from dataset acquisition through training to predicting the final output. The figure illustrated below is a diagrammatic representation of the workflow.
Fig 2 – Architecture Diagram

Dataset Acquisition
The very first step in the project is acquiring the relevant images required to train the algorithm in the next phase. After researching multiple datasets, we found a dataset on Kaggle containing 3861 images in total: 1960 images of people wearing masks and 1961 images of people without masks. According to the open library information, the dataset has not been widely used by other researchers for open study, which is one of the key reasons we decided to train our model on it.

Image Pre-Processing
The next phase is to pre-process the acquired images. Pre-processing converts the raw images into clean data that can be used for training; in other words, it fine-tunes the raw images into the form required by the project. We import all the images from the dataset stored locally on the device and label them into two categories, masked and unmasked. The quality of the images is enhanced so that it does not deteriorate during training. The images are resized to a fixed 224×224 pixels, the size supported by the algorithm, and the channels are rearranged from BGR to RGB to maintain a unified record.
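The two pre-processing steps just described, BGR-to-RGB conversion and resizing to 224×224, can be sketched without OpenCV using plain NumPy. The nearest-neighbour resize here is a dependency-free stand-in for what would normally be `cv2.cvtColor` plus `cv2.resize`:

```python
import numpy as np

def preprocess(frame, size=224):
    """Convert a BGR frame to RGB and resize to size×size by
    nearest-neighbour sampling."""
    rgb = frame[:, :, ::-1]                  # BGR -> RGB: reverse the channel axis
    h, w = rgb.shape[:2]
    rows = np.arange(size) * h // size       # nearest source row per output row
    cols = np.arange(size) * w // size       # nearest source column per output column
    return rgb[rows][:, cols]
```

In practice an interpolating resize gives smoother inputs, but the output shape and channel order match what the model expects.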
Fig 3 – Pre-processing stage showing the masked and unmasked categories

Training and Creation of Model
We use the images from our pre-processed dataset and distribute them over four groups labelled masked and unmasked, two of which are used for training while the remaining two are used for testing the model's performance and accuracy. The images in the dataset are then split into
two categories: 70% of the images are used to train the model and the remaining 30% to test it, i.e., a 7:3 training-to-testing ratio. In this phase the model begins to predict using the dataset provided to it during training. The training and testing sets have the same kind of distribution so that there is no discrepancy when comparing predictions; this avoids the model receiving different types of training input. We obtained an accuracy of 97.76% after training the model.
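The 70/30 split above can be sketched in a few lines. This is a minimal illustration with a fixed seed for reproducibility; a real pipeline might use scikit-learn's `train_test_split` instead:

```python
import random

def split_70_30(samples, seed=42):
    """Shuffle the samples and split them into 70% train / 30% test,
    matching the 7:3 ratio described above."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    cut = int(0.7 * len(items))
    return items[:cut], items[cut:]
```

Shuffling before cutting keeps the class distribution of the two sets similar, which is what prevents the training/testing discrepancy mentioned above.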
Fig 4 – Accuracy of the trained model

Detection
The trained model saved earlier is then imported onto any computing device. After the required libraries are imported and the code is compiled, the computer asks the user to grant "Camera" access so it can capture the real-time video feed. Frames from the real-time video are pre-processed, and the necessary features are extracted via the ROI pooling layer. Bounding boxes are drawn on the identified individuals: we first predict whether each individual in the frame is wearing a mask or not, and simultaneously compute whether the people in the frame are maintaining social distancing.
Fig 5 – Demonstration of the Model

Conclusion
We have successfully developed a system that aids in the detection of face masks and also helps keep social distancing in check. We achieved this by using a Faster Regional Convolutional Neural Network (FRCNN) built on ResNet50 as the base model, alongside OpenCV, to check for the two features above. For pre-processing, we used a dataset containing around 3864 images, half masked and half unmasked. Extensive training rendered a best-in-class accuracy of 97.76%. The proposed method determines whether each individual detected in the frame is wearing a mask and maintaining social distancing at the same time. Its implementation on a real-time video feed is a novelty, as it helps to immediately identify rule-benders and act on them. Besides helping the public curb the spread of Covid-19, it can also be used against other contagious infectious diseases spread through actions such as coughing or sneezing, or by being in range of anyone who might have symptoms or be infected.
References
[1] Bima Sena Bayu Dewantara, Dhiska Twinda Rhamadhaningrum, "Detecting multi-pose masked face using adaptive boosting and cascade classifier", International Electronics Symposium (IES), October 2020. DOI: 10.1109/IES50839.2020.9231934.
[2] Jun Zhang, Feiting Han, Yutong Chung, Wang Chen, "A Novel Detection Framework About Conditions of Wearing Face Mask for Helping Control the Spread of COVID-19", IEEE Access, vol. 9, pp. 42975-42984, March 2021. DOI: 10.1109/ACCESS.2021.3066538.
[3] Bingshu Wang, Yong Zhao, C. L. Philip Chen, "Hybrid Transfer Learning and Broad Learning System for Wearing Mask Detection in the COVID-19 Era", IEEE Transactions on Instrumentation and Measurement, vol. 70, art. no. 5009612, March 2021. DOI: 10.1109/TIM.2021.3069844.
[4] Shilpa Sethi, Mamta Kathuria, Trilok Kaushik, "Face mask detection using deep learning: An approach to reduce the risk of coronavirus spread", Journal of Biomedical Informatics, vol. 120, art. no. 103848, June 2021. DOI: 10.1016/j.jbi.2021.103848.
[5] Isunuri B Venkateswarlu, Jagadeesh Kakarla, Shree Prakash, "Face mask detection using MobileNet and Global Pooling Block", IEEE Conference on Information & Communication Technology (CICT), January 2021. DOI: 10.1109/CICT51604.2020.9312083.
[6] Toshanlal Meaanpal, Ashutosh Balakrishnan, Amit Verma, "Facial Mask Detection using Semantic Segmentation", International Conference on Computing, Communications and Security (ICCCS), October 2019. DOI: 10.1109/CCCS.2019.8888092.
[7] Xinqi Fan, Mingjie Jiang, "RetinaFaceMask: A Single Stage Face Mask Detector for Assisting Control of the COVID-19 Pandemic", IEEE International Conference on Systems, Man, and Cybernetics (SMC), January 2022. DOI: 10.1109/SMC52423.2021.9659271.
[8] Xinqi Fan, Mingjie Jiang, Hong Yan, "A Deep Learning Based Light-Weight Face Mask Detector with Residual Context Attention and Gaussian Heatmap to Fight Against COVID-19", IEEE Access, vol. 9, pp. 96964-96974, July 2021. DOI: 10.1109/ACCESS.2021.3095191.
Advances in Science and Technology ISSN: 1662-0356, Vol. 124, pp 103-110 doi:10.4028/p-oswg04 © 2023 Trans Tech Publications Ltd, Switzerland
Submitted: 2022-09-13 Accepted: 2022-09-16 Online: 2023-02-27
Sign Language Detection Using Action Recognition

Nishan Dutta1,a, Indumathy M.1,b

1Department of Computer Science and Engineering, SRM Institute of Science and Technology, Vadapalani, Chennai, India

[email protected], [email protected]
Keywords: video detection, action recognition, sign language, gesture recognition, deep learning, frame, key points, standard language
Abstract. Sign language detection is a technology of extreme importance to society. Sign languages are used by deaf and mute people, who cannot communicate directly through sound because they lack the ability to produce or perceive the sound waves that enable the rest of us to communicate easily. The proposed project aims to close the gap left by existing sign language detection techniques, which focus only on recognizing the meaning of individual letters (as in ASL fingerspelling) and not the actions performed by the user. The project detects sign language by using key points as position locators and then training the system on them: key points locate each gesture, and an LSTM is trained on the resulting data. Experimental results demonstrate the efficacy of the proposed method on the sign language detection task.

I. Introduction

Sign languages communicate expression visually: they are vision-based languages that convey meaning through hand shapes and hand gestures, body movement and orientation, facial expression, and lip movement. Just as spoken languages differ between regions, sign languages vary from one region to another, and the local sign language of each place is followed there. To standardize these regional differences, almost every country has its own sign language standard, such as Indian Sign Language, Portuguese Sign Language, and American Sign Language. Sign language can be divided into three categories: fingerspelling, where the fingers denote the alphabet; word-level signing, where each word is expressed by a gesture; and sentence-level signing, where gestures convey a complete sentence.

Gloves and motion trackers are the most commonly used sensors for sensor-based sign language recognition (SLR). Measurements from these sensors are usually accurate enough that no feature extraction techniques are required, which lets a researcher focus on the recognition problem. The drawback of using such sensors is that the signer must wear them, which can be annoying. Libraries such as MediaPipe can instead detect the locations of the key points of the hands without gloves or any other sensors, making the project cost-effective.

We have organized the rest of the paper as follows: Section I introduction, Section II related work, Section III methodology, Section IV model explanation, Section V dataset, Section VI conclusion, Section VII limitations and future scope, and Section VIII references.
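The key-point approach described in the introduction can be made concrete: a detector such as MediaPipe returns 21 (x, y, z) landmarks per hand, and the per-frame feature vector can simply be the flattened coordinates, zero-filled when no hand is visible. This is a minimal sketch under that assumed landmark layout; the function name is our own, not from the project code.

```python
import numpy as np

NUM_LANDMARKS = 21   # MediaPipe's hand model emits 21 landmarks per hand
COORDS = 3           # (x, y, z) per landmark

def landmarks_to_features(landmarks):
    """Flatten one hand's landmarks into a fixed-length feature vector.

    `landmarks` is a list of (x, y, z) tuples, or None when no hand
    was detected in the frame (the vector is then all zeros)."""
    if landmarks is None:
        return np.zeros(NUM_LANDMARKS * COORDS)
    return np.asarray(landmarks, dtype=float).flatten()

# One frame with a detected hand and one without:
frame_with_hand = [(0.1 * i, 0.2 * i, 0.0) for i in range(NUM_LANDMARKS)]
v1 = landmarks_to_features(frame_with_hand)
v0 = landmarks_to_features(None)
print(v1.shape, v0.shape)  # both (63,)
```

The fixed 63-element vector per frame is what makes the downstream LSTM input shape uniform across videos.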
II. Related Work

Anup used a statistical approach to recognize gestures from ISL, with direction histograms as the feature representation. A video database was created manually for better accuracy and training; the k-nearest-neighbour approach with Euclidean distance was used for training. Kumud and Neha proposed feature extraction as a way of recognizing gestures from a video containing multiple Indian sign language gestures. The video was segmented using key frames, and gradients were used to separate individual gestures from the sequence. Principal component analysis and histograms were used to extract gesture features; Euclidean distance, correlation, and Manhattan distance were brought into use for classification and better accuracy. Joyeeta and Karen used eigenvalue-weighted Euclidean distance, applying eigenvectors in the preprocessing stage along with histogram matching and skin filtering. Ronchetti used ProbSom, a supervised variant of the self-organizing map, to classify hand shapes in an image-based extraction process; this technique achieved a precision of 90%. S. Masood used convolutional neural networks (CNNs) for character recognition in American Sign Language (ASL); the preferred CNN model achieved an overall accuracy of 96 percent on an image dataset of 2,524 ASL gestures. Lionel used a GPU (Graphics Processing Unit) to accelerate a CNN, together with Microsoft Kinect, in a system proposed to detect Italian Sign Language; using a dataset of 20 Italian gestures, it achieved an accuracy of 92%. Rajat used a finely tuned portable device as a solution for reducing the communication gap between specially abled individuals and the general public; the device's architecture and operations were organized into three embedded algorithm categories aimed at quick, efficient, and easy communication.

III. Methodology
In the proposed method, spatial features are extracted from individual frames by the Inception model (a convolutional neural network, CNN), and temporal features by a recurrent neural network (RNN). Each video is then represented by the sequence of predictions the CNN produces for its individual frames, and these per-frame predictions are used as input to the RNN training stage. For the video corresponding to each gesture, frames are extracted and the background body parts of each frame are removed, except for the hands. The hands are separated from the frames and converted into grayscale images, so that training and learning do not depend on color. The frames of the training videos are given to the CNN model for training on the spatial features, since the CNN handles spatial learning. The trained model is then used to build and store predictions for the frames of both the training and the test data. These CNN predictions on the training frames are then given to the LSTM RNN model for training on the temporal features. Once the RNN training is complete, the frame predictions for the test data are sent to the testing stage.
A. Train Convolutional Neural Networks for Prediction and Spatial Features
The convolutional neural network extracts local spatial features from a frame of a video or an image and then combines them into higher-order features. Figure 1 depicts a simplified diagram of how a CNN builds higher-order features from spatial features. In the first convolutional layer, when an image is used as input, feature maps are created as follows, for every filter in the layer:
● the input image is convolved with the filter;
● a bias term is added to each convolution sum in the feature map;
● the results are passed through a nonlinear activation function;
● the activated feature maps form the layer's output.
Because of its nonlinear operations on convolution sums, the CNN can cope with nonlinear separation, noise, and distortion between images. The feature maps produced by the first convolutional layers contain extracted spatial features. The next CNN layers convert these into higher-order features by aggregating and pooling the features in each feature map. The convolution and pooling steps are repeated until a series of small feature maps is produced, depending on the architecture of the CNN. The final feature maps are passed through fully connected neural network layers and finally through a multiclass classifier. The primary CNN role here is filled by the Inception model. From the training data, for every gesture "X", all 30 frames of the videos corresponding to that gesture are labeled "X" and fed to the Inception model for training.
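The per-filter steps listed above (convolve, add bias, apply a nonlinearity, pool) can be sketched in NumPy. This is an illustrative single-filter layer, not the Inception implementation:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' 2-D convolution (cross-correlation) of a single channel."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def conv_layer(image, kernel, bias):
    """One filter of a conv layer: convolve, add bias, ReLU, 2x2 max-pool."""
    fmap = conv2d_valid(image, kernel) + bias          # convolution + bias term
    fmap = np.maximum(fmap, 0.0)                       # nonlinear activation (ReLU)
    h, w = fmap.shape[0] // 2 * 2, fmap.shape[1] // 2 * 2
    pooled = fmap[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))
    return pooled                                      # smaller, higher-order map

img = np.arange(36, dtype=float).reshape(6, 6)
k = np.ones((3, 3)) / 9.0          # simple averaging filter
out = conv_layer(img, k, bias=-5.0)
print(out.shape)  # (2, 2)
```

Stacking such layers, as described above, progressively shrinks the feature maps while enlarging their receptive field.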
Figure 1: Saving predictions using CNN training

The model trained with the CNN is used to produce predictions for every individual frame of the videos in both the training set and the test set. That leaves two sets of predictions:
1. the per-frame predictions of the training dataset, used to train the RNN;
2. the per-frame predictions of the test dataset, used to test the RNN.
Each video used for training and testing is split into a sequence of 30 frames. After the CNN has been trained and has produced a prediction for each frame, a video can be seen as a sequence of predictions.

B. Training RNN (Temporal Features) and Testing
Recurrent neural networks (RNNs) are capable of learning features and long-term dependencies from sequential and time-series data. An RNN consists of a stack of non-linear units in which at least one connection between units forms a directed cycle. A well-trained RNN can model any dynamical system, but training RNNs is commonly hampered by difficulty in learning long-term dependencies. RNNs represent the temporal features of data using time-varying hidden states whose transition is determined by the previous state together with the input at the current time, and they are generally trained by gradient-descent methods using backpropagation through time. An RNN is a deep learning algorithm that feeds its output back as input, giving it a temporal memory; this temporal memory makes processing dynamic sequences possible. These properties let RNNs outperform alternative machine learning algorithms at sequence prediction in applications such as speech recognition, with increased accuracy and easier training. A well-known RNN variant is long short-term memory (LSTM). LSTM can produce better results than general RNNs because error backpropagation through its gated cells avoids the vanishing and exploding gradient phenomena, allowing LSTM to train more effectively than a general RNN. The per-frame gesture predictions obtained from the CNN are used as input to the RNN as sequences of frame predictions, and the RNN learns to predict each gesture from its sequence of frame predictions.

A model file is produced once the RNN has been trained on the gesture video sequences.
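To make the pipeline concrete, the following sketch runs a sequence of 30 per-frame class-probability vectors (standing in for the CNN's predictions) through one forward pass of a single LSTM layer implemented directly with the standard gate equations. The weights are random, so this only illustrates the shapes and data flow, not the trained model; all sizes and names are our own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_classes, hidden = 30, 5, 16     # 30 frames, 5 gestures, 16 LSTM units

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One weight matrix per gate (input, forget, candidate, output),
# acting on the concatenation [h_prev, x]:
W = {g: rng.normal(0, 0.1, (hidden, hidden + n_classes)) for g in "ifco"}
b = {g: np.zeros(hidden) for g in "ifco"}

def lstm_forward(x_seq):
    """Run a sequence through one LSTM layer; return the final hidden state."""
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x in x_seq:
        z = np.concatenate([h, x])
        i = sigmoid(W["i"] @ z + b["i"])        # input gate
        f = sigmoid(W["f"] @ z + b["f"])        # forget gate
        g = np.tanh(W["c"] @ z + b["c"])        # candidate cell state
        o = sigmoid(W["o"] @ z + b["o"])        # output gate
        c = f * c + i * g                       # cell update
        h = o * np.tanh(c)                      # hidden state
    return h

# Stand-in for the CNN's per-frame softmax predictions for one video:
logits = rng.normal(size=(T, n_classes))
frame_preds = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
h_final = lstm_forward(frame_preds)
print(h_final.shape)  # (16,)
```

In the actual system, a final dense softmax layer over `h_final` would yield the gesture prediction for the whole 30-frame video.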
Figure 2: Input for training RNN from CNN results

The predictions obtained from the trained CNN on the frames of the test set are designated as input data to the trained model for testing.

C. LSTM
LSTM was introduced mainly to solve problems that RNNs fail at. The LSTM recurrent unit tries to remember all the knowledge it has gained in the past while filtering out irrelevant data with the help of activation functions. Each unit maintains a vector containing the information retained by the previous unit. The method works on both single and sequential data points, that is, on both images and videos. For example, LSTM can be used for intrusion detection systems, handwriting and speech recognition, network traffic detection, and non-segmented tasks. A standard LSTM unit is formed by the combination of an input gate, a cell, a forget gate, and an output gate, each with a distinct function. The input, output, and forget gates regulate the flow of information into and out of the cell, while the cell retains values over arbitrary time intervals. Processing, classifying, and making predictions on time-series data are the strengths of LSTM, since there can be delays between important events in a time series. The vanishing gradient problem traditionally encountered by RNNs is well suited to LSTM, which was designed to solve exactly this type of problem. RNNs have various disadvantages compared to LSTM; for example, LSTM is relatively insensitive to gap length, and it offers advantages for sequence learning over alternatives such as the hidden Markov model. Sign language translation (SLT) has been a difficult task because of the conflict between gesture variations and spoken words, even though the gestures are provided in sequential order. To solve this challenging issue, two models, HMM and CTC, have been developed, but both fail to align the visual content with the words of a sentence. To solve this, a hierarchical LSTM (HLSTM) with an encoder and decoder has been proposed to align words with visual gesture content. This model tackles different kinds of spatio-temporal transitions between clips or frames. First, spatio-temporal cues are extracted from the video frames by a 3D CNN and wrapped together with parts of speech by clipping them into frames of adaptive length. After the top layer of the HLSTM is built by pooling the recurrent output, a temporal attention-aware weighting mechanism balances the relationship between visual content and words. The last two LSTM layers are constructed to loop individually over the translated semantic words and the parts of speech.
The encoder time steps of the last two LSTM layers are reduced by conserving the primary visual content through the 3D CNN and the top layer of the HLSTM, which attains more non-linearity at lower computational complexity. The model was compared in signer-independent tests with both seen and unseen sentences, and it outperforms the baselines on both.

IV. Model Explanation
For detecting sign language and then concatenating the detected words, the project has two parts. The first part, detecting the sign language, comprises several modules, the first of which detects hand gestures frame by frame using MediaPipe. Locating each key point is important because it determines the training coordinates of the system. Detection involves training and testing on the data obtained from gesture position detection; the training is done with LSTM because of the huge amount of data to be processed. The second part of the project concatenates the detected actions: the system divides the actions detected across frames and concatenates them as the frames change. These two parts make up the overall functionality of the project.

V. Dataset and Performance

This is a self-made dataset built around Argentinian sign language. The sign language data was recorded by the authors to train the system efficiently, and 30 frames per video were captured for higher data precision.
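The concatenation module described in Section IV can be sketched as collapsing runs of identical frame-level predictions into a word sequence. This is a minimal illustration; the function name is our own, not from the project code.

```python
def concatenate_detections(frame_labels):
    """Collapse runs of identical per-frame predictions into a word sequence.

    A new word is appended only when the detected action changes
    between frames, mirroring the frame-change concatenation step."""
    words = []
    for label in frame_labels:
        if not words or words[-1] != label:
            words.append(label)
    return " ".join(words)

# Per-frame detections across a short clip:
frames = ["hello"] * 10 + ["my"] * 8 + ["name"] * 12
print(concatenate_detections(frames))  # hello my name
```

A real system would likely also require a minimum run length per word to suppress single-frame misclassifications.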
Figure 3: 30 folders, each containing 30 frames, for the dataset

The model is trained using LSTM, as depicted in Figure 4.

Figure 4: LSTM Model

The accuracy of the system is more than 94 percent, as depicted in Figure 5.

Figure 5: Accuracy of system

VI. Conclusion

The proposed project has been successfully trained to detect sign language through action recognition of hand gestures. The project also serves its purpose of joining one detected word to the next through concatenation. This module can be an asset for new-age sign language detection using action recognition: it not only detects the language with high efficiency, but its concatenation of detected words can help carry the whole sign language detection technology into a new age.
VII. Limitation and Future Scope
The biggest limitation of the project is that the algorithm connects the detected words but cannot form proper sentences from them; overcoming this obstacle is the key to future development of this project. Future work would add an algorithm that can form a sentence from the words detected and concatenated by the system. This change would make the project a more interactive system: a person in need could sign continuously, and the system would detect the gestures and convert them into meaningful sentences, which would ease understanding and serve the purpose of the project.

References
[1] Sarfaraz Masood, Adhyan Srivastava, Harish Chandra Thuwal, Musheer Ahmad, "Real-Time Sign Language Gesture (Word) Recognition from Video Sequences Using CNN and RNN", in V. Bhateja et al. (eds.), Intelligent Engineering Informatics, Advances in Intelligent Systems and Computing 695, Springer Nature Singapore, 2018.
[2] Muneer Al-Hammadi, Ghulam Muhammad, Wadood Abdul, Mansour Alsulaiman, Mohamed A. Bencherif, Mohamed Amine Mekhtiche, "Hand Gesture Recognition for Sign Language Using 3DCNN", IEEE Access, 12 May 2020.
[3] Mohamed Hassan, Khaled Assaleh, Tamer Shanableh, "User-Dependent Sign Language Recognition Using Motion Detection", International Conference on Computational Science and Computational Intelligence, 2016.
[4] Dyah Rahma Kartika, Riyanto Sigit, Setiawardhana, "Sign Language Interpreter Hand Using Optical Flow", International Seminar on Application for Technology of Information and Communication, 2016.
[5] Soma Shrenika, Myneni Madhu Bala, "Sign Language Recognition Using Template Matching Technique", IEEE Xplore, July 2020.
[6] Rajarshi Bhadra, Shubajit Kar, "Sign Language Detection from Hand Gesture Images Using Deep Multi-Layered Convolution Neural Network", IEEE Second International Conference on Control, Measurement and Instrumentation, 2021.
[7] Trong Nguyen Nguyen, Huu Hung Huynh, "Static Hand Gesture Recognition Using Artificial Neural Network", Journal of Image and Graphics, vol. 1, no. 1, March 2013.
[8] T. Shanableh, K. Assaleh, M. Al-Rousan, "Spatio-Temporal Feature Extraction Techniques for Isolated Gesture Recognition in Arabic Sign Language", IEEE Transactions, vol. 37, no. 3, June 2007.
[9] W. Kong, S. Ranganath, "Towards Subject Independent Continuous Sign Language Recognition: A Segment and Merge Approach", Pattern Recognition, vol. 47, no. 3, pp. 1294-1308, 2014.
[10] P. Viola, M. Jones, "Rapid Object Detection Using a Boosted Cascade of Simple Features", IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), June 2001.
[11] M. R. Abid, E. M. Petriu, E. Amjadian, "Dynamic Sign Language Recognition for Smart Home Interactive Application Using Stochastic Linear Formal Grammar", IEEE Transactions, vol. 64, no. 3, March 2015.
[12] Ronchetti, F., Quiroga, F., Estrebou, C.A., Lanzarini, "Handshape Recognition for Argentinian Sign Language Using ProbSom", Journal of Computer Science & Technology 16, 2016.
[13] Tripathi, K., Nandi, "Continuous Indian Sign Language Gesture Recognition and Sentence Formation", Procedia Computer Science 54, pp. 523-531, 2015.
[14] Pigou, L., Dieleman, S., Kindermans, P.-J., Schrauwen, "Sign Language Recognition Using Convolutional Neural Networks", Workshop at the European Conference on Computer Vision 2014, Springer International Publishing, 2014.
[15] Sharma, R., Bhateja, V., Satapathy, S.C., Gupta, "Communication Device for Differently Abled People", International Conference on Data Engineering and Communication Technology, pp. 565-575, Springer, Singapore, 2017.
Advances in Science and Technology ISSN: 1662-0356, Vol. 124, pp 111-124 doi:10.4028/p-52096g © 2023 Trans Tech Publications Ltd, Switzerland
Submitted: 2022-09-22 Accepted: 2022-09-23 Online: 2023-02-27
Brain Tumor Segmentation Using Modified Double U-Net Architecture

Thejus Shaji1,a, Ravi K2,b, Vignesh E3,c and A Sinduja4,d

1,2,3,4Department of Computer Science and Engineering, SRM Institute of Science and Technology, Vadapalani Campus, No.1 Jawaharlal Nehru Road, Vadapalani, Chennai, Tamil Nadu, India

[email protected], [email protected], [email protected], [email protected]
Keywords: deep learning; MRI; U-Net; medical image segmentation; brain tumor
Abstract: Children and the elderly are most susceptible to brain tumors: deadly cancers caused by uncontrollable proliferation of brain cells inside the skull. The heterogeneity of tumor cells makes classification extremely difficult. Image segmentation has been revolutionized by the Convolutional Neural Network (CNN), which is especially useful for medical images. The U-Net succeeds in segmenting a wide range of medical pictures in general, including some particularly difficult instances. However, we uncovered severe problems in the standard models that have been used for medical image segmentation. As a result, we applied modifications and created an efficient U-Net-based deep learning architecture, which was evaluated on the Brain Tumor dataset from the Kaggle repository, consisting of over 1,500 images of brain tumors together with their ground truth. After comparing our model to comparable cutting-edge approaches, we determined that our design yields at least a 10% improvement, showing that it generates more efficient, better, and more robust results.

Introduction

Aberrant cell growth in the human brain causes brain tumors. Malignant brain tumors are growing more prevalent, posing a tremendous burden on individuals and society. A brain tumor is segmented to diagnose this disease using high-quality image processing. The objective of medical image analysis is to give radiologists and doctors an efficient diagnostic and treatment process [1]. To assess a patient's condition, doctors use magnetic resonance imaging (MRI) data, in which various symptoms and classifications of brain tumors have various appearances [3], along with other medical imagery. As a result, medical image processing has become a popular research area in computer vision. Medical imaging methods such as CT scans, X-rays, and MRIs can provide nondestructive information about illness, abnormalities, and anatomical structures within the human body. It is vital to analyze medical images [2] and extract meaningful information, because they contain a large quantity of data and are vulnerable to noise interference. Computer-assisted image classification eliminates subjectivity and saves time, making it useful in clinical diagnosis [4][5]. Image segmentation is a crucial and challenging aspect of image processing and has become a hotspot in image interpretation; it is also a constraint that prevents 3D reconstruction and other technologies from being widely used. Segmentation is the process of dividing a picture into segments with comparable characteristics, that is, the process of isolating the target in a picture from the backdrop. The speed and accuracy of image segmentation algorithms are steadily increasing. We are developing a comprehensive segmentation algorithm for a variety of pictures by merging several distinct concepts and technologies [6].
The most broadly used deep learning algorithms in medical image segmentation are the Fully Convolutional Network (FCN) [7] and U-Net [8]; U-Net has been demonstrated to be the most efficient technique in terms of performance. The U-Net design is symmetrical, with encoder operations performed on the left half and decoder functions on the right. In this research, we attempt to enhance the existing medical image segmentation method, keeping in mind the limits of the baseline model, using a cutting-edge segmentation technique based on a modified version of the Double U-Net model to accurately predict and segment the tumor region. We will compare it with some of the existing algorithms.

A. Related Work

A scan is taken by a specialized machine in much the way a photograph is taken by a digital camera. Using computer technology, a scan builds an image of the brain from various angles. A contrast agent (contrast dye) is used in some types of scans to help the doctor distinguish between normal and diseased brain regions. MRI (Magnetic Resonance Imaging) is a scanning technology that uses a magnetic field and a computer to make images of the brain. It allows doctors to generate a three-dimensional representation of the tumor by providing images from several planes. MRI picks up signals from both normal and pathological tissues, allowing clear imaging of nearly all tumors. CT or CAT Scan (Computed Tomography) uses a combination of advanced X-ray and computer technologies. Soft tissues, bones, and blood vessels can all be seen on a CT scan, which can reveal certain kinds of tumors as well as swelling, bleeding, and calcification of bone and tissue. During a CT scan, the contrast agent is usually iodine. Long et al. [9] were the first to create fully convolutional networks (FCN), whereas Ronneberger et al. [10] introduced U-Net. Skip connections are the one thing they have in common. In FCN, up-sampled feature maps are summed with the corresponding feature maps from the encoder; in U-Net, they are concatenated, with convolutions and non-linearities added between each stage of up-sampling. Because skip connections efficiently restore the network output's full spatial resolution, fully convolutional techniques are excellent for semantic segmentation. Chen et al. [11] proposed DeepLab as a method for segmentation; DeepLabV3 has since been demonstrated to be much better than prior DeepLab versions that lacked DenseCRF post-processing. In comparison to FCN and U-Net, the DeepLabV3 architecture adopts a synthesis technique with fewer convolutional layers. Similar to the U-Net approach, DeepLabV3 makes use of skip connections between the analysis and synthesis paths. For complicated scene interpretation, Zhao et al. [12] suggested an effective scene parsing network in which global pyramidal features gather more contextual data. Deep Residual U-Net, developed by Zhang et al. [13], leverages residual connections to improve the output segmentation map. For medical image segmentation, the Dense-Res-Inception Net (DRINET) was proposed by Chen et al. [14] and compared with FCN, U-Net, and ResUNet. Ibtehaz et al. [15] enhanced the U-Net design and proposed the more efficient MultiResUNet; they tested it against U-Net on numerous medical image segmentation datasets and found their results more accurate than U-Net's.

B. Motivation

The study looked at computer-assisted approaches for processing and displaying magnetic resonance (MR) images. Many studies have focused on identifying and assessing brain abnormalities. Using MR scans of the head to automatically identify abnormalities in the brain is a key step in this
procedure. Due to flaws in MRI scanners, MR images contain undesired intensity fluctuations, and removing or reducing these variations improves the accuracy of automated analysis. Medical imaging contains a number of complicated images that are often overlooked during medical examinations and, if not found early, can progress to cancer. A more precise medical image segmentation approach is therefore necessary to deal with this, which motivated us to propose a model that produces efficient and improved segmentation. Skip connections are a feature of U-Net: they enable dense feature maps from the analysis path to be passed to the equivalent layers in the synthesis part. This conveys spatial information to the deeper layers, resulting in a substantially more precise output segmentation map. The second issue we encountered was poor feature map extraction, which we addressed by concatenating the outputs of two U-Nets, resulting in a better feature map. ASPP was also employed as an enhancer, since it assists in the extraction of high-resolution feature maps, which leads to better performance. Separating touching objects of the same class is another difficulty we encountered. This is tackled via a weighted loss function that penalizes the model for failing to distinguish the two objects; the proposed model makes use of such a weighted loss function.

C. Outline of the proposed work
Figure 1: A synopsis of the proposed work. The planned experimental setup is depicted in full in Figure 1. The inputs are sent to a data preprocessing stage, where the images are resized, normalized, and converted into arrays, and then to a data splitting stage, where the data is separated into training and test sets, with part of the data further held out for validation. The preprocessed training data is fed into the model, which is then trained; the results, along with the performance graph, are rendered through data visualization in the result generation stage.
Table 1: Models used and their parameter counts

Model                               Parameters
U-net                                7,787,633
U-net with VGG19                    31,172,033
Double U-net                        16,534,530
Modified Double U-net (proposed)     4,489,386
IoT, Cloud and Data Science
Table 1 lists all the models used in this experiment and the number of parameters in each. D. Organization of the paper The structure of the paper is as follows. Section 2 compares the suggested segmentation method with existing segmentation algorithms. Section 3 discusses the experimental findings and performance analysis. Section 4 concludes the paper. Proposed Methodology In this work, we present a deep learning approach for segmenting brain tumors. We used a publicly available dataset (Kaggle repository) for segmenting brain tumors in MRI scans. Segmentation is accomplished with a modified Double U-Net design, and the data passes through a number of phases, as seen in the diagram below. A. Data Pre-Processing Data preprocessing is the act of transforming raw data into a format that can be accessed and used by data mining techniques. Real-world data is frequently incomplete, inconsistent, and/or deficient in specific behaviors or patterns; preprocessing is a proven way to address such issues. B. Data splitting Partitioning data into training and testing groups is a standard component of learning-based data mining practice. The training set receives the larger share of the data, from which the model learns, while the testing group receives a smaller portion. To ensure that the training and testing sets are comparable, a random sample of the data is drawn. Using comparable datasets for training and testing reduces the effect of data inconsistencies and gives a better grasp of the model's properties. After the model is fitted on the training set, it is evaluated by generating predictions on the test set. C.
Convolutional Neural Network Because of its capability to process large amounts of data, deep learning has become a particularly effective technology in recent decades, and in pattern recognition, models with hidden layers have surpassed conventional techniques in popularity. Convolutional Neural Networks (CNNs) are a form of deep learning model that accept images as input, assign importance to different aspects of the image via learnable weights and biases, and differentiate among them [16]. A CNN requires substantially less pre-processing than conventional classification methods: where traditional techniques require hand-engineered filters, CNNs can learn these filters/characteristics given enough training. CNNs are also applied to both generative and analytical tasks such as multimedia content identification, recommendation systems, and natural language processing (NLP). A CNN is comparable to a multilayer perceptron, employing a similar approach but adjusted for lower processing requirements. As shown in Figure 2, CNNs comprise an input layer, an output layer, and hidden layers that include convolutional layers, pooling layers, fully connected layers, and normalization layers. Relaxed constraints and improvements in processing efficiency make CNN-based image analysis and natural language processing considerably more effective and simpler to train.
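The core operation of a convolutional layer can be illustrated with a minimal sketch (not the paper's implementation): a naive 2-D "valid" cross-correlation of an image with a learnable kernel.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive 2-D 'valid' cross-correlation, the core operation of a CNN layer."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
edge_kernel = np.array([[1.0, -1.0]])   # horizontal difference filter
print(conv2d_valid(image, edge_kernel))  # every horizontal difference is -1
```

In a real CNN the kernel values are the learnable weights mentioned above; frameworks implement this far more efficiently, but the arithmetic is the same.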
Figure 2: Sample Convolutional Neural Network. D. Max Pooling Max pooling creates a downsampled (pooled) feature map by taking the maximum value over patches of the input feature map. It introduces a small amount of translation invariance, meaning that minor shifts in the image have little influence on the values of most pooled outputs. Max pooling also acts as a noise suppressant: it discards noisy activations while reducing the feature map's dimensions. Figure 3 illustrates the operation.
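A minimal numpy sketch of 2x2 max pooling with stride 2, the variant shown in Figure 3:

```python
import numpy as np

def max_pool2d(x, size=2):
    """2x2 max pooling with stride equal to the window size."""
    h, w = x.shape
    x = x[:h - h % size, :w - w % size]  # crop to a multiple of the window
    return x.reshape(h // size, size, w // size, size).max(axis=(1, 3))

fmap = np.array([[1, 3, 2, 0],
                 [4, 6, 1, 1],
                 [0, 2, 9, 8],
                 [3, 1, 7, 5]], dtype=float)
print(max_pool2d(fmap))
# [[6. 2.]
#  [3. 9.]]
```

Each 2x2 window collapses to its maximum, halving both spatial dimensions.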
Figure 3: Max Pooling. E. Upsampling The upsampling layer is a simple layer with no weights that may be used in a generative model after a typical convolutional layer to double the dimensions of its input. Upsampling in a CNN may be unfamiliar territory for those used to classification and object recognition architectures, but the concept is straightforward: we expand the feature dimensions in order to restore the compressed feature map to the original dimensions of the input image. Transposed convolution, up-convolution, and deconvolution are all terms used for learned upsampling. Upsampling may be done in a variety of ways, from the simplest to the most complex, including nearest neighbor, bilinear interpolation, and transposed convolution. An example is shown in Figure 4.
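The simplest of the upsampling variants listed above, nearest-neighbor upsampling, can be sketched in a few lines of numpy (illustrative only; learned transposed convolutions are what the decoders in this paper use):

```python
import numpy as np

def upsample_nearest(x, factor=2):
    """Nearest-neighbour upsampling: each value is repeated factor x factor times."""
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)

fmap = np.array([[1, 2],
                 [3, 4]])
print(upsample_nearest(fmap))
# [[1 1 2 2]
#  [1 1 2 2]
#  [3 3 4 4]
#  [3 3 4 4]]
```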
Figure 4: Upsampling. F. Activation Function i. ReLU In deep learning models, the Rectified Linear Unit (ReLU) is the most commonly used activation function. The function returns 0 for any negative input; for a positive input x, it returns x [17]. Figure 5 depicts the line plot of the rectified linear activation.
f(x) = max(0, x)    (1)
Figure 5: ReLU function. ii. Sigmoid The sigmoid is a mathematical function shaped like the character "S" that maps any real value into the range between 0 and 1; it is also called the logistic function [18].

Y = 1 / (1 + e^(-z))    (2)
As a result, as z tends to positive infinity, the predicted value Y tends to 1; as z tends to negative infinity, the predicted value Y tends to 0. If the sigmoid function returns a value greater than 0.5, the label is the positive class (class 1); if it returns a value less than 0.5, the label is the negative class (class 0). The sigmoid function's line plot is shown in Figure 6.
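Both activations from Eqs. (1) and (2) are one-liners in numpy; a small sketch:

```python
import numpy as np

def relu(x):
    """Eq. (1): f(x) = max(0, x)."""
    return np.maximum(0.0, x)

def sigmoid(z):
    """Eq. (2): Y = 1 / (1 + e^(-z)), mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, 0.0, 2.0])
print(relu(z))      # [0. 0. 2.]
print(sigmoid(0.0)) # 0.5, the decision boundary between class 0 and class 1
```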
Figure 6: Sigmoid Function. G. Optimizers i. Adam Optimizer Adam is a gradient descent method that uses adaptive estimates of first- and second-order moments. ii. Adamax Optimizer Adamax is an Adam variant based on the infinity norm; its default settings follow the original paper. Adamax can outperform Adam in some cases, notably in models containing embeddings. iii. Nadam Optimizer The Nesterov-accelerated Adaptive Moment Estimation (Nadam) method combines the Adam optimization strategy with Nesterov momentum, an improved type of momentum.
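The Adam update described above can be sketched in numpy (a minimal single-parameter version of the standard update rule, not the training code used in this paper):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: exponential moving averages of the gradient (m) and
    squared gradient (v), with bias correction for the early steps."""
    m = b1 * m + (1 - b1) * grad          # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - b1 ** t)             # bias correction
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Minimise f(theta) = theta^2, whose gradient is 2 * theta
theta, m, v = 5.0, 0.0, 0.0
for t in range(1, 501):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, lr=0.05)
print(theta)  # close to the minimum at 0
```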
iv. RMSprop Optimizer The RMSprop optimizer is a momentum-based gradient descent technique. RMSprop damps oscillations in the vertical direction; as a consequence, we can increase the learning rate, allowing the system to take larger horizontal steps and converge faster. H. ResNet ResNet, the residual neural network, is a kind of artificial neural network. To skip over some layers, ResNet employs skip connections, or shortcuts [19], as demonstrated in Figure 7. Most ResNet variants use double- or triple-layer skips with nonlinearities (ReLU) and batch normalization. ResNet makes it possible to train networks with hundreds or even thousands of layers and still obtain outstanding results. Many computer vision applications beyond image classification, such as object detection and face recognition, have seen improved performance as a result of this powerful representational ability. Skip connections simplify the network in the early phases of training by effectively using fewer layers: with fewer layers to propagate through, the impact of vanishing gradients is reduced, which accelerates training. As the network learns the feature space, it gradually restores the skipped layers. When all layers are expanded toward the end of training, the network stays closer to the underlying structure and learns faster.
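The identity shortcut can be sketched with a toy numpy residual block (illustrative weights, not the ResNet encoder used in this paper): the block computes y = ReLU(F(x) + x), so when the learned transform F is near zero the block behaves like the identity.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, w1, w2):
    """y = ReLU(F(x) + x): the shortcut adds the input to the transformed
    output, so gradients can also flow through the identity path."""
    f = relu(x @ w1) @ w2  # two-layer transformation F(x)
    return relu(f + x)     # skip connection: add, then activate

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 8))
w1 = rng.standard_normal((8, 8)) * 0.01  # near-zero weights => F(x) ~ 0
w2 = rng.standard_normal((8, 8)) * 0.01
y = residual_block(x, w1, w2)
print(np.allclose(y, relu(x), atol=1e-2))  # True: block ~ identity when F is small
```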
Figure 7: Canonical form of a residual neural network. I. Atrous Spatial Pyramid Pooling (ASPP) The Atrous Spatial Pyramid Pooling (ASPP) semantic segmentation module [20] resamples a given feature layer at several rates prior to convolution. As shown in Figure 8, this amounts to probing the original image with multiple filters that have complementary effective fields of view, thereby capturing both objects and relevant visual context at multiple scales. Rather than actually resampling features, the mapping is implemented with a set of parallel atrous convolutional layers with different sampling rates.
Figure 8: ASPP
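The atrous (dilated) convolution underlying ASPP can be sketched in one dimension (illustrative numpy, not the paper's implementation): spacing the filter taps `rate` apart enlarges the receptive field without adding parameters, and ASPP simply runs several such branches with different rates in parallel.

```python
import numpy as np

def atrous_conv1d(x, kernel, rate):
    """1-D atrous (dilated) convolution: taps are spaced `rate` apart,
    enlarging the effective field of view without extra weights."""
    k = len(kernel)
    span = (k - 1) * rate + 1  # effective receptive field
    out = np.zeros(len(x) - span + 1)
    for i in range(len(out)):
        out[i] = sum(kernel[j] * x[i + j * rate] for j in range(k))
    return out

x = np.arange(8, dtype=float)          # [0, 1, ..., 7]
kernel = np.array([1.0, 1.0, 1.0])
print(atrous_conv1d(x, kernel, rate=1))  # [ 3.  6.  9. 12. 15. 18.]
print(atrous_conv1d(x, kernel, rate=2))  # [ 6.  9. 12. 15.]: sums of x[i], x[i+2], x[i+4]
```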
J. Modified Double U-net Figure 9 depicts the proposed model architecture. As shown in the diagram, the input is sent to an encoder block, in this case a ResNet. The output then passes through an ASPP layer before being sent to the decoder block, which completes the first U-Net. A multiply layer blends the resulting output with the initial input, and the blended input is sent through another ASPP layer prior to the second U-Net. There, an encoder block of convolutional layers processes the input and passes it to the decoder, where the outputs from the first U-Net's ResNet blocks are combined before processing. The output of the second U-Net is merged with that of the first U-Net by a concatenation block, which yields the final output. Table 2 lists the number of filters in each layer.
Figure 9: Modified Double U-Net Architecture
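The data flow just described (first U-Net, multiply gate, second U-Net, concatenation) can be sketched with toy numpy stand-ins for the two sub-networks; `unet1` and `unet2` below are hypothetical placeholders, not the real ResNet/ASPP sub-networks:

```python
import numpy as np

def unet1(x):
    """Toy stand-in for the first U-Net (ResNet encoder + ASPP + decoder):
    here it just produces a mask-like map in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def unet2(x):
    """Toy stand-in for the second U-Net (conv encoder + ASPP + decoder)."""
    return np.tanh(x)

def modified_double_unet(x):
    out1 = unet1(x)     # output of the first U-Net
    gated = x * out1    # multiply layer: blend output with the original input
    out2 = unet2(gated) # second U-Net on the blended input
    return np.concatenate([out1, out2], axis=-1)  # concatenation block

x = np.random.default_rng(1).standard_normal((1, 128, 128, 3))
y = modified_double_unet(x)
print(y.shape)  # (1, 128, 128, 6): channels from both U-Nets concatenated
```

Only the wiring is meaningful here; in the real model each stand-in is a full encoder-decoder with the filter counts of Table 2.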
Table 2: The Double U-Net execution flow
Experimental Results A. Dataset The dataset contains a total of 1979 brain MR images, together with an equal number of manual FLAIR abnormality segmentation masks. The data were gathered from The Cancer Imaging Archive website and correspond to 110 cases from the TCGA low-grade glioma collection that have at least a FLAIR sequence and genomic cluster data available. Figure 10 shows several example images from this dataset.
Figure 10: Sample Dataset Images. B. Performance metrics i. Epoch In machine learning, an epoch is the number of times the model has passed through the entire training dataset. Data is commonly organized into batches, particularly when there is a lot of it, and passing one batch through the model is usually referred to as one iteration. ii. Intersection over Union (IoU) The IoU, or Intersection over Union, also known as the Jaccard index, measures the overlap between the target mask and our predicted output. This metric is closely related to the Dice coefficient, which is often used as a training loss function. It can be calculated by:
IoU = |target ∩ prediction| / |target ∪ prediction|
iii. IoU coefficient
The IoU coefficient quantifies the similarity and diversity of finite sample sets. It is computed by dividing the size of the intersection by the size of the union of the sample sets.
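For binary segmentation masks the definition above reduces to a few lines of numpy (a sketch of the metric, not the exact scoring code used in the experiments):

```python
import numpy as np

def iou(target, prediction):
    """IoU / Jaccard index for binary masks: |A ∩ B| / |A ∪ B|."""
    target = target.astype(bool)
    prediction = prediction.astype(bool)
    union = np.logical_or(target, prediction).sum()
    if union == 0:
        return 1.0  # both masks empty: define IoU as a perfect match
    return np.logical_and(target, prediction).sum() / union

t = np.array([[1, 1, 0, 0]])
p = np.array([[1, 0, 1, 0]])
print(iou(t, p))  # 1 overlapping pixel / 3 pixels in the union ≈ 0.333
```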
iv. IoU Loss The intersection-over-union (IoU) loss is regularly used in the assessment of segmentation quality because it has greater perceptual quality and scale invariance, which lends appropriate weight to smaller objects compared with per-pixel losses. C. Performance analysis i. Result evaluation of the proposed model on different data splits Table 3 reports the performance of the proposed model with varying data splits. We recorded the IoU scores of the proposed modified Double U-Net model on 80-20, 70-30, 60-40, and 50-50 data splits.

Table 3: Performance of the proposed modified Double U-Net model for different data splits

Data split   IoU Loss   IoU Coef   Epochs   IoU Score
80/20        -0.5008    0.5005     20       50.052
70/30        -0.4611    0.4611     20       46.108
60/40        -0.4315    0.4274     20       42.740
50/50        -0.2628    0.2630     20       26.300
From the experimental results in Table 3, it is clear that the 80/20 split achieved the highest IoU score; this split yields 1584 training images, compared with 1386, 1188, and 900 for the 70, 60, and 50 percent splits respectively. Evidently, as the size of the training dataset increases, so does the IoU score of the proposed model. ii. Proposed model with different ASPP filters Table 4 reports the performance of the proposed model with an 80/20 data split for different filter sizes in ASPP; the IoU coefficients and losses are recorded as well.

Table 4: Performance of the proposed modified Double U-Net model with different ASPP filters

Filters   IoU Loss   IoU Coef   Epochs   IoU Score
64        -0.5008    0.5005     20       50.052
128       -0.4424    0.4358     20       43.578
256       -0.3857    0.3799     20       37.991
512       -0.4352    0.4378     20       43.783
From the experiment recording the performance of the proposed model with different ASPP filters, it is clear that 64 filters performs best, producing the highest IoU score (Table 4). Since a smaller filter count tends to yield a better IoU score here, a filter size of 64 delivers promising segmentation results, and the resulting feature layer information is considerably more descriptive. iii. Proposed model with various optimizers A comparative study was conducted on the proposed model with an 80/20 data split and an ASPP filter size of 64 for different optimizers. The performance of the model is recorded in the table below.
Table 5: Performance of the proposed modified Double U-Net model with various optimizers

Optimizers   IoU Loss   IoU Coef   Epochs   IoU Score
Adam         -0.5008    0.5005     20       50.052
Adamax       -0.5290    0.5315     20       53.145
Nadam        -0.4401    0.4415     20       44.153
RMSprop      -0.4748    0.4774     20       47.741
Table 5 shows that Adamax is the best optimizer for this model: it gives a higher IoU score and outperforms the other optimizers, consistent with its reported advantage on models containing embeddings. iv. Proposed model with various conv2D blocks and filter sizes A comparative study was conducted on the proposed model with an 80/20 data split, an ASPP filter size of 64, and the Adamax optimizer for different conv2D block counts and filter sizes. The performance of the model is recorded in the table below.

Table 6: Performance of the proposed modified Double U-Net model with various conv2D block and filter sizes

No. of Blocks   Filter size   IoU Loss   IoU Coef   Epochs   IoU Score
1               1,1           -0.4642    0.4573     20       45.728
1               2,2           -0.4734    0.4666     20       46.656
1               3,3           -0.4791    0.4791     20       47.914
2               1,1           -0.4737    0.4738     20       47.380
2               2,2           -0.4810    0.4833     20       48.329
2               3,3           -0.5510    0.5566     20       55.657
3               1,1           -0.4734    0.4680     20       46.804
3               2,2           -0.4751    0.4770     20       47.695
3               3,3           -0.4974    0.4994     20       49.935
From the experiment recording the performance of the proposed model with different convolutional blocks and filter sizes, it is clear that two conv2D blocks with (3,3) filters perform best, producing the highest IoU score (Table 6).
v. Performance evaluation of the proposed segmentation model with the existing algorithms
Figure 11: Performance evaluation of the proposed segmentation model with the existing algorithms. A comparative study was performed with the same experimental setup on different models, from which it is clear that our proposed modified Double U-Net model outperformed other similar state-of-the-art models. Conclusion Due to the general complexity of MRI brain imaging, brain tumor detection is a difficult problem; this work detects tumors by segmenting them using AI-based algorithms. In this paper, we proposed a segmentation model using a modified Double U-Net with improved segmentation accuracy. The proposed model was tested using data from the Kaggle repository and experimented with under different variables, with results given in the performance analysis (Section 3.3). Initially, the model was tested with various data splits (Table 3), where it was evident that it performed best on the 80/20 split. It was then tested with different ASPP filters, keeping the 80/20 split, and the filter size 64 performed better than the other sizes (Table 4). The model was next tested with different optimizers, and Adamax outperformed the others (Table 5). Finally, it was evaluated with various convolutional block and filter sizes (Table 6), and the optimal configuration was determined to be two conv2D blocks with (3,3) filters. From the performance analysis (Figure 11) it is clear that, in contrast to comparable state-of-the-art approaches, our architecture produced more efficient, improved, and robust outcomes. This segmentation algorithm has the potential to be applied to real-time medical imaging collections in the future. References [1] Gao, J., Jiang, Q., Zhou, B., & Chen, D. (2019). Convolutional neural networks for computer-aided detection or diagnosis in medical image analysis: An overview. Mathematical Biosciences and Engineering, 16(6), 6536.
[2] Song, H., Nguyen, A. D., Gong, M., & Lee, S. (2016). A review of computer vision methods for purpose on computer-aided diagnosis. J. Int. Soc. Simul. Surg, 3, 1-8. [3] Roy, S.; Bandyopadhyay, S.K. Detection and Quantification of Brain Tumor from MRI of Brain and Its Symmetric Analysis. Int. J. Inf. Commun. Technol. Res. 2012, 2, 477–483. [4] G. Cosma, D. Brown, M. Archer, M. Khan, and A. G. Pockley, ‘‘A survey on computational intelligence approaches for predictive modeling in prostate cancer,’’ Expert Syst. Appl., vol. 70, pp. 1–19, Mar. 2017. [5] M. A. Nogueira, P. H. Abreu, P. Martins, P. Machado, H. Duarte, and J. Santos, ‘‘Image descriptors in radiology images: A systematic review,’’ Artif. Intell. Rev., vol. 47, no. 4, pp. 531– 559, Apr. 2017.
[6] Lateef, F.; Ruichek, Y. Survey on semantic segmentation using deep learning techniques. Neurocomputing 2019, 338, 321–348. [7] Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [8] Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; pp. 234–241. [9] Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015). [10] Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). [11] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 4, pp. 834–848, 2017. [12] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, “Pyramid scene parsing network,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 2881–2890. [13] Z. Zhang, Q. Liu, and Y. Wang, “Road extraction by deep residual U-Net,” IEEE Geoscience and Remote Sensing Letters, vol. 15, no. 5, pp. 749–753, 2018. [14] L. Chen, P. Bentley, K. Mori, K. Misawa, M. Fujiwara, and D. Rueckert, “DRINet for medical image segmentation,” IEEE Transactions on Medical Imaging, vol. 37, no. 11, pp. 2453–2462, 2018. [15] N. Ibtehaz and M. S.
Rahman, “MultiResUNet: Rethinking the U-Net architecture for multimodal biomedical image segmentation,” Neural Networks, vol. 121, pp. 74–87, 2020. [16] K. O’Shea and R. Nash, “An introduction to convolutional neural networks,” 2 Dec 2015. [17] A. F. M. Agarap, “Deep learning using rectified linear units (ReLU),” 7 Feb 2019. [18] S. Langer, “Approximating smooth functions by deep neural networks with sigmoid activation function,” Fachbereich Mathematik, Technische Universität Darmstadt, 8 October 2018. [19] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” 10 Dec 2015. [20] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs.” [21] A. Levin, D. Lischinski, and Y. Weiss, “A closed-form solution to natural image matting,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, no. 2, pp. 228–242, 2008.
Advances in Science and Technology ISSN: 1662-0356, Vol. 124, pp 125-136 doi:10.4028/p-tj6e43 © 2023 Trans Tech Publications Ltd, Switzerland
Submitted: 2022-09-04 Accepted: 2022-09-16 Online: 2023-02-27
Surveillance Image Super Resolution Using SR-Generative Adversarial Network
N. Venkat Narayanan(1,a), Arjun T.(2,b), Logeshwari R.(3,c)
(1,2) Students, (3) Assistant Professor, Department of Computer Science and Engineering, SRM Institute of Science and Technology, Vadapalani, Chennai
Email: [email protected], [email protected], [email protected]
Keywords: Image super resolution, SISR, surveillance, generative adversarial network, SRGAN, peak signal to noise ratio, SSIM, MSE
Abstract. Single-image super-resolution (SISR) is the process of converting a single low-resolution (LR) image into a high-resolution (HR) image. This technology is utilised in a variety of industries, including medical and satellite imaging, to retrieve quality and required information from blurred or overexposed photos. Because poor-quality surveillance photographs make it hard to extract important data, this method can be utilised in the field of surveillance to produce high-quality images. We use Generative Adversarial Networks (GAN) to handle low-quality photos, because existing methods have produced slightly fuzzy and greasy images that look like oil paintings. In particular, we introduce the Super Resolution Generative Adversarial Network (SRGAN), which employs perceptual losses. The PSNR, MSE, and SSIM values obtained are shown to be superior to those of standard approaches, and the SRGAN-processed photos are of excellent quality, allowing scenes to be seen through hazy and misty areas. I.
Introduction
Machine learning is a branch of study that aims to teach computers to act and think like humans. Thanks to machine learning, computers can learn without being explicitly programmed; it allows them to learn new things. Image processing is a key part of machine learning that involves performing operations on an image to obtain an enhanced output or relevant information. These strategies are being introduced to address image-quality issues in intelligence, space, and medical applications. The most difficult problem these fields confront is retrieving data from low-resolution photos; image processing technologies can extract meaningful information by transforming low-resolution photos into high-resolution images. The approach of recovering high-definition images from low-definition photographs is widely known as image super-resolution (SR) and is a crucial component of image processing techniques. Medical imaging, satellite imaging, and surveillance imaging are just a few of its real-world applications. Deep learning techniques are commonly used to create super-resolution models. SISR is a technique for converting a low-resolution surveillance image into a high-resolution image. II.
Existing System
Interpolation methods, such as nearest neighbor, bilinear, and bicubic interpolation, are used in traditional image super-resolution models. Nearest neighbor chooses the closest pixel for each point to be interpolated, ignoring all other pixels. Bilinear interpolation uses a 2x2 receptive field and interpolates along one axis of the picture before moving to the other. Bicubic interpolation is similar to bilinear interpolation, but operates on 4x4 pixels instead of 2x2, resulting in a smoother image. Computational
complexity, noise amplification, blurred outcomes, and information loss were all issues with interpolation algorithms. This led to super-resolution models employing Convolutional Neural Network and ResNet designs, but these in turn brought quadratic complexity growth, long run times, and performance degradation. III.
Related Work
Depth image super-resolution is a major but difficult problem addressed in the foundational paper [1]. To tackle it, the authors developed a deep color-guided coarse-to-fine CNN system [15]. First, they used data-driven filter techniques rather than hand-designed filters to estimate the ideal filters for depth image super-resolution; a filter trained from vast datasets is more precise and reliable for upsampling depth. Second, a coarse-to-fine CNN was used to cover diverse filter kernel sizes: in the coarse stage the CNN learns larger filter kernels to produce a basic high-resolution depth image, which is then fed into the fine stage [16], where smaller filter kernels are learned for more accurate results. This network can progressively recover high-frequency detail. Finally, they created a colour-guidance method for depth image upsampling that combines colour difference and spatial distance; the interpolated high-resolution depth image is then refined using pixels from high-resolution colour maps [9]. The high-resolution depth image produced, guided by colour information, can successfully reduce texture-copying artefacts while keeping edge features [9]. Quantitative and qualitative experimental results demonstrate the enhanced super-resolution depth maps [17]. According to the base paper [2], satellite architecture resilience and redundancy have become increasingly important as threats to national-security satellite systems have increased. Commercial, technical, and administrative applications in remote sensing, communications, navigation, and research have benefited from CubeSats, which have the potential to make satellite architectures more robust. However, CubeSats' inherent size, mass, and power limits constrain imaging methods: small lenses and narrow spectral ranges lead to lower spatial-resolution visuals.
Due to their low quality, CubeSat images have limited relevance for military systems and national intelligence applications. That study proposes CubeSat photography applications and a high-resolution deep learning system. According to the base paper [3], interpolation is a critical element in picture super-resolution techniques because it aims to generate high-resolution photos without aberrations such as ringing and blurring. A method has therefore been proposed that achieves interpolation by injecting high-frequency signal components predicted using "process similarity", i.e., the resemblance between an image and its resolution-based decomposition. Wavelet transforms of two types, the discrete wavelet transform (DWT) and the stationary wavelet transform (SWT), are used to perform the decomposition, providing image details and approximations [18]. The compatibility of DWT and SWT helps determine the structural relationship between the input image and its low-resolution estimate [10]. The optimal model values representing this structural relationship are calculated through particle swarm optimization. Because the processes are similar, these attributes are used to generate the high-resolution output image from the input image [10]. The proposed technique is compared with six existing solutions in terms of PSNR, SSIM, and FSIM measurements, as well as computation time (CPU time); it uses the least CPU time while delivering comparable results. Super-resolution (SR) is utilised in a variety of applications, including object detection, medical imaging, satellite remote sensing, and others [7]. The reconstruction of a high-resolution image from a low-resolution image is defined in the base paper [4]. Deep-learning-based image
reconstruction techniques have made remarkable progress, keeping pace with the rapid development of deep learning [7]. In that paper, R-SRGAN is used to build the model and obtain image super-resolution: by inserting residual blocks between neighbouring convolutional layers in the GAN generator, more detailed information is preserved. The Wasserstein distance is applied as the loss function to improve training while obtaining image super-resolution. IV.
System Design
Fig 1: Architecture of SRGAN
Fig 2: Flow of SRGAN. We derive low-resolution images from high-resolution sample images, because our model has to know what a high-resolution image looks like: we take HR images and down-sample them to LR images. We then feed the LR images to a generator, which up-samples them and produces super-resolution images, after which a discriminator tries to distinguish them from real HR images, and the GAN loss is back-propagated to train both the discriminator and the generator. Finally, we compare the generated high-resolution image with the original high-resolution image to verify whether the model has been trained properly and can produce a super-resolution image. V.
Proposed Work
We want to show how to recover high-resolution photos using a Generative Adversarial Network (GAN) model, in particular the Super-Resolution Generative Adversarial Network (SRGAN). While other standard approaches lose image information and over-smooth, SRGAN tries to recover finer textures with greater accuracy and opinion score without compromising image quality. So the model we are working with is a generative adversarial network, SRGAN to be precise. Let's take a closer look at SRGAN. So, what exactly is SRGAN? It is one of the deep learning GAN designs. First, we must understand GAN to grasp the essential notion of SRGAN. GAN stands for Generative Adversarial Network, and it creates content from the ground up. The generator and discriminator are the two major networks in a GAN. The generator generates content using the training dataset, while the discriminator determines whether the output comes from the training dataset or was generated artificially; the discriminator thereby pushes the generator to produce more realistic outcomes. Our problem is to enhance the quality of a low-resolution image, which is why we use SRGAN. There are many methods for image interpolation, but they all suffer from content loss, blurring, and noise amplification, along with long training periods that degrade performance. Because SRGANs use a perceptual loss function, we can obtain finer picture texture and realistic images. As previously stated, the SRGAN contains a generator and discriminator, primarily made up of residual blocks, which facilitate network training and enable deeper networks, resulting in improved performance. A residual block consists of convolutional layers, batch normalization, Parameterized ReLU, PixelShuffler x2, and a loss function. 1.
1. Convolutional layer: this layer contains a set of filters, kernels, parameters, and channels that must be learned during training.
2. Batch Normalization: this technique helps neural networks train faster and more consistently.
3. Parameterized ReLU (PReLU): used instead of ReLU or LeakyReLU. It has a learnable parameter for the negative-path coefficient, so the slope of the negative part is discovered during training rather than fixed.
4. PixelShuffler x2: up-scaling or up-sampling of feature maps.
5. Loss Function: SRGAN uses a perceptual loss function, which is the weighted sum of two loss components: content loss and adversarial loss[8] (1). This loss is required for the generator architecture to be implemented:
Advances in Science and Technology Vol. 124
• Adversarial Loss: we can push our solution closer to the natural image by using a discriminator network trained to distinguish between super-resolved images and original photo-realistic images[13] (2).
• Content Loss: perceptual similarity is preserved rather than pixel-wise similarity. This lets us restore photo-realistic textures from heavily down-sampled pictures. We utilize a loss function that targets perceptual resemblance rather than pixel-wise differences[14] (3).
In this paper, we use two types of content loss. The most prevalent loss in image super-resolution is MSE loss. However, MSE loss struggles with high-frequency content in images, resulting in images that are too smooth[8]. As a result, we chose to use the loss from deeper VGG layers. This "VGG" loss is estimated on the "ReLU" activation layers of the pre-trained 19-layer "VGG" network. "VGG" content loss: "VGG" stands for Visual Geometry Group; the loss is defined as the Euclidean distance between the feature representations of a reconstructed image and the reference image (4).
For specific layers inside VGG, we want the features to be matched to produce a better outcome; hence SRGAN employs a perceptual loss, which measures the MSE of the features extracted by a VGG network.
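As a sketch of how the perceptual loss combines its two terms, consider the following toy NumPy example. The function name, the 10^-3 adversarial weight (taken from the SRGAN literature), and the stand-in "feature maps" are illustrative assumptions, not this paper's implementation:

```python
import numpy as np

def perceptual_loss(vgg_feat_sr, vgg_feat_hr, disc_prob_sr, adv_weight=1e-3):
    """Sketch of the SRGAN perceptual loss: MSE between VGG feature maps
    of the super-resolved (SR) and reference (HR) images (content loss),
    plus a weighted adversarial term for the generator."""
    content = np.mean((vgg_feat_sr - vgg_feat_hr) ** 2)   # VGG content loss
    adversarial = -np.mean(np.log(disc_prob_sr + 1e-8))   # generator's adversarial loss
    return content + adv_weight * adversarial

# toy arrays stand in for real VGG activations and discriminator outputs
f_sr, f_hr = np.ones((4, 4)), np.zeros((4, 4))
print(perceptual_loss(f_sr, f_hr, np.array([0.9])))
```

The small adversarial weight reflects that the content term dominates the total loss, with the adversarial term acting as a realism regularizer.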
A. Data collection
The dataset used in this project, "Srgan_dataset", was formed by combining 100 "urban100", 100 "bsds100", 100 "bsds200", 161 "fire CCTV images", 350 "image celeba", and 369 "accident CCTV image footage" datasets for training, as well as 85 "fire CCTV images" and 46 "accident CCTV image footage" datasets for testing, all from Kaggle. The Srgan dataset contains 1161 high-resolution photos for training (fig 3), 131 images for testing (fig 4), and two sample images (fig 5) for performance evaluation.
Fig 3: Sample images of 1161 Training images
Fig 4: sample of 131 Testing images
Fig 5: 2 Sample images
B. Defining of models
The Generator and Discriminator models are the two models that make up the SR-GAN architecture[8]. As in other GAN models, the "generator" yields data based on a training set and the "discriminator" tries to guess whether the data comes from the input dataset or from the generator. To deceive the discriminator, the generator tries to improve the generated data. The generator and discriminator models are discussed in more detail below.
Generator model: The generator model uses a residual network rather than a deep convolutional network because residual networks are easier to train and produce better results. This is due to "skip connections", the type of connection used in residual networks; a total of 16 residual blocks (B = 16) was used. In each residual block, two "convolutional layers" with small 3X3 kernels and 64 feature maps are followed by "batch-normalization layers" and "ParametricReLU" as the activation function[8]. Two trained sub-pixel convolution layers are used to increase the resolution of the input image. Unlike "LeakyReLU", which uses a fixed value for the rectifier coefficient (alpha), this generator architecture employs "parametric ReLU" as its activation function. It learns the parameters of the rectifier adaptively and improves precision at negligible extra computational cost (fig 6).
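Two building blocks named above, the Parametric ReLU activation and the x2 sub-pixel (pixel-shuffle) upscaling, can be sketched in NumPy. This is a generic illustration of the operations, not the paper's implementation; `alpha` is fixed here, whereas in the network it is a learned parameter:

```python
import numpy as np

def prelu(x, alpha=0.25):
    """Parametric ReLU: positives pass unchanged, negatives are scaled
    by the coefficient alpha (fixed here; learned adaptively in SRGAN)."""
    return np.where(x > 0, x, alpha * x)

def pixel_shuffle_x2(x):
    """Sub-pixel rearrangement (PixelShuffler x2): maps a (C*4, H, W)
    feature tensor to (C, 2H, 2W) by moving channel groups into 2x2
    spatial blocks, upscaling feature maps without interpolation."""
    c4, h, w = x.shape
    c = c4 // 4
    out = x.reshape(c, 2, 2, h, w)        # split channels into a 2x2 block
    out = out.transpose(0, 3, 1, 4, 2)    # interleave blocks with pixels
    return out.reshape(c, h * 2, w * 2)

print(prelu(np.array([-2.0, 1.5])))       # negative scaled by 0.25, positive kept
print(pixel_shuffle_x2(np.arange(16.).reshape(4, 2, 2)).shape)   # (1, 4, 4)
```

The rearrangement matches the standard sub-pixel convolution layout: output pixel (2h+i, 2w+j) of channel c comes from input channel 4c + 2i + j at position (h, w).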
Fig 6: Generator Architecture
Discriminator model: The discriminator's role is to distinguish between the actual HR image and the generated SR image. The architecture of the discriminator in this paper is much like the "DCSRGAN" architecture, with "LeakyReLU" as its activation. The network comprises "eight convolutional layers", each with "3X3 filter kernels", with the number of kernels increased by a factor of 2 from "64 to 512". Each time the feature count is doubled, strided convolutions are used to reduce the image resolution. The resulting "512 feature maps" are followed by two dense layers, with a "LeakyReLU" between them and a final sigmoid activation function applied to calibrate the sample classification probabilities (fig 7).
Fig 7: Discriminator architecture
C. Training
A high-resolution image is first converted to a low-resolution image. The "generator" then up-samples this low-resolution image into a super-resolution image. The image is then passed through the "discriminator", which produces an adversarial loss by attempting to differentiate the super-resolution image from the high-resolution image; this loss is then propagated back into the "generator". The SRGAN was trained with four epochs, a batch size of one, and 600 steps.
Fig 8: Model Training
D. Performance Evaluation
After obtaining a high-definition image using super-resolution reconstruction technology, we evaluated its quality using the "Peak Signal-to-Noise Ratio (PSNR)", "Mean Squared Error (MSE)", and "Structural Similarity Index Measure (SSIM)". PSNR is a common objective image evaluation metric, typically expressed in decibels (5). It is based on the error between corresponding pixels and is widely used in image quality evaluation[7] (5).
SSIM is used to measure the similarity index of two images. It is a perception-based model that treats image degradation as a perceived change in structural information, while also incorporating important perceptual phenomena, including both luminance masking and contrast masking terms[11] (6).
MSE compares the squared error between the original image and the reconstructed image (7).
These three evaluation indices are broadly used in many fields of image processing due to their simple computation and clear statistical meaning[7]. Higher PSNR values mean lower error (a better image), and smaller MSE values likewise mean lower error (a better image). The SSIM of a reconstructed image against a ground-truth image is at most one, and a value close to one indicates that the image is of good quality; the closer it is to one, the better the image quality and the super-resolution outcome.
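A minimal NumPy sketch of the MSE and PSNR metrics (Eqs. 7 and 5); SSIM involves windowed luminance, contrast and structure terms and is omitted here for brevity:

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two images."""
    return np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)

def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in decibels; higher is better."""
    e = mse(a, b)
    return float("inf") if e == 0 else 10 * np.log10(peak ** 2 / e)

ref = np.full((4, 4), 100, dtype=np.uint8)
noisy = ref.copy()
noisy[0, 0] = 110                 # one pixel off by 10
print(mse(ref, noisy))            # 100 / 16 = 6.25
print(psnr(ref, noisy))           # about 40 dB for this small error
```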
1. Experiment analysis
Table 1: PSNR, MSE and SSIM Values

              PSNR    MSE     SSIM
LR and OHR    68.15   0.0099  0.77
GHR and OHR   72.25   0.0045  0.99

According to Table 1 above, the PSNR value of GHR-OHR is higher than that of LR-OHR, which means it produces less error. Similarly, the MSE value of GHR-OHR is lower than that of LR-OHR, which again means the error is lower. Finally, the SSIM value of GHR-OHR is higher than that of LR-OHR and is very close to 1. So, based on our observations, we can confidently state that our project restored a higher-quality image without content being lost. *LR – Low Resolution, *OHR – Original high resolution, *GHR – Generated high resolution.
2. Related work comparison
Table 2: Methods result comparison

Method            PSNR    SSIM
SRGAN             72.25   0.99
Interpolation[4]  28.43   0.82
SRCNN[4]          30.48   0.86
From Table 2 above, we compared the "PSNR" and "SSIM" values of the interpolation and SRCNN algorithms used in one of our base papers [4] with the SRGAN we produced. SRGAN proves better than interpolation and SRCNN, because the PSNR values of interpolation and SRCNN are lower than that of SRGAN, and their SSIM values are also slightly lower. It can better simulate and restore images after learning the histogram of the original image.
VI. Result
Fig 9: Generated high resolution image
Fig 10: Epochs vs D loss
Fig 11: Epochs vs D accuracy
Fig 12: Epochs vs G loss
Fig 10 above shows that "D LOSS" decreases from the beginning and eventually settles at the end. Similarly, fig 11 shows that "D ACCURACY" increases from the start of the epochs and reaches its maximum at the end. Figure 12 shows that "G LOSS" is clearly decreasing; in the middle of the epochs it fluctuates up and down before becoming constant, indicating that the generator is stronger than the discriminator, which means, based on the observed losses, that we have produced a quality image.
VII. Conclusion & Future Scope
Image super-resolution has become a focus of image research in recent years, with applications such as image transmission, aerial remote sensing, and medical imaging[7]. This paper proposes SRGAN to solve the problem of converting low-definition surveillance images to high-definition images. This method outperforms "interpolation", "SRCNN", and other methods in terms
of PSNR, MSE, and SSIM indicators, particularly with regard to image variance and sharpness. Moreover, the image produced conveys a strong sense of realism. Nevertheless, this paper has some limitations: we developed the model only for images rather than surveillance videos, which we will work on in the future.
References
[1] Y. Wen, B. Sheng, P. Li, W. Lin and D. D. Feng, "Deep Color Guided Coarse-to-Fine Convolutional Network Cascade for Depth Image Super-Resolution," in IEEE Transactions on Image Processing, vol. 28, no. 2, pp. 994-1006, Feb. 2019.
[2] W. Symolon and C. H. Dagli, "Single-Image Super Resolution using Convolutional Neural Network," Procedia Computer Science, vol. 185, pp. 213-222, Elsevier B.V., Jun 2021.
[3] Sobhan Kanti Dhara, Debashis Sen, "Across-scale process similarity based interpolation for image super-resolution," Applied Soft Computing, Volume 81, August 2019, 105508.
[4] X. Xue, X. Zhang, H. Li and W. Wang, "Research on GAN-based Image Super-Resolution Method," 2020 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA), 2020, pp. 602-605.
[5] M. Cao, Z. Liu, X. Huang and Z. Shen, "Research for Face Image Super-Resolution Reconstruction Based on Wavelet Transform and SRGAN," 2021 IEEE 5th Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), 2021, pp. 448-451.
[6] Purab Alok Jain, Pranali K. Kosamkar, "Analysis and Prediction of COVID-19 with Image Super-Resolution Using CNN and SRCNN-Based Approach," Vol. 248, Jan 2022, pp. 33-40.
[7] Yitong Yan, Chuangchuang Liu, Changyou Chen, Xianfang Sun, Longcun Jin, Xiang Zhou, "Fine-grained Attention and Feature-sharing Generative Adversarial Networks for Single Image Super-Resolution," arXiv preprint arXiv:1911.10773, 2019.
[8] S. Liu et al., "Infrared Image Super Resolution Using GAN With Infrared Image Prior," 2019 IEEE 4th International Conference on Signal and Image Processing (ICSIP), 2019, pp. 1004-1009.
[9] A. A. Tandale and N. D. Kulkarni, "Super-Resolution of Color Images Using CNN," 2018 Second International Conference on Intelligent Computing and Control Systems (ICICCS), 2018, pp. 1-5.
[10] N. T. Man and T. Q. Vinh, "Image Super-Resolution Using Image Registration and Neural Network Based Interpolation," 2016 International Conference on Advanced Computing and Applications (ACOMP), 2016, pp. 164-167.
[11] A. Horé and D. Ziou, "Image Quality Metrics: PSNR vs. SSIM," 2010 20th International Conference on Pattern Recognition, 2010, pp. 2366-2369.
[12] Feng Zhou, Yong Hu, Xukun Shen, "Advances in Multimedia Information Processing – PCM 2018," Springer Science and Business Media LLC, 2018, Vol. 11166.
[13] C. Ledig et al., "Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network," 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 105-114.
[14] X. Zhong, Y. Wang, A. Cai, N. Liang, L. Li and B. Yan, "Dual-Energy CT Image Super-resolution via Generative Adversarial Network," 2021 International Conference on Artificial Intelligence and Electromechanical Automation (AIEA), 2021, pp. 343-347.
[15] Yuxin Peng, Shi-Min Hu, Moncef Gabbouj, Kun Zhou, Michael Elad, Kun Xu, "Image and Graphics," 2021, Volume 12888.
[16] Li Chen, Jing Tian, "Depth image enlargement using an evolutionary approach," Signal Processing: Image Communication, Volume 28, Issue 7, August 2013, Pages 745-752.
[17] Z. Wang, D. Liu, J. Yang, W. Han, and T. Huang, "Deep networks for image super-resolution with sparse prior," in Proc. IEEE Int. Conf. Comput. Vis., Dec. 2015, pp. 370-378.
[18] W. Dong, L. Zhang, R. Lukac, G. Shi, "Sparse representation based image interpolation with nonlocal autoregressive modeling," IEEE Trans. Image Process. 22 (4) (2013) 1382-1394.
Advances in Science and Technology ISSN: 1662-0356, Vol. 124, pp 137-146 doi:10.4028/p-qq6o9q © 2023 Trans Tech Publications Ltd, Switzerland
Submitted: 2022-09-06 Accepted: 2022-09-16 Online: 2023-02-27
Classification of Covid-19 X-Ray Images Using Fuzzy Gabor Filter and DCNN
Sandhiyaa S.1,a*, Shabana J.2,b, Ravi Shankar K.3,c and Jothikumar C.4,d
1-4 SRM Institute of Science and Technology, Kattankulathur, Chennai, Tamil Nadu, India
[email protected], [email protected], [email protected], [email protected]
Keywords: X-ray image, Deep learning techniques, Median filter, Fuzzy Gabor filter, DCNN, Multiclass classification.
Abstract. The rapid growth in Covid-19 cases increases the burden on health care services all over the world. Hence, a quicker and more accurate diagnosis of this disease is essential in this situation. To get quick and accurate results, X-ray images are commonly used. Deep Learning (DL) techniques have reached a high position since they provide accurate results for medical imaging applications and regression problems. However, existing pre-processing methods are not successful in eliminating impulse noise, and feature extraction techniques involving filtering methods did not yield a good filter response. In this paper, Covid-19 X-ray images were classified using the Fuzzy Gabor filter and a Deep Convolutional Neural Network (DCNN). Initially the chest X-ray images are pre-processed using Median Filters. After pre-processing, a Fuzzy Gabor filter is applied for feature extraction. Local vector features were first extracted from the given image using the Gabor filter, taking these vectors as observations. The orientation and wavelengths of the Gabor filter were fuzzified to improve the filter response. The extracted features are then trained and classified using the DCNN algorithm. It classifies the chest X-ray images into three categories: Covid-19, Pneumonia and normal. Experimental results have shown that the proposed Fuzzy Gabor-CNN algorithm attains the highest accuracy, Precision, Recall and F1-score when compared to existing feature extraction and classification techniques.
Introduction
Coronavirus disease (COVID-19), caused by severe acute respiratory syndrome coronavirus 2, has spread around the globe since the first case was discovered in December 2019. The rapid growth in Covid-19 cases increases the burden on health care services all over the world. Hence, a quicker and more accurate diagnosis of this disease is essential in this situation.
The precise diagnosis of the COVID-19 clinical outcome is more challenging, since the disease takes various forms with varying structures. Patients with COVID-19 can experience a wide range of symptoms and complications, from no symptoms at all to organ failure and death. As accurate COVID-19-specific testing may be costly, not globally accessible, and time-consuming, providing a differential diagnosis method remains difficult. Symptoms of severity in COVID-19 individuals are rarely detected early, and progression is usually rapid. The current technologies for Covid-19 diagnosis involve high cost and complex testing processes. Hence healthcare professionals are following additional screening methods like chest X-rays and Computed Tomography (CT) imaging, which are faster and more effective than the normal tests. These screening methods visually guide the process of detecting Covid-19 infection. But taking CT scans often is more costly and also dangerous for children and pregnant women due to high radiation. To obtain quicker and accurate results, X-ray images are frequently used. Machine learning (ML) and deep learning (DL) based techniques are applied to detect Covid-19 infection at earlier stages. DL techniques have attained a high position in the field of Artificial Intelligence (AI) since they produce accurate outcomes for medical imaging applications. The pre-processing methods are not successful in eliminating the impulse noises and the feature extraction technique involving filtering methods did not yield good filter response. There are
not sufficient works done to remove unwanted background regions and to localize the infected region.
Literature Survey
Rachna Jain et al [1] considered a set of X-ray and CT scan images from both normal and affected patients. Initially, they applied data cleaning and data augmentation processes on the images. A DL-based CNN model was then applied for classification. The performance has been compared with Inception V3, Xception, and ResNet models. However, this method did not guarantee the accuracy of the predictions. Ji et al [2] have proposed a Covid-19 detection technique using feature fusion. In this technique, pre-processing was done on X-rays, after which five standard pre-training models are used for extracting specific features. However, it classifies the images only as normal or Covid-affected, ignoring the severity levels of the disease. Ali Narin et al [3] classified Covid-19 infections from X-ray images using five pre-trained DCNN-based transfer models. They employed three binary classifications with four classes and performed a five-fold cross validation. They analysed X-ray images of 341 Covid patients, 2772 patients with bacterial pneumonia, and 1493 patients with viral pneumonia. This strategy produces more accurate predictions; however, it does not eliminate noise from the images. Tulin Ozturk [4] proposes a new model for automatic COVID-19 detection from chest X-ray pictures. This model performs both binary classification (COVID and Normal) and multi-class classification (COVID-19, Normal and Pneumonia). 125 chest X-ray pictures were utilised to train the DarkNet19 model. It uses Maxpool and comprises 19 convolutional layers and five pooling layers. However, no feature extraction approach was used in this model. The approach of Daniel Arias et al [5] uses VGG19 and VGG16 models to process chest X-ray images and classify them as positive or negative for COVID-19; diaphragm regions are not removed and image noise is not reduced.
In the pre-processing phase, image resizing and normalization steps are performed. Then, in the lung segmentation phase, the surrounding regions which do not provide relevant information for the detection are removed. In the classification phase, VGG19 and VGG16 models are trained using the transfer learning scheme with pre-trained weights from the Imagenet database. However, it classifies the images only as normal or Covid-affected, ignoring the severity levels of the disease. Two ML models have been proposed by Samarth Bhatia et al [6] to predict the clinical outcome, severity of the disease and the mortality rate of patients based on blood test results. Their proposed techniques show 86.3% and 88.06% accuracy for disease severity and mortality prediction, respectively. But the size of their dataset is very small, which is insufficient for training and validation.
Proposed Methodology
A. Overview
The Fuzzy Gabor filter and DCNN were used in this paper to classify the Covid-19 X-ray images. To begin with, median filters are used to pre-process the chest X-ray pictures. Following preprocessing, the Fuzzy Gabor filter is used to extract local vector features from the provided image. A new fuzzified Gabor filter is designed as an upgrade to the classical Gabor filter. To increase the Gabor filter's responsiveness, the orientation and wavelengths of the filter were fuzzified. The DCNN method is then used to train and classify the extracted features. It divides chest X-ray images into three categories: COVID-19, pneumonia, and normal.
Fig. 1 Architecture of Fuzzy Gabor-DCNN model
B. Pre-processing using Median Filters
The most common form of noise is impulse noise, which occurs due to transmission errors in the communication channel. The range of values for impulse noise is 0-255. The impulse noise model is represented as follows:
(1)
where xi is the pixel of the original image, fi denotes the noisy pixel, and Pn and Pp are the probabilities of corruption due to pepper and salt noise, respectively. The Median Filter (MF) is the most frequently used nonlinear filter; it replaces the affected pixel with the median value. This filter can remove impulse noise while protecting the edges of the image. Decision Based MF: This filter removes corrupted pixels by first checking whether the pixel lies in the allowed range inside the processing window. If it does, the pixel is considered normal and unaffected. Otherwise, it is considered corrupted and is substituted with the median value of the window or a neighbouring pixel value. The steps involved in this filtering process are given below:
1. Let Pij be the pixel value, and Pmax, Pmin and Pmed be the maximum, minimum and median pixel values of the window, respectively.
2.1 If Pmin < Pij < Pmax, the pixel is unaffected by noise and is not altered; otherwise, the pixel value is an impulse noise.
2.2 If Pmin < Pmed < Pmax (i.e. 0 < Pmed < 255), the median is noise-free, so the corrupted pixel is substituted with the median value of the window; otherwise, the filtered image pixel is substituted by the value of the left neighbouring pixel.
3. The above steps are repeated until all the pixels are processed.
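The decision-based median filter steps above can be sketched in NumPy as follows. The naive per-pixel loop over a 3x3 window and the edge handling by replication are assumptions made for illustration:

```python
import numpy as np

def decision_based_median(img):
    """Sketch of a decision-based median filter: pixels strictly inside
    the local (min, max) window range are kept; suspected impulses are
    replaced by the window median when the median itself is noise-free,
    otherwise by the already-filtered left neighbour."""
    out = img.astype(np.int32).copy()
    padded = np.pad(img, 1, mode="edge").astype(np.int32)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            win = padded[i:i + 3, j:j + 3]
            pmin, pmax = win.min(), win.max()
            pmed = int(np.median(win))
            p = int(img[i, j])
            if pmin < p < pmax:
                continue                    # noise-free pixel, keep it
            if pmin < pmed < pmax:
                out[i, j] = pmed            # replace impulse with the median
            elif j > 0:
                out[i, j] = out[i, j - 1]   # fall back to the left neighbour
    return out.astype(img.dtype)

img = np.array([[10, 12, 11], [13, 255, 12], [11, 10, 13]], dtype=np.uint8)
print(decision_based_median(img))           # the 255 salt impulse is suppressed
```

Note that the strict inequalities mean pixels equal to the local minimum or maximum are also treated as suspects, which is the behaviour described by step 2.1.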
C. Fuzzy Gabor Filter based feature extraction
a) Gabor Filter: The complex Gabor function in the spatial domain is represented as follows (2), where the function s(x, y) represents the complex sine wave, and the function wr(x, y) represents the 2D Gaussian.
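Anticipating the component definitions in Eqs. (3)-(9), the real part of the Gabor function, a Gaussian envelope wr(x, y) multiplying a sine carrier s(x, y), can be sketched in NumPy. The parameter values below are illustrative assumptions:

```python
import numpy as np

def gabor_kernel(size=7, f=0.25, theta=0.0, sigma=2.0, phi=0.0):
    """Real part of the Gabor filter g(x, y, f, theta, sigma): a 2D
    Gaussian envelope times a cosine carrier with spatial frequency f
    and orientation theta (phi is the phase)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)     # rotate by the orientation
    gaussian = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    carrier = np.cos(2 * np.pi * f * xr + phi)
    return gaussian * carrier

k = gabor_kernel()
print(k.shape)          # (7, 7)
print(k[3, 3])          # centre value: exp(0) * cos(phi) = 1.0
```

Convolving an image with a bank of such kernels at several orientations and wavelengths yields the local texture features the filter bank is designed to extract.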
The complex sine wave is defined as: (3), where (u0, v0) denotes the spatial frequencies and φ is the phase of the filter. Parameters (u0, v0) represent spatial frequencies in Cartesian coordinates. These spatial frequencies can be represented in polar coordinates as follows: (4)
and the spatial coordinates are expressed as:
(5) (6)
Using the previous equations, the complex sine wave is represented as: (7)
The 2D Gaussian function is defined as: (8)
where A is the amplitude, (x0, y0) represents the center of the function, and σx and σy denote the deviations of the Gaussian along each of the spatial coordinates. Finally, a function g(x, y, f, Φ, σ) represents the Gabor filter, where f is the spatial frequency and Φ denotes the filter orientation:
(9)
b) Fuzzy Gabor Filter
(i) Fuzzification of the Input Parameters
In the first phase, for each numeric input, the corresponding fuzzy value has to be found by mapping the given input data into the interval [0,1] using the corresponding membership function for each rule. The membership functions map the input value to the degree of truth of a statement. In our technique, fuzzification of the Gabor filter parameters orientation and wavelength is done using Bell-type membership functions. (10)
where the interval of the possible orientation is
, and S denotes the S-shaped
(11) The fc is called the crossover point of the S-shaped function. The triangular shape membership function is used as the output membership function which is defined as:
(12)
where fi and fe are the starting and ending points of the reduced orientation interval defined by the output triangular-shaped membership function. The fcrisp corresponds to the maximum of the triangular membership function, and it is determined as the arithmetic mean of fi and fe. For the fuzzification of the wavelengths, the procedure implies some minor changes, which arise primarily from the nature of the variable. Figures 2 and 3 show the input and output membership functions, respectively, for the orientation and wavelength features.
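The S-shaped and triangular membership functions of Eqs. (11) and (12) can be sketched in NumPy; the interval end-points used below (0-90 degrees for orientation) are illustrative assumptions:

```python
import numpy as np

def s_shaped(x, a, b):
    """S-shaped membership function rising from 0 at a to 1 at b, with
    the crossover point fc at (a + b) / 2."""
    c = (a + b) / 2.0
    x = np.asarray(x, dtype=float)
    return np.where(x <= a, 0.0,
           np.where(x <= c, 2 * ((x - a) / (b - a)) ** 2,
           np.where(x <= b, 1 - 2 * ((x - b) / (b - a)) ** 2, 1.0)))

def triangular(x, fi, fe):
    """Triangular output membership function peaking at
    fcrisp = (fi + fe) / 2, the arithmetic mean of the end-points."""
    fc = (fi + fe) / 2.0
    x = np.asarray(x, dtype=float)
    up = (x - fi) / (fc - fi)
    down = (fe - x) / (fe - fc)
    return np.clip(np.minimum(up, down), 0.0, 1.0)

print(s_shaped([0, 45, 90], 0, 90))      # crossover at 45 gives 0.5
print(triangular([30, 45, 60], 30, 60))  # peak of 1.0 at the mean, 45
```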
Fig. 2 Input Membership functions for orientation and wavelength
Fig. 3 Output membership functions for orientation and wavelength
(ii) Fuzzy Logical Operations
The second phase involves performing any required fuzzy logical operations. It is necessary to combine the fuzzified output data from all processed inputs, using the corresponding membership functions, into a single value using the operations max or min, depending on whether the parts are connected via logical addition (OR, union) or multiplication (AND, intersection), respectively. The max and min operations are defined as follows:
μ(A∪B)(x) = max(μA(x), μB(x)), μ(A∩B)(x) = min(μA(x), μB(x)),
where ∪ denotes the union, ∩ denotes the intersection, and A and B are arbitrary sets.
(iii) Fuzzy Rule Evaluation
The IF part of the rule is used to provide the output of each rule. Logical multiplication (AND, min) or addition (OR, max) operations are used for rule evaluation. Here, we get the output consequence of each rule, which shows the extent of fulfilment of the antecedent at the beginning of the rule. The IF-THEN rules can be written as follows:
IF (f, orientationin) THEN (f, orientationout)
IF (λ, wavelength1in) THEN (λ, wavelength1out)
IF (λ, wavelength2in) THEN (λ, wavelength2out)
where f and λ represent orientation angles and wavelengths, respectively. The orientationin and orientationout represent the corresponding limits of the input and output membership functions in the fuzzification of orientations. The wavelength1in, wavelength2in, wavelength1out and wavelength2out represent the two limits of the input and output membership functions during the fuzzification of wavelengths.
(iv) Fuzzy Rule Aggregation
In this phase, the outputs of all fuzzy sets are combined using the appropriate logical operations (OR, max) or (AND, min). By combining the outputs of all the rules, a compact mathematical
representation of the entire knowledge base is obtained, which basically represents the cross-section of a surface. In our research, the OR operation was used in the aggregation step, and the aggregation was performed similarly for both fuzzy systems, orientation and wavelengths.
(v) Defuzzification
The last phase is the defuzzification of the final output fuzzy set, which is the opposite process of fuzzification. In this phase, the fuzzy output (signal, function) is converted into a crisp value or scalar. The most often used defuzzification method is the centre of gravity (COG), which is used for both fuzzy systems of orientation and wavelengths. Once the fuzzification of the Gabor filter was completed, the new fuzzy Gabor filter was developed with fuzzified parameters. After filtering the input images with the fuzzified Gabor filter, the textures were detected and highlighted in the complex images.
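Centre-of-gravity defuzzification can be sketched in NumPy on a sampled output membership function; the sample points below are illustrative:

```python
import numpy as np

def cog_defuzzify(x, mu):
    """Centre-of-gravity (COG) defuzzification: converts the aggregated
    fuzzy output mu(x) into a single crisp value, the centroid of the
    area under the membership curve."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    return np.sum(x * mu) / np.sum(mu)

# a symmetric triangular output set centred on 45 defuzzifies to 45
x = np.array([30.0, 37.5, 45.0, 52.5, 60.0])
mu = np.array([0.0, 0.5, 1.0, 0.5, 0.0])
print(cog_defuzzify(x, mu))   # 45.0
```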
D. DCNN based classification of Chest X-ray Images
Architecture of DCNN
A DCNN applies filters to the original pixels of an image to learn detail patterns, in contrast to the global patterns learned by a conventional neural net. A DCNN contains the following layers:
Convolutional layer (CL): applies 'n' filters to the feature map. After the convolution, a ReLU activation function is used to add non-linearity to the network.
Pooling layer (PL): max pooling is the traditional method; it divides the feature maps into sub-regions (of 2x2 size) and retains only the maximum values.
Fully connected layer: all neurons from the preceding layers are connected to the subsequent layers. The DCNN categorizes the labels based on the features from the CLs, reduced by the PLs.
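The 2x2 max-pooling operation of the PL can be sketched in NumPy as a reshape over non-overlapping sub-regions:

```python
import numpy as np

def max_pool_2x2(fmap):
    """2x2 max pooling: splits the feature map into non-overlapping 2x2
    sub-regions and keeps only the maximum of each, halving resolution
    (odd trailing rows/columns are dropped)."""
    h, w = fmap.shape
    trimmed = fmap[:h - h % 2, :w - w % 2]
    return trimmed.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.array([[1, 3, 2, 0],
                 [4, 2, 1, 5],
                 [0, 1, 3, 2],
                 [2, 6, 1, 1]])
print(max_pool_2x2(fmap))   # [[4 5]
                            #  [6 3]]
```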
Fig. 4 Architecture of proposed DCNN model
Table 1 illustrates the details of the architecture shown in Figure 4.
Table 1 Detailed Parameters of Proposed DCNN
Results and Discussion
The proposed Fuzzy Gabor-CNN classification model has been implemented in Python. The COVID-19 X-ray image dataset, developed by Cohen JP using images from various open access sources, is obtained from https://github.com/ieee8023/covid-chestxray-dataset. The database provided by Wang et al. (http://openaccess.thecvf.com/content_cvpr_2017/papers/Wang_ChestX-ray) was used for normal and pneumonia X-ray images.
A. Results for X-Ray Images
For the X-ray images, 234 normal, 390 Pneumonia and 267 Covid images are taken as input. Figures 5, 6 and 7 show the input images from the normal, Covid-19 and pneumonia databases.
Fig. 5 (a) Normal Image
b) Fuzzy Gabor Filtered Image
Fig. 6 (a) Covid affected Image
b) Fuzzy Gabor Filtered Image
Fig. 7 (a) Pneumonia affected Image
b) Fuzzy Gabor Filtered Image
B. Comparison Results
In this section, the classification performance of the Fuzzy Gabor-DCNN technique is compared with the Gabor-DCNN and Anisotropic diffusion-DCNN classifiers. Table 2 shows the comparison results for all the classifiers.
Table 2 Comparison of Performance metrics for all the classifiers
Fig. 8 Performance Comparison for all the techniques
As seen from Fig. 8, the accuracy of Fuzzy Gabor-DCNN attains the highest value of 94.6, followed by Gabor-DCNN (91.3) and AD-DCNN (87.6). Similarly, the proposed Fuzzy Gabor-DCNN outperforms the other two algorithms for the other metrics as well.
Conclusion The classification of Covid-19 X-ray pictures was performed in this work using Fuzzy-Gabor DCNN. The Chest X-ray pictures are initially pre-processed with Median Filters. Following preprocessing, a Fuzzy Gabor filter is utilised to extract features. The Gabor filter's orientation and wavelengths were fuzzified to increase the filter's response. After extracting the features, the DCNN algorithm is used to train and classify them. It categorises chest X-ray images into three groups: Covid-19, pneumonia, and normal. The performance of the Fuzzy Gabor-DCNN approach is compared to that of the Gabor-DCNN and Anisotropic diffusion-DCNN classifiers. When compared to existing feature extraction and classification algorithms, experimental results indicate that the proposed Fuzzy Gabor-DCNN algorithm achieves the highest accuracy, precision, recall, and F1-score. References [1] Jain R., Gupta M, Taneja S. and Hemanth D.J, "Deep learning based detection and analysis of COVID-19 on chest X-ray images", Applied Intelligence , Elsevier, 2021,51:1690-1700. [2] Ji D, Zhang Z, Zhao Y and Zhao Q, "Research on Classification of COVID-19 Chest X-Ray Image Modal Feature Fusion Based on Deep Learning", Hindawi, Journal of Healthcare Engineering, Volume 2021, Article ID 6799202, 12 pages, 2021. [3] Narin A, Kaya C and Pamuk Z, "Automatic Detection of Coronavirus Disease (COVID-19) Using X-ray Images and Deep Convolutional Neural Networks", DOI:10.1007/s10044-021-00984-y, arXiv, 2020. [4] Tulin Ozturk, Muhammed Talo, Eylul Azra Yildirim , Ulas Baran Baloglu, Ozal Yildirim and U. Rajendra Acharya, "Automated detection of COVID-19 cases using deep neural networks with X-ray images", Elsevier, Computers in Biology and Medicine 121 (2020) 103792,2020. 
[5] Daniel Arias-Garzón, Jesús Alejandro Alzate-Grisales, Simon Orozco-Arias, Harold Brayan Arteaga-Arteaga, Mario Alejandro Bravo-Ortiz, Alejandro Mora-Rubio, Jose Manuel Saborit-Torres, Joaquim Ángel Montell Serrano, Maria de la Iglesia Vayá, Oscar Cardona-Morales and Reinel Tabares-Soto, "COVID-19 detection in X-ray images using convolutional neural networks", Elsevier, Machine Learning with Applications 6 (2021) 100138, 2021.
[6] Samarth Bhatia, Yukti Makhija, Sneha Jayaswal, Shalendra Singh and Ishaan Gupta, "Severity and Mortality prediction models to triage Indian COVID-19 patients", arXiv, 2021.
[7] Suman Shrestha, "Image Denoising Using New Adaptive Based Median Filter", Signal & Image Processing: An International Journal (SIPIJ), Vol. 5, No. 4, August 2014.
[8] Hafiz Suliman Munawar, Riya Aggarwal, Zakria Qadir, Sara Imran Khan, Abbas Z. Kouzani and M. A. Parvez Mahmud, "A Gabor Filter-Based Protocol for Automated Image-Based Building Detection", Buildings 2021, 11, 302. doi:10.3390/buildings11070302
[9] Vladimir Tadic, Tatjana Loncar-Turukalo, Akos Odry, Zeljen Trpovski, Attila Toth, Zoltan Vizvari and Peter Odry, "A Note on Advantages of the Fuzzy Gabor Filter in Object and Text Detection", Symmetry 2021, 13, 678. doi:10.3390/sym13040678
Advances in Science and Technology ISSN: 1662-0356, Vol. 124, pp 147-155 doi:10.4028/p-4zt8lr © 2023 Trans Tech Publications Ltd, Switzerland
Submitted: 2022-09-08 Accepted: 2022-09-16 Online: 2023-02-27
Using Transfer Learning for Automatic Detection of Covid-19 from Chest X-Ray Images

H. Mary Shyni 1,a and Chitra E. 2,b*

1,2 Department of Electronics and Communication Engineering, SRM Institute of Science and Technology, Kattankulathur, Chennai, India

a[email protected], [email protected]
Keywords: COVID-19 Detection, X-ray images, Transfer Learning, Pre-trained Models, Data Augmentation
Abstract. The hasty spread of the perilous coronavirus has resulted in a significant loss of human life and unprecedented public health challenges around the world. Early screening of COVID-19 followed by an immediate medical response can halt the spread of the infection. Deep learning algorithms coupled with chest X-ray images provide fast and accurate results. This study aims to fine-tune seven pre-trained models for COVID-19 detection using chest X-ray radiographs. A sample of 3428 chest X-ray images comprising 1626 COVID-19 images was used to train and validate the models. The Inception V3 model outperformed the other models with an accuracy of 99.42%.

Introduction
The initial outbreak of COVID-19 was reported in December 2019 in Wuhan, China, and within a few months the virus had spread rapidly across the globe [1]. Due to its health, psychological and economic effects, COVID-19 was declared a serious global public health threat. The more common symptoms include fever, irritation of the airways, hyposmia, hypogeusia and panting, and the incubation time is 2 to 14 days [2]. RT-PCR is the primary laboratory test used for confirming COVID-19 infection, but it is a tedious and troublesome process that may lead to further transmission of the disease. X-ray and Computed Tomography (CT) are the commonly used medical imaging techniques for detecting chest abnormalities. Once the COVID-19 virus reaches the host's lung, it starts multiplying and produces new viruses, and the infected cells produce lesions in the lungs. Because the lungs are the most seriously affected by the infection, X-ray and Computed Tomography are used, where the lesions are observable in the form of ground-glass opacity [3]. Because they emit less radiation, have a fast execution time (which reduces the risk of virus transmission) and are inexpensive, X-rays are often employed in the detection of COVID-19 [4].
Though a CT scan provides detailed information about the affected area, it is more expensive, delivers a much higher radiation dose, and is time-consuming [5]. Considering the safety of children and pregnant women [6], we have chosen X-ray as the imaging modality in this study. Because of the lack of advanced understanding of the infection, even experienced radiologists find it difficult to anticipate infection from medical images. Deep learning algorithms trained on medical images have proven to be a beneficial approach for preliminary detection of COVID-19, yielding more accurate and faster results [7]. This research aims to fine-tune seven pre-trained models, namely VGG 16, VGG 19, ResNet 50, Inception V3, Xception, MobileNet V2 and DenseNet 201, for COVID-19 detection. All these models have been trained on publicly available COVID-19 and normal chest X-ray images. The rest of this paper is structured as follows: the Related Works section reviews recent works related to this study; the next section presents the proposed methodology and the methods for resolving the problem of data shortage; the Experiments and Results section contains the experimental results and the comparison of evaluation metrics of the seven pre-trained models; finally, the last section concludes the paper.
Related Works
Ali Narin et al. [8] trained five pre-trained models on three different datasets of chest X-ray images for detecting COVID-19. ResNet 50 outperformed the other four models with an average accuracy of 98.43%. For detecting COVID-19 at an early stage, Imran Ahmed et al. [9] proposed an IoT-based framework that used Faster Regions with Convolutional Neural Network (Faster-RCNN) in combination with ResNet-101 to perform the classification. Using a Region Proposal Network (RPN) and bounding box classification, the model obtained an accuracy of 98%. Mundher Mohammed Taresh et al. [10] evaluated the performance of eight pre-trained models with a dataset containing 1341 normal CXR images, 1200 COVID-19 CXR images and 1345 viral pneumonia CXR images for 3-class classification. VGG 16 yielded better accuracy and precision when compared with the other models. In [11] a transfer learning model based on DenseNet 201 was introduced to detect COVID-infected patients using CT scans. In this method, all the preceding layers have direct connectivity to all subsequent layers, which provided a 1% increase in accuracy compared with VGG 16 and ResNet 152 V2. Mohammad Shorfuzzaman and Mehedi Masud replaced the classifiers of five pre-trained models with a new classifier, a combination of an average pooling layer and two dense layers. ResNet 50 V2 performed better, with an accuracy of 98.15%. Ensembling was also performed, with prediction done on a majority-vote basis [12]. COVID-CXNet was introduced by Arman Haghanifar et al. [13], which employed CheXNet as the basic model for COVID-19 identification from X-ray radiographs. CheXNet used the transfer learning approach, which reduced its training time and improved the accuracy in comparison with the base model. A deep learning assisted model was proposed in [14] which evaluated the performance of a few pre-trained models using X-ray images.
The last layer in the pre-trained models is replaced by a fully connected layer to detect COVID-19 and normal cases. ResNet-34 excelled with an accuracy score of 98.33%. In [15] Shanjiang Tang et al. introduced EDL-COVID, an ensemble method based on COVID-Net. Instead of using multiple pre-trained models for prediction, multiple snapshots of the model are produced and ensembled using the Weighted Average Ensembling (WAE) approach, which yielded an accuracy of 95%. Ezz El-Din Hemdan et al. [16] proposed a framework of seven pre-trained classifiers called COVIDX-Net and trained it with X-ray images for the automatic detection of COVID-19. VGG 19 and DenseNet 201 performed well with an accuracy score of 90%. A comparative analysis was performed among seven pre-trained models using CT and X-ray images for the classification of coronavirus; VGG 19 scored best with an accuracy of 98.75% [17].

Proposed Methodology
The proposed model aims to detect whether given chest X-ray images are COVID-19 infected or normal. Fig. 1 presents a pictorial representation of the proposed methodology. The work consists of: dataset collection and splitting, pre-processing of the data, transfer learning, and model training and prediction.
Dataset. The pre-trained models, originally trained on the ImageNet dataset, were further trained and evaluated on chest X-ray images for detecting COVID-19 cases. The X-ray images used for this comparative analysis are from a medical image directory created by Sachin Kumar, publicly available in the Kaggle database [18]. The directory consists of 1626 COVID positive chest
Fig. 1. Schematic representation of the proposed methodology
X-ray images and 1802 COVID negative (Normal) chest X-ray images. All the images were resized to 256 x 256 and are available in PNG format. Fig. 2 shows representative X-ray images of a) normal cases and b) COVID-19 positive cases from the dataset.
Fig. 2. Samples: a) Normal cases b) COVID-19 positive cases
The models were evaluated with a dataset split ratio of 90% for training and 10% for testing. To avoid overfitting and to allow hyperparameter tuning, 20% of the training data is used as validation data. After the data split, the training dataset comprises 1138 COVID X-ray images and 1262 normal X-ray images, which is 70% of the available data. 20% of the available samples (i.e., 325 COVID X-ray images and 360 normal X-ray images) form the validation dataset. The test dataset comprises 163 COVID X-ray images and 180 normal X-ray images, which is 10% of the total samples. Fig. 3 represents the split of our dataset.
Fig. 3. Dataset Split
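The counts above are consistent with a per-class 70/20/10 split; a small sketch (plain Python; the rounding scheme is our assumption, inferred from the reported numbers):

```python
# Per-class 70/20/10 split: test and validation sizes are rounded to the
# nearest image and training receives the remainder, which reproduces the
# counts quoted above for both classes.

def split_counts(n_images):
    """Return (train, validation, test) counts for one class."""
    test = round(n_images * 0.10)
    val = round(n_images * 0.20)
    train = n_images - test - val
    return train, val, test

covid = split_counts(1626)   # (1138, 325, 163)
normal = split_counts(1802)  # (1262, 360, 180)
```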
Data Pre-processing. Normalization and data augmentation are done at the pre-processing stage. Normalization is the process of transforming numeric data to a common scale when there is variation in the input data range; it stabilizes the learning process and allows faster convergence. The grayscale images used in this study are rescaled by dividing the pixel values by 255, normalizing them to the range 0 to 1 [19]. A large amount of data is required to train CNN models effectively and accurately; however, the dataset considered for this study contains only 2400 X-ray images for training. To maximize the samples in the training dataset, the data augmentation technique is used, which creates a copy of the original image with slight modifications. Augmentation increases the variability and also improves the classification accuracy of the model. Three types of data augmentation were applied to all the images in the training dataset: (i) images were sheared by 20°, (ii) images were zoomed by 20%, and (iii) images were flipped horizontally. After performing augmentation, 4552 COVID-19 positive X-ray images and 5048 normal X-ray images were available in the training dataset. Data augmentation should not be applied to the validation and test datasets [20].

Transfer Learning. Transfer learning refers to the reuse of knowledge gained by a neural network model trained for one task in another task. The initial layers of the model are responsible for learning basic features, while the final layers learn more complex features; so only the top layers of a pre-trained model are retrained when it is used for a different task [21]. The models are trained with an enormous amount of data and resources. Pre-trained models are utilized in many COVID detection frameworks because of the severe shortage of COVID-19 samples and the rapid spread of the disease. This decreases training time and enhances model performance [22].
The concept of transfer learning is illustrated in Fig. 4.
Fig. 4. Transfer Learning Concept
In this study, we evaluated the performance of seven pre-trained models, namely VGG 16, VGG 19, ResNet 50, Inception V3, Xception, MobileNet V2 and DenseNet 201, that are often employed in COVID-19 diagnosis architectures. The ImageNet dataset, which comprises over 14 million images, was used to train these models. Only the Fully Connected (FC) layer that replaced the last layer of these models is retrained, whereas the weights of the initial layers remain unchanged.

Model Training and Classification. Finally, the model is trained on the training data for a number of iterations known as epochs; our models were trained for 15 epochs. For the first epoch, the weights and biases are randomly initialized. The predicted output is compared with the actual output, and the weights and biases are revised based on the loss function and backpropagated to the neurons in the initial layers. The process continues for every epoch, reducing the loss function and thereby improving the accuracy.
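The fine-tuning setup described above can be sketched with Keras as follows (a minimal sketch, not the authors' exact code; the pooling layer and any settings beyond those stated in the Experiments and Results section are assumptions):

```python
# Input sizes per model as stated in the Experiments and Results section.
INPUT_SIZE = {"VGG 16": 224, "VGG 19": 224, "ResNet 50": 224,
              "Inception V3": 299, "Xception": 299,
              "MobileNet V2": 224, "DenseNet 201": 224}

def build_covid_classifier():
    """Frozen ImageNet base with a new binary head (Inception V3 shown)."""
    # tf.keras imports kept local so this sketch loads without TensorFlow.
    from tensorflow.keras import Model, layers
    from tensorflow.keras.applications import InceptionV3
    from tensorflow.keras.optimizers import Adam

    size = INPUT_SIZE["Inception V3"]
    base = InceptionV3(weights="imagenet", include_top=False,
                       input_shape=(size, size, 3))
    base.trainable = False                          # initial layers unchanged
    x = layers.GlobalAveragePooling2D()(base.output)  # assumed pooling choice
    out = layers.Dense(1, activation="sigmoid")(x)    # COVID-19 vs. normal
    model = Model(base.input, out)
    model.compile(optimizer=Adam(learning_rate=1e-4),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model
```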
Experiments and Results
An experimental analysis was performed using seven pre-trained models, namely VGG 16, VGG 19, ResNet 50, Inception V3, Xception, MobileNet V2 and DenseNet 201, for detecting COVID-19 from chest X-ray images.
Experimental Parameters. Initially, the chest X-ray images available in the dataset were 256 x 256 pixels. Before training, the images were resized to 224 x 224 for the VGG 16, VGG 19, ResNet 50, MobileNet V2 and DenseNet 201 networks, and to 299 x 299 for the Inception V3 and Xception networks. For simulation, Python is used as the programming language and Keras/TensorFlow is the deep learning backend. The models were trained for 15 epochs with a batch size of 32. Adam optimization was used with a learning rate of 0.0001. The last dense layer in all the pre-trained models was fine-tuned so that it outputs a binary class corresponding to COVID-19 or normal, instead of the 1000 classes of the ImageNet dataset. As it is a binary classification problem, binary_crossentropy was used as the loss function.
Performance Metrics. Model performance was evaluated based on metrics like Accuracy, Sensitivity or Recall, Specificity, Precision, F1 Score and Area Under the ROC Curve (AUC) [23]. Using the True Positives (TP), False Positives (FP), False Negatives (FN) and True Negatives (TN) from the confusion matrix, the metrics are defined as:

Accuracy = (TP + TN) / (TP + TN + FP + FN).                         (1)

Sensitivity / Recall = TP / (TP + FN).                              (2)

Specificity = TN / (TN + FP).                                       (3)

Precision = TP / (TP + FP).                                         (4)

F1 Score = 2 × (Precision × Recall) / (Precision + Recall).         (5)
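As a sanity check, equations (1)-(5) can be computed directly from confusion-matrix counts. The sketch below (plain Python, our own illustration) reproduces the Inception V3 figures using TP = 163 and FN = 0 and, consistent with the reported 98.89% specificity on the 180 normal test images, TN = 178 and FP = 2:

```python
# Metrics (1)-(5) computed from confusion-matrix counts.

def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, sensitivity, specificity, precision and F1 score."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)            # recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return accuracy, sensitivity, specificity, precision, f1

# Inception V3 test results: 163 COVID-19 and 180 normal test images,
# all COVID-19 cases detected, 2 normal cases misclassified (assumed from
# the reported specificity). These reproduce the Table 1 row.
acc, sens, spec, prec, f1 = classification_metrics(tp=163, fp=2, fn=0, tn=178)
# round(acc * 100, 2) == 99.42 and sens == 1.0
```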
TP and TN refer to correctly classified COVID-19 and normal cases respectively, whereas FP and FN refer to the misclassified COVID-19 and normal cases respectively.
Results. The metrics obtained by training all seven models are compared and tabulated in Table 1. The overall performance of all the models is good. Among them, Inception V3 achieved the highest performance, with an accuracy of 99.42%, sensitivity of 100%, specificity of 98.89%, precision of 98.79%, F1 score of 99.39% and AUC of 0.9953. MobileNet V2 was the second-best performer for predicting COVID-19, with an accuracy of 97.08%, sensitivity of 95.09%, specificity of 98.89%, precision of 98.73%, F1 score of 96.88% and AUC of 0.9871. ResNet 50 achieved the lowest accuracy of 90.38%, with a sensitivity of 87.12%, specificity of 93.33%, precision of 92.21%, F1 score of 89.59% and AUC of 0.9744.

Table 1. Classification Results of all the Seven CNN Models

Model        | Accuracy (%) | Sensitivity (%) | Specificity (%) | Precision (%) | F1 Score (%) | AUC
VGG 16       | 95.34        | 92.02           | 98.33           | 98.04         | 94.93        | 0.9953
VGG 19       | 94.75        | 90.80           | 98.33           | 98.01         | 94.27        | 0.9966
ResNet 50    | 90.38        | 87.12           | 93.33           | 92.21         | 89.59        | 0.9744
Inception V3 | 99.42        | 100             | 98.89           | 98.79         | 99.39        | 0.9953
Xception     | 95.36        | 90.80           | 99.44           | 99.33         | 94.87        | 0.9822
MobileNet V2 | 97.08        | 95.09           | 98.89           | 98.73         | 96.88        | 0.9871
DenseNet 201 | 96.79        | 93.87           | 99.44           | 99.35         | 96.53        | 0.9898

Fig. 5. Confusion matrix obtained for different CNN models
Fig. 6. ROC curves obtained for different CNN models
Fig. 5 depicts the confusion matrices for the seven CNN models on the test data. For the best performing model, Inception V3, the True Positive value is 163, which indicates that all the COVID-19 cases are correctly classified and therefore the False Negative value is 0. Out of the 180 normal cases, 178 are correctly classified as normal and 2 are misclassified as COVID-19 positive. The Receiver Operating Characteristic (ROC) curves for all seven models are depicted in Fig. 6. ROC is a probability curve that describes the trade-off between the True Positive Rate (Sensitivity) and the False Positive Rate (1 - Specificity); it describes the ability of the model to differentiate between COVID-19 and normal cases [24].

Conclusion
Adding more layers to a convolutional neural network increases the complexity of the model and requires immense resources to train it. So instead of training a model from scratch, pre-trained models are used, which potentially saves time and resources. The pre-trained models VGG 16, VGG 19, ResNet 50, Inception V3, Xception, MobileNet V2 and DenseNet 201 are often employed in health care applications. These models were fine-tuned using the transfer learning concept, and different metrics such as accuracy, sensitivity, specificity, precision, F1 score and AUC were used to assess their performance. The results suggest that Inception V3 is the best-performing model, with an accuracy of 99.42% and a sensitivity of 100%. We intend to test the performance of the suggested model for multi-class classification with a larger number of data samples in the future.

References
[1] Uddin, Azher, et al. "Study on convolutional neural network to detect COVID-19 from chest X-rays." Mathematical Problems in Engineering 2021 (2021).
[2] Keni, Raghuvir, et al. "COVID-19: emergence, spread, possible treatments, and global burden." Frontiers in Public Health (2020): 216.
[3] Arellano, Matías Cam, and Oscar E. Ramos.
"Deep Learning Model to Identify COVID-19 Cases from Chest Radiographs." 2020 IEEE XXVII International Conference on Electronics, Electrical Engineering and Computing (INTERCON). IEEE, 2020.
[4] Zhang, Jianpeng, et al. "Viral pneumonia screening on chest X-rays using confidence-aware anomaly detection." IEEE Transactions on Medical Imaging 40.3 (2020): 879-890.
[5] Arias-Londoño, Julián D., et al. "Artificial Intelligence applied to chest X-Ray images for the automatic detection of COVID-19. A thoughtful evaluation approach." IEEE Access 8 (2020): 226811-226827.
[6] Fred, Herbert L. "Drawbacks and limitations of computed tomography: views from a medical educator." Texas Heart Institute Journal 31.4 (2004): 345.
[7] Silva, Pedro, et al. "COVID-19 detection in CT images with deep learning: A voting-based scheme and cross-datasets analysis." Informatics in Medicine Unlocked 20 (2020): 100427.
[8] Narin, Ali, Ceren Kaya, and Ziynet Pamuk. "Automatic detection of coronavirus disease (COVID-19) using X-ray images and deep convolutional neural networks." Pattern Analysis and Applications 24.3 (2021): 1207-1220.
[9] Ahmed, Imran, Awais Ahmad, and Gwanggil Jeon. "An IoT-based deep learning framework for early assessment of COVID-19." IEEE Internet of Things Journal 8.21 (2020): 15855-15862.
[10] Taresh, Mundher Mohammed, et al. "Transfer learning to detect COVID-19 automatically from X-ray images using convolutional neural networks." International Journal of Biomedical Imaging 2021 (2021).
[11] Jaiswal, Aayush, et al. "Classification of the COVID-19 infected patients using DenseNet201 based deep transfer learning." Journal of Biomolecular Structure and Dynamics 39.15 (2021): 5682-5689.
[12] Shorfuzzaman, Mohammad, and Mehedi Masud. "On the detection of COVID-19 from chest X-ray images using CNN-based transfer learning." CMC-Computers Materials & Continua (2020): 1359-1381.
[13] Haghanifar, Arman, et al. "COVID-CXNet: Detecting COVID-19 in frontal chest X-ray images using deep learning." arXiv preprint arXiv:2006.13807 (2020).
[14] Nayak, Soumya Ranjan, et al. "Application of deep learning techniques for detection of COVID-19 cases using chest X-ray images: A comprehensive study." Biomedical Signal Processing and Control 64 (2021): 102365.
[15] Tang, Shanjiang, et al. "EDL-COVID: ensemble deep learning for COVID-19 case detection from chest X-ray images." IEEE Transactions on Industrial Informatics 17.9 (2021): 6539-6549.
[16] Hemdan, Ezz El-Din, Marwa A. Shouman, and Mohamed Esmail Karar. "COVIDX-Net: A framework of deep learning classifiers to diagnose COVID-19 in X-ray images." arXiv preprint arXiv:2003.11055 (2020).
[17] Apostolopoulos, Ioannis D., and Tzani A. Mpesiana. "Covid-19: automatic detection from X-ray images utilizing transfer learning with convolutional neural networks." Physical and Engineering Sciences in Medicine 43.2 (2020): 635-640.
[18] Shastri, Sourabh, et al. "CheXImageNet: a novel architecture for accurate classification of Covid-19 with chest X-ray digital images using deep convolutional neural networks." Health and Technology (2022): 1-12.
[19] Ikechukwu, A. Victor, et al. "ResNet-50 vs VGG-19 vs training from scratch: A comparative analysis of the segmentation and classification of Pneumonia from chest X-ray images." Global Transitions Proceedings 2.2 (2021): 375-381.
[20] Farooq, Muhammad, and Abdul Hafeez. "COVID-ResNet: A deep learning framework for screening of COVID19 from radiographs."
arXiv preprint arXiv:2003.14395 (2020).
[21] Ahuja, Sakshi, et al. "Deep transfer learning-based automated detection of COVID-19 from lung CT scan slices." Applied Intelligence 51.1 (2021): 571-585.
[22] El Asnaoui, Khalid, and Youness Chawki. "Using X-ray images and deep learning for automated detection of coronavirus disease." Journal of Biomolecular Structure and Dynamics 39.10 (2021): 3615-3626.
[23] Orozco-Arias, Simon, et al. "Measuring performance metrics of machine learning algorithms for detecting and classifying transposable elements." Processes 8.6 (2020): 638.
[24] Hajian-Tilaki, Karimollah. "Receiver operating characteristic (ROC) curve analysis for medical diagnostic test evaluation." Caspian Journal of Internal Medicine 4.2 (2013): 627.
Advances in Science and Technology ISSN: 1662-0356, Vol. 124, pp 156-161 doi:10.4028/p-ozpegj © 2023 Trans Tech Publications Ltd, Switzerland
Submitted: 2022-09-09 Accepted: 2022-09-16 Online: 2023-02-27
Detect Anomalies on Metal Surface

Kumar Arnav a, Harshul Ravindran b

Computer Science Engineering, SRM Institute of Science and Technology, Chennai, India

a[email protected], [email protected]
Keywords - Computer Vision, Anomaly Detection, CNN, Artificial Intelligence
Abstract - Detection of surface imperfections in planar surfaces in the modern manufacturing process is an ongoing area of research. It has always been difficult to guarantee the absolute flawlessness of the surfaces of carbon and iron alloys, or of any surface for that matter. The quality of appearance of industrial products such as metal sheets is held to a particularly high standard, especially of late. To meet these standards and to guarantee that customer requirements are satisfied, we use computer-vision-based solutions. These solutions essentially involve a 2D or 3D defect detection technique, such as edge detection; combined with an AI model, these algorithms can recognize and localize defects on various planar surfaces. The model is trained with a dataset of images containing both damaged and undamaged surfaces, to ensure that the model can identify a defect and, in the absence of any defect, recognize the surface as homogeneous. As long as the model is trained properly, these techniques have proven to be particularly robust, and they have become common over recent years. CNNs and neural networks based on CNN architectures were used to train the model to classify the defects.

I. Introduction
In production lines some surfaces are prone to a variety of defects, typically scratches, crazing and so forth. Fast and accurate identification of such defects ensures sound quality assurance. We focused mainly on detecting metal surface-level anomalies and implemented a machine learning model-based approach to classify and detect them. The system overview is shown in Fig. 1. The CNN algorithm provides the right balance between speed and accuracy, which is why we used CNN for our project.
Fig 1. System Overview
II. Implementation
A. Resources or Components Used for Implementation
a. OpenCV 4.5.5 - open-source computer vision library, version 4.5.5
b. Python 3.10.2 - high-level programming language used for various image-processing applications
c. Metal surface defect dataset
d. Intel i3 5th gen processor
e. Intel integrated graphics
f. 8 GB DDR4 RAM

B. Database Specifications
NEU Dataset. The primary dataset being utilized is the NEU-CLS dataset; the images in this dataset are of steel surfaces. It is an open-source dataset used mainly for training deep learning models. The dataset comprises 1800 images that are divided into 6 classes, which represent the different defects that can be present on a planar surface. The classes are Pitted surface (PS), Patches (PA), Inclusion (IN), Rolled-in scale (RS), Scratches (SC) and Crazing (CR).

C. TensorFlow Library
Fig 2. Architecture of TensorFlow

TensorFlow is a software library for machine learning and deep learning which is freely available and open source. It can be utilized for a variety of tasks, although it is most commonly employed for deep neural network training and inference. TensorFlow was first developed by the Google Brain team for internal use only; in 2015 it was released under the Apache 2.0 license. It is a symbolic mathematics library based on dataflow and differentiable programming, and Google uses it for both research and development. To use GPU acceleration, the CUDA toolkit (version 9) must also be installed. NVIDIA created the parallel computing platform CUDA (Compute Unified Device Architecture) to allow users with CUDA-enabled GPUs to use their processing power for general-purpose tasks; this is what allows GPUs to be used for image processing and deep learning applications.
Fig 3. How TensorFlow works

Apart from this, other supporting software such as NCCL, the cuDNN SDK and TensorRT is needed. The latest NVIDIA device drivers will also be required.
Installation of TensorFlow is almost always done using pip. The command goes as follows: 'pip3 install --upgrade tensorflow'. Installing with GPU support would be 'pip3 install tensorflow-gpu'. The installation can then be verified by using the import statement; if the libraries have not been properly installed, a 'module not found' error is returned.

D. Dataset Creation and Training
The data from the NEU.zip file contains two folders:
● Train
● Validation
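The import-based installation check mentioned above can be made a little friendlier with importlib (our own sketch, not part of the original pipeline):

```python
# Check whether a package is installed without triggering the
# ModuleNotFoundError that a bare `import` raises.
import importlib.util

def is_installed(module_name):
    """True if the module can be located by the import machinery."""
    return importlib.util.find_spec(module_name) is not None

# is_installed("tensorflow") should return True after
# `pip3 install --upgrade tensorflow` has completed successfully.
```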
Fig 4. Files under NEU.zip

After unzipping, the train folder is divided into two directories:
● Train
● Test
Here 80% of the data is used for training the model and 20% of the data for testing the model.
Fig 5. Splitting of dataset The data is further categorized into 6 categories ● Crazing ● Inclusion ● Patches ● Pitted surfaces ● Rolled in scale ● Scratches
Fig 6. Categorized into 6 categories
The images are then adjusted to fit the camera view. This was done using the following functions:
● Rescale: rescaling of the pixel values (e.g. by a factor of 1/255)
● Shear range: angle by which to shear the picture
● Zoom: percentage of zoom
After the images are set according to the required configuration, we train the data using a CNN model with the following configuration: steps_per_epoch = 20, batch_size = 32, epochs = 100.
Fig 7. Training

The model accuracy graph is generated using Matplotlib to determine the accuracy of the CNN model.
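A minimal Keras sketch of this training setup (our own illustration; the directory name, target size and the exact augmentation values are assumptions, since the text does not state them):

```python
# Training configuration as stated above.
TRAIN_CONFIG = dict(steps_per_epoch=20, batch_size=32, epochs=100)

def train(model, train_dir="NEU/train"):
    """Feed augmented images from the train directory to model.fit."""
    # tf.keras import kept local so this sketch loads without TensorFlow.
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    # Illustrative augmentation values; the text names the functions but
    # not their magnitudes.
    gen = ImageDataGenerator(rescale=1.0 / 255, shear_range=20, zoom_range=0.2)
    flow = gen.flow_from_directory(train_dir,
                                   target_size=(200, 200),  # assumed size
                                   color_mode="grayscale",
                                   class_mode="categorical",
                                   batch_size=TRAIN_CONFIG["batch_size"])
    return model.fit(flow,
                     steps_per_epoch=TRAIN_CONFIG["steps_per_epoch"],
                     epochs=TRAIN_CONFIG["epochs"])
```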
Fig 8. Accuracy graph

Fig. 8 shows that training stops every time accuracy reaches 80%; this was done using callback functions to overcome overfitting. Overfitting is a problem because the model will then only recognize the features, in this case defects, of the images in the training dataset; presenting images that do not belong to the training dataset will result in abysmal accuracy and loss figures. Running a deep learning model without a callback is like driving a vehicle without working brakes: one has zero influence over the outcome of the model. There are various ways this can be implemented; one of the clearest strategies, Early Stopping, is used here. This essentially limits the model to a pre-set accuracy figure, or rather it stops training when the model reaches a specific accuracy figure, ensuring that the model does not overfit the training dataset. At the other end of the spectrum is underfitting. Most of the time this is the result of the model not being deep enough, in which case the accuracy figures for both the training and test datasets will be below par. There are a couple of ways to resolve this: one can add more layers and increase the complexity of the model, or extend the training time by increasing the number of epochs; if the model still presents issues with underfitting, the only thing left is to change the model altogether. Our model has no problem with underfitting; in fact, it is the polar opposite, so to combat overfitting we limit the accuracy of the training phase with the help of an Early Stopping function.
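The accuracy-threshold early stopping described above can be sketched as follows (our own illustration; the Keras callback shown in the comment assumes tf.keras and is not executed here):

```python
# Stop training once the epoch's training accuracy meets the 80% target
# used in the paper.

def reached_target(logs, target=0.80):
    """True when the logged training accuracy meets or exceeds the target."""
    return logs.get("accuracy", 0.0) >= target

# In Keras this check would live in a custom callback, e.g.:
#
# class StopAtAccuracy(tf.keras.callbacks.Callback):
#     def on_epoch_end(self, epoch, logs=None):
#         if reached_target(logs or {}, 0.80):
#             self.model.stop_training = True
#
# model.fit(train_data, epochs=100, callbacks=[StopAtAccuracy()])
```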
The optimized CNN saw the addition of the previously mentioned dropout layer and a callback function. The dropout layer sets random input values to 0 during training; this may not seem like much, but it helps prevent overfitting. To keep the expected sum of the inputs unchanged, the inputs that are not set to 0 are scaled up by 1/(1 - rate). The purpose of the callback function is to limit the training accuracy so as to prevent overfitting, the situation where a model trains itself too well on the training dataset and, as a result, only recognizes images from that training dataset.
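The inverted-dropout scaling described above can be illustrated with a small sketch (plain Python; the mask, rate and values are illustrative, not the model's actual parameters):

```python
# Inverted dropout: dropped units become 0 and surviving units are scaled
# by 1/(1 - rate), so the *expected* sum of the inputs over random masks
# is unchanged.

def inverted_dropout(values, keep_mask, rate):
    """Apply a 0/1 keep mask and rescale the survivors by 1/(1 - rate)."""
    scale = 1.0 / (1.0 - rate)
    return [v * m * scale for v, m in zip(values, keep_mask)]

out = inverted_dropout([2.0, 4.0, 6.0, 8.0], [1, 0, 1, 0], rate=0.5)
# survivors 2.0 and 6.0 are doubled: out == [4.0, 0.0, 12.0, 0.0]
```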
Fig 9. Model summary of optimized CNN

III. Results and Analysis
The model was validated using two different datasets to find its actual real-time accuracy:
1. The validation file from Fig. 4
2. GC10-DET
Case 1: the validation file from Fig. 4. When tested with the images from the validation file, we got six out of six correct detections, as can be seen in Fig. 10, which means that our model detected all the anomalies without any fault. Our model had 100% accuracy with this dataset.
Fig 10. Result with Validation file

Case 2: GC10-DET. GC10-DET is a dataset of metal surface anomalies collected in the manufacturing industry. It contains ten types of surface defects, i.e., punching (Pu), weld line (Wl), crescent gap (Cg), water spot (Ws), oil spot (Os), silk spot (Ss), inclusion (In), rolled pit (Rp), crease (Cr) and waist folding (Wf). The collected defects are on the surface of steel sheets, and the dataset includes 3570 grayscale images. When tested with this dataset we got 5 out of 6 correct detections, as shown in Fig. 11, which gives an accuracy of 83%. One thing to note here is that this dataset has 10 classes while our output only has 6 classes; that is because we removed the water spot, oil spot and silk spot classes and merged rolled pit and waist folding into rolled-in scale. This was done by replacing the validation file with GC10-DET.
Advances in Science and Technology Vol. 124
161
Fig 11. Result with GC10-DET

IV. Conclusion

Convolutional Neural Network based models performed well in classifying and detecting anomalies on steel surfaces. The accuracies varied with the number of layers, the use of callback functions, the use of weights and the use of skip connections in the neural network. The common issues faced by conventional CNN-based models, such as overfitting, irregularity and training errors, were addressed by the proposed optimized network.

Appendix A.
Advances in Science and Technology ISSN: 1662-0356, Vol. 124, pp 162-169 doi:10.4028/p-24rciz © 2023 Trans Tech Publications Ltd, Switzerland
Submitted: 2022-09-09 Accepted: 2022-09-16 Online: 2023-02-27
Indian License Plate and Vehicle Type Recognition

D.V. Shrija Sambhavi1,a, Shruthi Koushik2,b and Rameeza Fathima3,c

1Sriman Narayaniyam, 1st Cross Street, Macro Marvel River View County, Manapakkam, Chennai – 600125, Tamil Nadu, India

2G1/204, Swiss County, Thergoan, Pune – 411033, Maharashtra, India

3Matirs 302, PBel City, Kelambakkam – 603103, Tamil Nadu, India

[email protected], [email protected], [email protected]
Keywords: Indian License Plate Recognition, License Plate recognition, Coloured License Plate Recognition, Colour Recognition.
Abstract. In light of the growing number of vehicles, automated license plate recognition (ALPR) systems are much needed. ALPR is widely used in vehicle management processes such as law enforcement, surveillance, toll booth operations and parking lots. We propose a license plate recognition system built on neural networks. The system includes image pre-processing, which helps to quickly locate, segment and recognize the license plate characters; pre-processing is therefore one of the important factors affecting total system performance. Because we perform character segmentation on the license plate, the accuracy of character recognition increases. In India, license plates differ not only in shape and size but also in colour according to the type of registration; the RTO issues 8 types of license plates in total. In this work, we identify the type of license plate by detecting its colour, so vehicle registration types are recognized from the colour of the detected license plate.

Introduction

Automatic License Plate Recognition has been around since the early 1970s but was not commercialized until the early 2000s. It is widely used by government officials for vehicle identification and for purposes such as toll tax collection, and commercially in places like private parking management systems. It is also a key component of various Smart Transportation System applications, including access control and vehicle traffic surveillance. An ALPR system identifies vehicles, since each license plate carries a unique number. In India, vehicles are registered through the RTO with different coloured license plates, which is essential for vehicle type classification such as privately owned or rented.
Hence it is crucial to recognize the colour of the license plate. A common difficulty in vehicle identification in India is that license plates are not standardized to a certain language, font, or size. Additionally, poor image quality caused by bad lighting, weather conditions and obstructions makes license plate recognition extremely difficult. Certain number plates are also written manually, which makes recognition particularly challenging in India and makes the dataset required for character segmentation diverse and large. At present, many ALPR applications are far from perfect. We aim to produce a lightweight system that does not require a powerful GPU to run. This model considers number plates of different colours containing English characters and numbers only. The first step in the process is to detect the license plate; WPOD-Net, proposed by Silva and Jung, gives an impressive solution for license plate detection. The detected license plate is then used to identify the colour of the plate by finding its dominant colour. To increase the accuracy of character recognition, the detected license plate is first segmented and each character is then sent for recognition. The main objective of this ALPR system is to detect license plates and recognize
characters as well as the colour of the license plates, which will aid further vehicle identification. The ALPR system consists of 4 basic modules: license plate detection, image processing, character segmentation and character recognition. Additionally, for colour identification, a colour recognition module is introduced after license plate detection.

Related Works

Numerous authors have researched and created detection and recognition techniques. However, the variety and non-uniformity of Indian license plates make it difficult for these ALPR systems to perform at full capacity. Various ALPR systems use different techniques for license plate detection, such as SSD [1] or setting an area of interest and extracting it with a bounding box [1]; similarly, for recognizing the characters of detected plates, image processing techniques are generally applied before character segmentation for more accurate recognition. The papers [5], [7] used OCR methods for recognizing Vietnamese and Greek license plates. It is inferred that [7] used a PNN model but could not handle distorted images; an accuracy of 86% in detection using WPOD-NET was achieved. In papers [4], [8], two YOLO models were used for detection and recognition. YOLO has better detection accuracy, but oblique images could not be used since on-plane rotation could not be handled. The use of WPOD-NET proved advantageous because it works with the most challenging images, as in [5], and because of its unique warped planar object detection feature [6], which detects license plates from any angle and rectifies them into a fronto-parallel view. MobileNet proved appropriate for recognition because of its speed compared to YOLOv3, as in [9]: MobileNet was faster by 0.018 seconds. It is also a lightweight model that meets real-time requirements.
A few papers like [1], [12] used MobileNet for detection but could not train on many images.

Detection Using WPOD-NET

The WPOD-NET architecture localizes the license plate at any angle and converts it to a fronto-parallel view, making identification possible in any state of the image. The model was improved in paper [6] but does not work with distorted or low-resolution images. Unlike in many applications, the license plate is detected without setting any region of interest.
Fig. 1 - Detected license plate

The WPOD-Net model is capable of detecting license plates of ten different countries and even two license plates in a single image; its dataset includes Indian license plates with only one car image. Python libraries are used for image processing, multi-dimensional arrays and data visualization. WPOD-Net was developed by combining ideas from YOLO, the Single Shot Multibox Detector (SSD) and Spatial Transformer Networks (STN). It was trained with a dataset of 196 images: 105 from the Cars Dataset, 40 from the SSIG Dataset and 51 from the AOLP dataset. For every image, the 4 corner points of the LP were manually annotated.
164
IoT, Cloud and Data Science
The architecture of WPOD-Net is as follows: it has a total of 21 convolutional layers, 14 of them inside residual blocks. The convolutional filters are a standard size of 3 × 3. Rectified Linear Unit (ReLU) activations are used throughout the network, except in the detection block. The combination of 4 max-pooling layers of size 2 × 2 with stride 2 reduces the input dimensionality by a factor of 16. The detection block has two parallel convolutional layers: one for inferring the object probability, activated by a softmax function, and another for regressing the affine parameters, with no activation, as in Fig 2.
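The detection block described above can be sketched as two parallel 3 × 3 convolutions over the backbone's feature map. The channel counts (2 probability channels, 6 affine parameters) follow the WPOD-Net paper's design; the 13 × 13 × 128 input shape is an illustrative assumption standing in for the real backbone output.

```python
import tensorflow as tf
from tensorflow.keras import layers

def detection_block(feature_map):
    """Two parallel heads: class probabilities via softmax, and affine
    parameters via a linear (no-activation) convolution."""
    probs = layers.Conv2D(2, 3, padding="same")(feature_map)
    probs = layers.Softmax(axis=-1)(probs)                    # object / non-object
    affine = layers.Conv2D(6, 3, padding="same")(feature_map)  # v1..v6, no activation
    return layers.Concatenate(axis=-1)([probs, affine])

# Four 2x2/stride-2 max-pool stages downsample by 16, so a 208x208
# input would yield a 13x13 output grid of 8-channel cells.
inp = tf.keras.Input(shape=(13, 13, 128))
out = detection_block(inp)
head = tf.keras.Model(inp, out)
```

Each 8-channel cell thus encodes one plate hypothesis: a 2-way probability plus the affine transform that warps a canonical square onto the plate.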
Fig 2 - WPOD-Net architecture diagram

Colour Recognition

The idea of recognizing the colours of Indian license plates emerged in order to recognize the type of vehicle registration from the plate colour, which makes vehicle identification easier. There are 8 types of coloured plates in India; this paper recognizes 6 of them: white, black, red, yellow, green and blue. The colour of the license plate is detected by identifying the dominant colour present in the image after license plate detection, and the type of vehicle is then derived from that colour. White indicates a privately owned vehicle, green an electric vehicle, black a rental vehicle, red a temporarily registered vehicle, yellow a transport vehicle, and blue a vehicle used by diplomats, consulates or foreign representatives. We detect the plate colour using the Python modules colorthief and webcolors, which select the most dominant colour and name it; vehicle registration types are thus recognized from the colour of the detected license plate.
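The colour-to-type mapping can be sketched as below. In the real pipeline the dominant RGB value would come from colorthief's `ColorThief(...).get_color()`; here a self-contained nearest-palette match stands in for the webcolors lookup. The palette RGB values are illustrative assumptions, while the type table follows the text above.

```python
# Map a dominant RGB value (e.g. from colorthief) to the nearest plate
# colour, then to a registration type. Palette values are assumptions.
PLATE_COLOURS = {
    "white": (255, 255, 255), "black": (0, 0, 0), "red": (200, 30, 30),
    "yellow": (230, 200, 30), "green": (30, 140, 60), "blue": (30, 60, 180),
}
PLATE_TYPES = {
    "white": "private", "green": "electric", "black": "rental",
    "red": "temporary registration", "yellow": "transport",
    "blue": "diplomatic/foreign representative",
}

def nearest_plate_colour(rgb):
    # Squared Euclidean distance in RGB space to each palette entry
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(PLATE_COLOURS, key=lambda name: dist2(rgb, PLATE_COLOURS[name]))

def vehicle_type(rgb):
    return PLATE_TYPES[nearest_plate_colour(rgb)]

print(vehicle_type((245, 240, 238)))  # near-white dominant colour -> private
```

A perceptual colour space such as CIELAB would give more robust matches than raw RGB, at the cost of a conversion step.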
Fig 3 - Colour Recognition of license plate

Image Processing

Image processing steps are applied in various parts of the program. Techniques like grayscale conversion, Gaussian blur, binary thresholding, dilation and morphology are applied to the detected license plate for a better character segmentation process. For yellow and white license plate images, the binary image is inverted; certain image processing techniques are ruled out or inverted based on the colour of the plate. Initially, pre-processing is also applied to improve license plate detection.
Fig 4 - Image processing techniques

Character Segmentation

Character segmentation separates each character in the license plate, which improves the results of character recognition since each letter is recognized by the model individually. Using Python and OpenCV, image processing techniques are applied to the plate before segmentation. After segmentation, contours are found for each character to identify its coordinates, and the contours are then sorted in order. The dimensions of the boxes drawn around each character are determined by averaging all character dimensions. Finally, each character is sent to the recognition model for prediction.
Fig 5 - Segmented License plate

Character Recognition

In the research made in paper [1], the use of MobileNets increased the accuracy and detection speed, since they are capable of detecting license plates, and MobileNet is a lightweight CNN model that does not require a high-end GPU. Many ALPR systems use images captured with computing devices that many cannot afford; this paper works on reading the characters of the LP from a pre-detected image by segmenting characters, as done in papers like [2], [3]. Using MobileNetV2 for recognizing license plate characters benefits a large group of users because of its low GPU consumption, and it can run on mobile devices. MobileNetV2 builds on the depthwise separable convolutions of its predecessor and introduces an improved module with an inverted residual structure, which dramatically reduces the complexity, cost and size of the network. This makes it suitable for mobiles and other devices with low computational power, which is the prime aim of this paper. State-of-the-art performance is achieved for character recognition with MobileNetV2 as the deep learning model.
Fig 6 - Architecture diagram of MobileNetV2

Training was done to recognize the segmented characters and provide a predicted result, using a dataset of 34,575 images of characters from Indian license plates taken at different angles and sizes. This large dataset was split into a 90% training set and a 10% validation set to avoid overfitting. The input layer of our model receives images of shape (80, 80, 3), so each image must be adjusted to the appropriate size and number of channels. The loaded label classes are used to convert the one-hot encoded labels produced by the model back to characters. Each character image in crop_characters is run through a loop that stores all predictions from the model in final_result and plots each image with its corresponding prediction, as shown in Fig 7.
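A MobileNetV2 classifier with the (80, 80, 3) input described above can be sketched with Keras. The 36-class head (A-Z plus 0-9) is an assumption, since the paper does not list its class set, and `weights=None` is used here so the sketch trains from scratch.

```python
import tensorflow as tf
from tensorflow.keras import layers

NUM_CLASSES = 36  # A-Z and 0-9; an assumed class count

# MobileNetV2 backbone with the paper's (80, 80, 3) input shape
base = tf.keras.applications.MobileNetV2(
    input_shape=(80, 80, 3), include_top=False, weights=None)

char_model = tf.keras.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
# categorical_crossentropy matches the one-hot labels mentioned in the text
char_model.compile(optimizer="adam", loss="categorical_crossentropy",
                   metrics=["accuracy"])
```

Decoding a prediction back to a character is then an argmax over the softmax output followed by a lookup in the loaded label classes.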
Fig 7 - Plotting character images with their corresponding predictions

To avoid wasting time on unproductive training, callback functions are implemented to monitor the metrics. EarlyStopping is used to stop the training process if the accuracy stops improving from its highest value. ModelCheckpoint saves the weights of our model every time the loss value improves. The model reached approximately 93% accuracy after 5 epochs and approximately 97% accuracy by the end of the training process.
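The two callbacks could be configured as below. The monitored metrics, the patience of 3 epochs and the checkpoint filename are assumptions for illustration, not values stated in the paper.

```python
import tensorflow as tf

# EarlyStopping halts training when the monitored accuracy stops
# improving; ModelCheckpoint saves weights whenever the loss improves.
callbacks = [
    tf.keras.callbacks.EarlyStopping(monitor="val_accuracy",
                                     patience=3,
                                     restore_best_weights=True),
    tf.keras.callbacks.ModelCheckpoint("char_model.weights.h5",
                                       monitor="val_loss",
                                       save_best_only=True,
                                       save_weights_only=True),
]
# model.fit(train_ds, validation_data=val_ds, epochs=30, callbacks=callbacks)
```

`restore_best_weights=True` additionally rolls the model back to its best epoch when training stops early.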
Fig 8 – Recognition of letters after training

Experimental Results

Here we discuss the training details and the experimental results that validate the effectiveness of our approach. The detection module uses pre-trained weights and extracts the license plate from a manually created dataset of 31 images. WPOD-NET performed extraordinarily well, detecting almost all the license plates with an error percentage of 0.33. The colours were recognized correctly for all license plates except a few badly distorted images, which were recognized as yellow instead of white. The character dataset of 34,575 images was split into a 90% training and a 10% validation set; each symbol (letters and numerals) was trained with almost 50 images, which reflects strongly in the recognition system. A total of 500 images were tested for recognition, and we evaluated the accuracy of our approach using a fuzzy string-matching process that compares the predicted string with the string in the XML annotation file, achieving approximately 96% accuracy. A shortcoming of the recognition system is that a few symbols were wrongly recognized, as shown in Table 1.

Table 1
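The fuzzy string-matching evaluation described above can be sketched with the standard library; `difflib.SequenceMatcher` stands in here for whichever fuzzy-matching library the authors actually used, and the plate strings are made-up examples.

```python
from difflib import SequenceMatcher

def plate_similarity(predicted, ground_truth):
    """Similarity ratio (0.0-1.0) between a predicted plate string and
    the ground truth read from the XML annotation file."""
    return SequenceMatcher(None, predicted, ground_truth).ratio()

# One misread digit out of ten characters gives a 0.9 ratio
print(round(plate_similarity("TN09BC1234", "TN09BC1284"), 2))  # 0.9
```

Averaging these ratios over the 500 test images yields the overall recognition accuracy reported above.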
Fig. 9 – Output of each module
Conclusion and Future Works

The system thus achieves recognition of Indian license plates and their colour. The most significant contributions of the work are the new concept of recognizing vehicle types from the colour of the license plate and the use of a lightweight CNN. The character segmentation process failed on some license plates due to lighting conditions, the darkness of characters and the presence of shadows, while the recognition network showed remarkable results, identifying almost all characters and numbers. For future work, we plan to improve character segmentation by applying more techniques and identifying the source of the issue. The misrecognition of the letters mentioned in Table 1 can be rectified by training on more images of those particular letters. We also intend to detect the colour of the symbols on the license plate, which would help identify the other two types of Indian license plates.
Fig. 10 – Project flow

References

[1] M. Darji, J. Dave, N. Asif, C. Godawat, V. Chudasama and K. Upla, "License Plate Identification and Recognition for Non-Helmeted Motorcyclists using Lightweight Convolution Neural Network," Sardar Vallabhbhai National Institute of Technology (SVNIT), Surat, India, 2020, doi: 10.1109/INCET49848.2020.9154075.

[2] Md. M. Sarif, T. S. Pias, T. Helaly, Md. S. R. Tutul and Md. N. Rahman, "Deep Learning-Based Bangladeshi License Plate Recognition System," University of Asia Pacific, 2019, doi: 10.1109/ISMSIT50672.2020.9254748.

[3] J.-Y. Sung, S.-B. Yu and S.-h. P. Korea, "Real-time Automatic License Plate Recognition System using YOLOv4," 2020 IEEE International Conference on Consumer Electronics - Asia (ICCE-Asia), 2020, pp. 1-3, doi: 10.1109/ICCE-Asia49877.2020.9277050.

[4] L. Xie, T. Ahmad, L. Jin, Y. Liu and S. Zhang, "A New CNN-Based Method for Multi-Directional Car License Plate Detection," IEEE Transactions on Intelligent Transportation Systems, vol. 19, no. 2, pp. 507-517, Feb. 2018, doi: 10.1109/TITS.2017.2784093.
[5] K. Roy et al., "An Analytical Approach for Enhancing the Automatic Detection and Recognition of Skewed Bangla License Plates," 2019 International Conference on Bangla Speech and Language Processing (ICBSLP), 2019, pp. 1-4, doi: 10.1109/ICBSLP47725.2019.201528.

[6] S. M. Silva and C. R. Jung, "A Flexible Approach for Automatic License Plate Recognition in Unconstrained Scenarios," IEEE Transactions on Intelligent Transportation Systems, doi: 10.1109/TITS.2021.3055946.

[7] H. Li, P. Wang and C. Shen, "Toward End-to-End Car License Plate Detection and Recognition With Deep Neural Networks," IEEE Transactions on Intelligent Transportation Systems, vol. 20, no. 3, pp. 1126-1136, March 2019, doi: 10.1109/TITS.2018.2847291.

[8] S. Montazzolli and C. Jung, "Real-Time Brazilian License Plate Detection and Recognition Using Deep Convolutional Neural Networks," 2017 30th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI), 2017, pp. 55-62, doi: 10.1109/SIBGRAPI.2017.14.

[9] M. Bensouilah, M. N. Zennir and M. Taffar, "An ALPR System-based Deep Networks for the Detection and Recognition," Department of Computer Science, Jijel University, Jijel, Algeria, doi: 10.5220/0010229202040211.

[10] T. Mudumb, "An Approach Combined the Faster RCNN and Mobilenet for Logo Detection," 2019, doi: 10.1088/1742-6596/1284/1/012072.

[11] P. Ravirathinam and A. Patawari, "Automatic License Plate Recognition for Indian Roads Using Faster-RCNN," 2019 11th International Conference on Advanced Computing (ICoAC), 2019, pp. 275-281, doi: 10.1109/ICoAC48765.2019.246853.

[12] Y. Alborzi, T. Sarraf Mehraban, J. Khoramdel and A. Najafi Ardekany, "Robust Real-time Lightweight Automatic License Plate Recognition System for Iranian License Plates," K. N. Toosi University of Technology, Tehran, Iran, ICRoM 2019, doi: 10.1109/ICRoM48714.2019.9071863.
Advances in Science and Technology ISSN: 1662-0356, Vol. 124, pp 170-179 doi:10.4028/p-p00umt © 2023 Trans Tech Publications Ltd, Switzerland
Submitted: 2022-09-10 Accepted: 2022-09-16 Online: 2023-02-27
Proposed Music Mapping Algorithm Based on Human Emotions

Harsh Kumar Burnwal1,a, Muskan Mishra1,b and Annapurani Kumarappan1,c*

1Department of Networking and Communications, SRM Institute of Science and Technology, Kattankulathur, India

[email protected], [email protected], [email protected]
Keywords: Music, Songs, Facial recognition, Recommendation system, Mini-Xception, Haarcascade classifier.
Abstract. Facial recognition based music systems play an important role in the treatment of human psychology. Face recognition is an extensively used technique in applications such as security systems, video processing and surveillance. People are often confused when choosing the kind of music they want to listen to. Accordingly, this paper focuses on building an efficient music recommendation system that recommends suitable music to soothe the listener using facial recognition techniques. The system uses the FER-2013 dataset to train a CNN built on the mini-Xception architecture. Augmentation techniques are used to increase the number of training images, which helps to increase prediction accuracy. The face is captured using a webcam, facial extraction is done using a Haar cascade classifier, and the result is sent to the CNN layers. The mini-Xception algorithm used in these CNN layers makes the system lighter and more efficient than existing systems. The accuracy of the proposed model peaked at the 95% threshold, with an average accuracy of 90%. Songs are recommended to the user using the proposed mapping algorithm.

1. Introduction

Emotion has been described by many, but it cannot be easily defined: whatever a person feels and reflects can be called emotion, the inner state of a living being. Emotion is very important in real life, whether in health, psychology or science. Emotion detection has become an intensive research topic, and people all over the world have been studying various aspects of emotions, their functions and the roles they play.
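The augmentation step mentioned in the abstract can be sketched with Keras preprocessing layers applied to 48 × 48 FER-2013 faces. The specific transforms and their ranges are illustrative assumptions, not the paper's exact settings.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Augmentation pipeline for 48x48 grayscale FER-2013 faces;
# transform choices and ranges are assumed for illustration.
augment = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.05),        # up to roughly 18 degrees
    layers.RandomZoom(0.1),
    layers.RandomTranslation(0.1, 0.1),
])

batch = tf.random.uniform((8, 48, 48, 1))  # placeholder batch of faces
out = augment(batch, training=True)        # shape is preserved
```

Because these layers only transform inputs when `training=True`, the same pipeline can be left in the model graph without affecting inference.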
Emotion is commonly classified into these categories: Happy, Neutral, Angry, Sad, Surprised, Scared and Disgust. Many projects have been built in this field of research, from facial expression recognition to visual scanning behaviour, but great progress came with the introduction of deep learning in emotion recognition. Facial classification is one aspect where emotion recognition is extensively applied, typically using Convolutional Neural Networks, which are used for image segmentation, recognition and classification. Today there is a need to listen to music based on our emotions; many applications contain songs grouped by mood with just 3-4 labels (Motivational, Devotional, Joy, Party), but none can play songs based on the emotion we actually feel. To solve this problem, we propose an emotion-based song recommendation system. The music recommendation system based on human emotions consists of modules for facial detection, feature extraction, emotion recognition and recommendation. The dataset used is FER-2013, which consists of 35,000 images pre-labelled with emotions. The face is detected in real time using a Haar cascade classifier; the facial features are extracted for labelling and passed to the layers of a Convolutional Neural Network, where the emotions are detected using the mini-Xception algorithm based on deep learning. The detected emotions are passed to the recommender, which recommends songs to the user. Existing emotion-based recommendation systems have complex architectures and consume time, hence we have
prepared a model based on a lightweight architecture which can detect the face, extract features and classify emotions with little time loss. In this paper we describe the basic architecture of our proposed model, the algorithms used for face detection and emotion classification, and the recommendation system that provides songs to the user. The first section is the introduction, the second briefly reviews existing work, the third gives an overview of the architecture of our proposed system, the fourth discusses the real-time implementation, and the fifth presents the results of our model with its various uses and advantages.

2. Literature Survey

S. Gilda [1] described how to recommend songs using 3 different modules: an emotion module that classifies the mood, a music classification module, and a module that recommends music by matching the detected emotion to the type of song. The output is one of several emotion types, taking the user's image as input; only basic emotions are distinguished. S. Metilda [2] used a webcam to capture the user's facial expression, categorized feelings with a Fisherface algorithm, and used a song website to recommend songs. Only 2 expressions were taken, Happy and Sad. It was understood that recognizing facial expressions from a live webcam is a better option, but computing costs can rise significantly and accuracy can decrease. Raj P. [3] extracted appearance-based and geometric-based features from the input, as these features capture the shape and intensity of the facial pixels; using a Support Vector Machine (SVM), they achieved 85-90% accuracy on real-time images and 98-100% on still images.
Many studies have tried to understand a person's feelings and expressions in order to characterize the person, so finding the most accurate model is necessary, and better model accuracy can also be achieved using built-in libraries. Deny John Samuel [4] used both OpenCV and a Support Vector Machine (SVM): OpenCV helped with feature extraction from the image and the SVM predicted the known emotions. The paper identifies 4 emotions: Happiness, Surprise, Anger and Neutrality. A.S. Bhat [5] proposed an automated method of identifying mood and musical tone by analyzing the visual and harmonic aspects of musical notes and human emotions, dividing music by feeling in accordance with Thayer's model, which describes features of music such as beats and spectra. The songs were sorted according to these features, and the system achieved a performance of 94.4%. Renuka R. Londhe [6] wrote a paper on separating emotion from facial expressions by studying changes in facial curvature and differences in intensity across captured pixels; Artificial Neural Networks (ANN) were used to distinguish emotions, and various ideas were proposed for creating a playlist based on the distinguished emotion. Zeng [7] proposed new facial feature extraction methods consisting of 1) visual-based feature extraction and 2) geometric-based extraction: visual-based extraction captures the important features of a face (eyes, mouth, eyebrows), while geometric-based extraction integrates geometric division based on surface geometric features. Zhang [8] proposed a modified version of OpenCV based on the AdaBoost algorithm and used two different methods for face recognition; the results showed the combined method was simple and fast.
Parul Tambe [9] proposed a system where the music player automatically interacts with the user, learns the user's preferences and functions, and recommends music based on the user's mood; different facial expressions were captured from the user to determine their feelings.
Kanwal Garg [10] made an algorithm that plays music from the user's playlist according to the user's experience. The algorithm runs in a limited amount of time, so hardware costs are reduced. The idea divided the emotions into five categories, namely Happiness, Sadness, Surprise, Anger and Fear, and provides an effective way to match music to them. Aastha Joshi [11] determined the mood of the user from facial features, as people regularly express their feelings through expressions. Playing music based on the user's emotion reduced the time spent by the user: people typically keep a large range of music in their playlists, and randomly playing tracks does not match the user's mood, whereas this device plays music automatically according to the user's temper. The image of the person is captured using the web camera and stored; the captured pictures are then converted from RGB format to binary format. This representation of information is known as the Characteristic-Point Detection method and is accomplished using the Haar cascade algorithm supplied by OpenCV. The music player was then developed as Java software in which the database is managed and the track is played according to the mood of the user. Javier G. Rázuri et al. [12] made an Android app which automatically plays songs customized for the user. The model used image processing to analyze the emotion, and a song is then recommended to the user according to his mood. The method was developed using Eclipse and OpenCV, which were used to implement facial recognition algorithms; the paper presented evaluations of various face detection algorithms. In this work, the image was captured by the user with a cell phone camera and the emotions were detected. A.
Habibzad [13] covered three tiers in his proposed algorithm to recognize the facial emotion- 1) Pre-processing 2) Feature extraction 3) Classification which described diverse stages in photograph processing. 3.
Proposed Algorithm
The first step in our system is detecting the facial expression for emotion classification. Several algorithms exist for this, such as the Histogram of Oriented Gradients (HOG) method, a widely used approach for face and object detection; it is a powerful feature descriptor that is typically paired with a linear SVM classifier. The alternative we adopt is the Haar cascade classifier, the first method to describe real-time object detection. We implement a similar process in our work to detect faces in real time, including faces at different angles. Other face detection algorithms exist, but this one can be applied easily: the image is captured frame by frame, sent to the Haar cascade classifier, converted to grayscale and cloned for drawing, and the face region of interest (ROI) is cropped and sent to the CNN model for classification. A face can also be detected in a video or a still image with the same method. The next step is image classification: after the face is detected, it is passed through CNN layers to determine the type of emotion. For this, the best architecture we found for detecting emotion is mini-Xception, a very deep convolutional neural network. We trained the model on datasets containing images of the Neutral, Happy, Sad, Surprised, Scared, Angry and Disgust emotions. The CNN learns the discriminative features by itself; its layers are trained and tested on the image datasets and then used to predict emotions. Before entering the CNN, the dataset is split into two groups: 80% for training and 20% for testing. Testing is then performed on the model after the images are trained with the CNN.
During testing, accuracy is computed by checking whether each facial image is classified correctly. Classification accuracy can be increased either by training for more epochs or by increasing the number of images in the dataset; to enlarge the dataset, augmentation techniques are used. Our project follows this process: the model is trained on the FER-2013 [14] dataset, which contains about 35,000 images of different emotions, and training the model takes a lot of time. The face detected from the live camera feed is then processed in the CNN layers and the emotion is classified. Once the emotion is classified, the next step is recommending songs through the proposed mapping algorithm, which automatically takes the user to a playlist of songs matching the detected emotion. A few basic lines of code link the emotion detection model to the corresponding YouTube video link.
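The augmentation techniques named above are normally configured through a deep-learning framework, but the underlying transforms are simple. A minimal NumPy sketch of two of them (horizontal flip and width shift), for illustration only:

```python
import numpy as np

def horizontal_flip(img):
    """Mirror the image left-to-right (one of the augmentations used)."""
    return img[:, ::-1]

def width_shift(img, pixels):
    """Shift the image horizontally, padding the exposed edge with zeros."""
    shifted = np.zeros_like(img)
    if pixels > 0:
        shifted[:, pixels:] = img[:, :-pixels]
    elif pixels < 0:
        shifted[:, :pixels] = img[:, -pixels:]
    else:
        shifted = img.copy()
    return shifted

# Tiny 4x4 "image" so the effect is easy to inspect.
img = np.arange(16, dtype=np.uint8).reshape(4, 4)
flipped = horizontal_flip(img)
shifted = width_shift(img, 1)
```

Each augmented copy counts as an extra training image, which is how the dataset is enlarged without collecting new photographs.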
Figure 1. Sequence Diagram of Music Recommendation System
Fig. 1 shows the sequence diagram for the whole model. The dataset is taken from the user and pre-processed, and the pre-processed dataset is used to train the CNN model to detect emotions. Next, the user's face is detected from the webcam and processed by the CNN model, which shows the predicted emotion and its accuracy to the user on screen. The predicted emotion is then passed to the proposed mapping algorithm, which opens the YouTube playlist [15] corresponding to the detected emotion. The mapping algorithm uses nested if-else statements, and a count of 50 has been fixed for confirming the emotion: the emotion that is detected for 50 consecutive frames opens the YouTube link provided for it. Until a single emotion is detected continuously, the current prediction simply keeps being displayed on screen. The face images are captured and processed by the Haar cascade method and sent to the mini-Xception model, which classifies the emotion and reports its accuracy. Fig. 2a shows an emotion detected as neutral with an accuracy of 34.4%, meaning the neutral emotion is predicted with 34.4% confidence; Fig. 2b shows an angry face with 37.45% accuracy and Fig. 2c a happy face with 71.91% accuracy.
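The nested if-else mapping with its fixed count of 50 can be sketched as a small counter. The playlist URLs below are placeholders, not the links the authors actually wired in:

```python
# Placeholder playlist links -- the real URLs are whatever the authors used.
PLAYLISTS = {
    "happy": "https://www.youtube.com/playlist?list=HAPPY_PLACEHOLDER",
    "sad":   "https://www.youtube.com/playlist?list=SAD_PLACEHOLDER",
}

class EmotionCounter:
    """Fires a playlist once the same emotion is seen on N consecutive frames."""
    def __init__(self, threshold=50):
        self.threshold = threshold
        self.current = None
        self.count = 0

    def update(self, emotion):
        # Reset the streak whenever a different emotion is predicted.
        if emotion == self.current:
            self.count += 1
        else:
            self.current, self.count = emotion, 1
        if self.count >= self.threshold:
            self.count = 0  # avoid reopening the link on every later frame
            return PLAYLISTS.get(emotion)  # caller passes this to webbrowser.open
        return None

counter = EmotionCounter(threshold=50)
link = None
for _ in range(50):  # 50 consecutive "happy" frames trigger the playlist
    link = link or counter.update("happy")
```

Any interruption by a different emotion restarts the streak, which matches the described behaviour of the label continuing to display until one emotion is held continuously.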
Figure 2a. Neutral Face
Figure 2b. Angry Face
Figure 2c. Happy Face
In the architecture used for training the system, images from the FER-2013 dataset are pre-processed and split in two: 80% for training and 20% for testing. The training set is divided into two parts, x (inputs) and y (labels), for passing through the multiple CNN layers. The split dataset is then fed into the mini-Xception CNN, as shown in Fig. 3.
Figure 3. System Architecture for Training
4. Implementation
For the implementation, the CNN model is first trained for emotion classification. The script train_emotion_classifier.py is created and the library files are imported. The required parameters are then set: a batch size of 32, 10,000 epochs, and the percentages of images used for training and testing. Augmentation techniques are added so that the number of images in the dataset increases, which in turn increases the accuracy of emotion classification; the techniques used are rotation range, width shift range, height shift range, horizontal flip and zoom range. The mini-Xception model of the CNN is then implemented and compiled with an optimizer, a loss function and accuracy metrics. The call model.summary() displays the full architecture of our CNN model, with the layers and parameters defined in cnn.py. A callback structure runs a function every time an epoch finishes; it logs the validation loss and accuracy to a log file and is used during training to obtain the validation accuracy of the model. Next, load_and_process.py loads the FER-2013 dataset and divides the images into two parts: 80% for training and 20% for testing. After the model is trained, face detection for emotion classification begins: haarcascade_frontalface_default.xml detects the user's face in real time in front of the camera, and the classifier.hdf5 file captures the face region in the image and removes the other parts. We have classified the emotions into 7 types: Happy, Neutral, Angry, Sad, Surprised, Scared and Disgust. We have also implemented code that detects the face from an uploaded video file.
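The 80/20 split performed when loading the dataset can be sketched with NumPy. The shuffling step, array shapes and seed are our illustrative assumptions, not the authors' exact code:

```python
import numpy as np

def split_dataset(images, labels, train_fraction=0.8, seed=0):
    """Shuffle and split arrays into training and test portions (80/20 here)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(images))        # random order of indices
    cut = int(len(images) * train_fraction)     # 80% boundary
    train_idx, test_idx = order[:cut], order[cut:]
    return (images[train_idx], labels[train_idx],
            images[test_idx], labels[test_idx])

# Stand-in for FER-2013: 100 fake 48x48 grayscale images, 7 emotion classes.
images = np.zeros((100, 48, 48), dtype=np.uint8)
labels = np.arange(100) % 7
x_train, y_train, x_test, y_test = split_dataset(images, labels)
```

The training portion is what the CNN iterates over each epoch; the held-out 20% supplies the validation loss and accuracy logged by the callbacks.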
This process grabs the frame at the moment the emotion is expressed. The frame is resized to the desired format and converted to grayscale, since the model works only on black-and-white pixels. The image is then cloned so that drawing on it is possible. The largest face area is determined, and the face ROI is extracted from the image and pre-processed for the network. A prediction is made on the ROI, and the emotion label is shown together with the probability of that emotion; the emotion with the highest probability is displayed on screen above the rectangle that frames the face. For real-time face detection, the image captured by the camera is passed to the Haar cascade classifier, which extracts just the ROI and passes it to the CNN model for emotion classification; the accuracy of the classified emotion is shown on screen. The file application_sound.py contains the code that counts the number of frames for which an emotion is displayed and calls the YouTube link once the preset count is exceeded: the emotion that persists for the preset number of frames opens a dialogue box displaying "Playing (whatever emotion) songs" and the YouTube video link is opened. The implementation is shown in Fig. 4, where the Anaconda command prompt is run as administrator and the listed steps open the webcam for capturing the face in real time.
Figure 4. Implementation and Running of our model
Fig. 5 shows the image classification; in this case the emotion is happy, with an accuracy of 73.51%. The classified emotion is shown to the user with a rectangle drawn around the face, along with the predicted accuracy of the predicted emotion.
Figure 5. Image Classification with Accuracy Percentage
After the same emotion has been detected for 50 consecutive counts, the screen displays a phrase, as shown in Fig. 6, and the song is then recommended by the recommender. The phrase depends on the type of emotion detected (happy, sad, angry, surprised, neutral, disgust or scared) and prints as Playing "Detected Emotion" Songs.
Figure 6. Display of phrase Playing Happy Songs
Fig. 7 shows our model detecting the emotion when the input is a recorded video. The same process is followed: the predicted emotion and its accuracy keep being displayed until the same emotion has been detected continuously for 50 counts, at which point the phrase Playing "Detected Emotion" Songs appears and the corresponding YouTube link opens and plays the song, as can be seen in Fig. 8.
Figure 7. Emotion Classification from Recorded Video
Figure 8. Song Link from YouTube
5. Results and Discussion
The trained model is able to detect faces and label emotions smoothly and with little delay. An advantage of our model is that the mini-Xception network processes 24 frames per second from the real-time camera feed, whereas other, more complex models can take around 2 seconds to process a single image. The model has been trained rigorously to classify the emotions with much improved accuracy; categorical cross-entropy is used as the loss function, with accuracy as the metric. Through this we have reached a peak of 95% and an average accuracy of 90%, as can be seen in Fig. 9. The loss decreases significantly every epoch, which means that the more data the model is given, the more rigorously it trains and the better its accuracy becomes. We used the FER-2013 dataset for training and, as noted above, data augmentation techniques to enlarge it. Fig. 9 shows the loss and accuracy graphs for our model; each graph has two lines, one for training loss/accuracy and one for validation loss/accuracy. The first graph plots loss (a number indicating how bad the predictions are) against epochs and shows it decreasing significantly with each epoch; training loss is the loss measured while training the model, and validation loss is the loss measured during testing (the data being split between training and testing). The second graph plots accuracy (the fraction of predictions that are correct) against epochs and shows it increasing every epoch; training accuracy is measured during training on the FER dataset, and validation accuracy during testing. The graph shows an average accuracy of 80% within just 200 epochs, and the validation-accuracy threshold of 95% has already been breached, which suggests that the more data the model receives, the higher the accuracy it will achieve. Since computing the loss and accuracy takes a long time, we stopped the model at 200 epochs and plotted the resulting graphs.
Figure 9. Loss and Accuracy graphs for every epoch
6. Conclusion
Emotion recognition using facial expressions is a widely researched topic that has gathered the interest of many researchers and scholars due to its various real-world applications. Interest in the emotion recognition problem keeps growing, and new algorithms are continually being introduced to address it. Medical science and human science are areas where these emotion recognition algorithms are of great importance, and researchers are developing new ways to extract the emotion and use it for the treatment of the user. The proposed system captures the emotion of a user successfully and has been tested in a real-time environment. To determine the robustness of our system, various lighting conditions were used; the system performed well while capturing the images of the user and classifying the emotion and its accuracy, and it has also been able to capture new images of the user for updating its classifier and the training dataset. The results show that our model can perform even better with some further adjustments and fine-tuning. Our model has been trained on the FER dataset, achieving a good trade-off between speed and accuracy, with accuracy over 95%. The system was designed using the mini-Xception architecture, which makes the system lighter and hence more efficient, and a new mapping algorithm was written that automatically links songs from YouTube to the CNN output according to the detected emotion. There are some limitations to our current model that hinder its accuracy and usability: the user can access music videos only through the YouTube links that have been provided; the camera should have a resolution of at least 360p for good face recognition accuracy; and the environment should be well lit when capturing the image, which also helps the classifier detect accurately and precisely. As future work, the recommender system in our model can be improved, or replaced with a more advanced API technology that automatically provides music to the user based on the detected emotion.
References
[1]
S. Gilda et al., "Smart music player integrating facial emotion recognition and music mood recommendation," 2017 International Conference on Wireless Communications, Signal Processing and Networking (WiSPNET), IEEE, 2017.
[2] S. Metilda Florence and M. Uma, "Emotional Detection and Music Recommendation System Based on User Facial Expression," IOP Conference Series: Materials Science and Engineering, Vol. 912, No. 6, IOP Publishing, 2020.
[3] Vinay P., Raj P., Bhargav S. K. et al., "Facial Expression Based Music Recommendation System," International Journal of Advanced Research in Computer and Communication Engineering, IJARCCE.2021.10682, 2021.
[4] D. J. Samuvel, B. Perumal and M. Elangovan, "Music recommendation system based on facial emotion recognition," 3C Tecnología. Glosas de innovación aplicadas a la pyme, Special edition, pp. 261-271, March 2020.
[5] S. Bhat, V. S. Amith, N. S. Prasad and D. M. Mohan, "An Efficient Classification Algorithm for Music Mood Detection in Western and Hindi Music Using Audio Feature Extraction," 2014 Fifth International Conference on Signal and Image Processing, pp. 359-364, 2014.
[6] R. R. Londhe and D. V. Pawar, "Analysis of facial expression and recognition based on statistical approach," International Journal of Soft Computing and Engineering (IJSCE), ISSN: 2231-2307, Vol. 2, Issue 2, May 2012.
[7] Z. Zeng, M. Pantic, G. I. Roisman and T. S. Huang, "A Survey of Affect Recognition Methods: Audio, Visual, and Spontaneous Expressions," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 31, No. 1, pp. 39-58, Jan. 2009.
[8] Xianghua Fan, Fuyou Zhang, Haixia Wang and Xiao Lu, "The system of face detection based on OpenCV," 24th Chinese Control and Decision Conference (CCDC), pp. 648-651, 2012.
[9] Parul Tambe, Yash Bagadia, Taher Khalil and Noor UlAin Shaikh, "Advanced Music Player with Integrated Face Recognition Mechanism," IJIRT, Vol. 3, Issue 1, 2015.
[10] Jyoti Rani and Kanwal Garg, "Emotion Detection Using Facial Expressions - A Review," International Journal of Advance Research in Computer Science and Software Engineering, Vol. 4, Issue 4, 2014.
[11] Aastha Joshi and Rajneet Kaur, "A Study of Speech Emotion Recognition Methods," IJCSMC, Vol. 2, Issue 4, pp. 28-31, April 2013.
[12] Javier G. Rázuri, David Sundgren, Rahim Rahmani, Aron Larsson, Antonio Moran Cardenas and Isis Bonet, "Speech emotion recognition in emotional feedback for Human-Robot Interaction," International Journal of Advanced Research in Artificial Intelligence (IJARAI), 4(2), 2015.
[13] N. Habibzad and M. K. Mirnia, "A new algorithm to classify face emotions through eye and lip features by using particle swarm optimization," 4th International Conference on Computer Modeling and Simulation, IPCSIT, Vol. 22, 2012.
[14] Syed Aley Fatima, Ashwani Kumar and Syed Saba Raoof, "Real Time Emotion Detection of Humans Using Mini-Xception Algorithm," IOP Conference Series: Materials Science and Engineering, Vol. 1042, 2nd International Conference on Machine Learning, Security and Cloud Computing (ICMLSC 2020), December 2020.
[15] Pooja Mishra, Himanshu Talele, Rohit Vidhate, Ganesh Naikare and Yogesh Sawarkar, "Music tune generation based on facial emotion," International Journal of Engineering Applied Sciences and Technology, Vol. 5, Issue 4, pp. 616-621, August 2020.
Advances in Science and Technology ISSN: 1662-0356, Vol. 124, pp 180-186 doi:10.4028/p-ejo102 © 2023 Trans Tech Publications Ltd, Switzerland
Submitted: 2022-09-12 Accepted: 2022-09-16 Online: 2023-02-27
Detection of Bike Riders with no Helmet
Sai Aditya G.1,a, Swetha B.2,b, Meenakshi K.3,c and Hariharan C.4,d
1-4Department of Computer Science and Engineering, SRM Institute of Science and Technology, Chennai, India
[email protected], [email protected], [email protected], [email protected]
Keywords. OpenCV (CV2), You Only Look Once (YOLOv3), Convolutional Neural Network (CNN), Automatic License Plate Recognition (ALPR), E-Mail (Electronic Mail)
Abstract. Machine learning refers to the creation of digital systems that can make significant contributions without explicit programming, by interpreting and extrapolating from patterns in data using mathematical models. It is the study of systems that improve themselves autonomously as a consequence of interaction and data, and it is an example of artificial intelligence in action. In this study, we look at recognizing whether bike riders in a video are wearing helmets or not. We use the OpenCV library to read the video frame by frame. The YOLO and CNN models, which enable real-time license plate retrieval for a non-helmeted rider, are the prescribed strategy for our system. Our approach checks whether the person riding the bike is wearing a helmet; if the rider is not wearing the required protective gear, the license plate is extracted so that it can be supplied to the enforcement system for evaluation and for issuing a fine to the offender. The scope of this design is very broad: it can be used to detect bikers who are not wearing a helmet, and if the individual in the video is not wearing one, the system detects the license plate of the bike and sends an appropriate alert message to the appropriate person. This system is found to be more robust and effective than other algorithms.
I. Introduction
During the past few years there has been a surge in the number of motor vehicles, bringing problems such as traffic congestion and accidents that can result in death or injury. It has been emphasized that improving key transport infrastructure alone, such as adding highways or building flyovers, will not completely solve these issues. Detecting vehicles in surveillance video footage is the first step in traffic surveillance.
Helmets are one of the most critical safety features for anyone riding a motorbike in the modern era. Unfortunately, the use of helmets has not increased as much as it should have, especially in areas where wearing a helmet is mandatory. The major goal of this technique is to give a system a way to check whether a rider is wearing a helmet or not, as well as to read their license plate automatically. With one percent of the world's motor vehicles, our country accounts for fifteen percent of all traffic deaths. The motivation for creating this design is to assist traffic officers in determining the number of vehicles travelling through a checkpoint in a given amount of time, and to help our citizens adhere to India's traffic laws. Due to the widespread use of surveillance cameras, a large library of traffic video footage has been accumulated for examination.
II. Literature Survey
[1] Object detection and multiple-object tracking are used in this work to count and classify vehicles from video captured by a single camera at each site. The authors proposed a vision-based pipeline with modules for detection and tracking per traffic lane, speed estimation, and vehicle and axle counting from multiple viewpoints, which they evaluated on surveillance videos captured at various venues with a variety of traffic-flow conditions, comparing the results against data recorded by piezoelectric sensors. The study combines a deep learning architecture (CNN) with spatial geometry knowledge to classify vehicles, and demonstrates how existing infrastructure can be put to good use by employing video processing to extract traffic information from footage shot with ordinary cameras. The proposed vision-based pipeline uses state-of-the-art detectors and trackers to detect cars, riders and cyclists in static images; a new image-to-world homography, obtained by calibrating the camera, enables the system to count two-wheelers by lane and estimate vehicle length and speed in everyday segments. The pipeline also contains a vehicle classification package that combines a CNN with geometric and appearance details. Tested on recordings taken across many venues with varying traffic-flow conditions, and compared with data collected by piezoelectric sensors, the pipeline can process pre-recorded videos at up to 60 frames per second and yields rich metadata for further traffic analysis. Rather than adaptively modelling the background, state-of-the-art object detectors are used so that detection is resistant to changes in illumination and the influence of fog. [2] Automotive recognition and prevention has become a platform for the development of image analysis and intelligent systems, receiving much attention due to the growing number of cars, traffic-rule violators and accidents. This study looks at several approaches and how they have evolved in recent years to produce improved outcomes, with an emphasis on machine learning; as a side effect, helmet detection has been added to the problem description.
The purpose of this research is to discover effective ways to deter riders who violate driving rules by not following safety procedures, as well as to assist enforcement personnel. Historically, this was accomplished through laborious and time-consuming procedures such as background subtraction, manual feature extraction, classification, and alternative image-processing techniques; specialized approaches were required because of their limited tolerance for difficulties such as blur, luminance and weather changes. An automatic number-plate recognition technique applies optical character recognition to the range plates; the input to the system is an image acquired by a camera unit. [3] This paper analyses vehicle detection and tracking under adverse weather by introducing a visibility restoration technique to enhance picture quality. The authors use the YOLO algorithm, designed for fast object detection and multiple-object classification in a single pass, and to compensate for lost tracks caused by missed detections, the cost matrix of each segment is solved using the Hungarian algorithm. A new benchmark dataset called DAWN (Detection in Adverse Weather Nature) is introduced for unambiguous study of performance in bad weather; it consists of real-world images taken in a variety of adverse climates. The results validate the effectiveness of the proposed technique, which outperforms state-of-the-art vehicle detection. Inclement weather, such as hail, mist, rain, dust or debris, impairs camera performance by curbing visibility, jeopardizing driver safety; these limitations affect the results of the identification and tracking methodologies used in public-inquiry technologies as well as driver-assistance solutions. To begin, the authors propose a visibility enhancement scheme consisting of three stages: illumination enhancement, reflection component enhancement, and linear weighted fusion. Then a reliable vehicle detection and tracking strategy is developed using a multi-scale deep convolutional neural network. A probability hypothesis density filter-based tracker is combined with hierarchical data association (HDA), divided into detection-to-track and track-to-track association; to compensate for the lost tracks produced by missed detections, the cost matrix of each segment is resolved using the Hungarian algorithm. For fast execution, HDA uses only detection data rather than visual appearance features. The authors also released DAWN, a unique benchmark dataset targeted at autonomous-vehicle applications in harsh weather.
III. Existing System
In a previous system, the authors devised and validated an automated technique for recognizing bike riders who do not wear a helmet. The technology makes use of a classification model built on visual features of the region around riders' heads. The features were chosen by hand to capture the shape and reflective quality of helmets, with a brighter top half and a darker bottom half, and to exploit the helmet's elliptical form; based on the Hough transform, the model employs a circle-detection mechanism. The most significant flaw of this approach is that it leads to many misclassifications, since some objects that merely resemble helmets are categorized as helmets, while others that are completely different are not. Another flaw is that it does not first identify motorcyclists within the frames, which should be done because helmets are only significant in the case of motorcyclists.
IV. Proposed System
The purpose of constructing this design is to form a system suitable for traffic police officers to notice whether or not someone is wearing a helmet by checking the registration number plate of the vehicle. The most important aspect of this design is to enhance the accuracy of the predictions the system makes when detecting helmets. Our initial step is to obtain an image of the bike and then look for headgear; a further step is to pre-process the captured image. The presence of a helmet is checked for each classified motorcyclist, and if no helmet is identified, the vehicle's license plate is extracted. Previously, license plate detection was conducted for every rider image; in our system, if the classifier determines that the head region contains a helmet, license plate detection is skipped and the image is discarded.
The license plate detection pipeline involves converting the image to grayscale, thresholding, extracting contours, and filtering, before applying helmet detection and license plate recognition. Five object classes must be detected before assessment can begin: a helmet, no helmet, a motorcycle, a person (sitting on the bike), and a registration number plate. The YOLO and CNN models are used for real-time detection of the license plate of a non-helmeted rider: a video frame is taken as the input, and the expected output is the localized registration number plate of a non-helmeted rider. In our approach, the system checks for the presence of a helmet on the rider; if the rider is not wearing a helmet, the license plate is extracted and an alert is sent to the corresponding person by e-mail.
V. Comparison of Existing System with Proposed System
Predominantly, traffic police physically check whether riders are wearing helmets. Human verification is imprecise, time-consuming, and subject to individual error. Surveillance in major semi-urban and remote regions is not computerized and involves substantial labor. The increased prevalence of two-wheelers on the roads, and the frequency of injuries sustained by riders not wearing helmets, has spurred more research into rider safety. The expanding use of motorbikes in today's society has resulted in an increase in traffic fatalities and incidents, and one of the leading causes is the bike rider's neglecting to wear a helmet. The suggested framework uses a digital system, executed with deep learning modules such as neural networks, to distinguish photographs of a motorbike rider wearing a helmet from those of a rider without one. Finally, an alert notification is sent to the user who is not wearing a helmet.
VI. System Architecture
The system architecture for the detection of bike riders with no helmet is illustrated in Fig. 1.
Fig 1: Architecture for Helmet Detection
A. Input Frame: The input video is taken from real-time traffic surveillance cameras and is used to acquire moving objects for feature extraction.
B. Bike and License Plate Detection: The YOLOv3 model is trained to detect bikes and registration plates. YOLO is a real-time, state-of-the-art object detection system; it is fast and precise, and a significant improvement over previous YOLO editions. Unlike systems such as R-CNN, which require thousands of network evaluations for a single image, it makes its predictions with a single network evaluation, which makes it roughly a thousand times faster than R-CNN.
C. Helmet Detection: The CNN model is trained to detect helmets; the weights generated during training are used to load the model. From this we obtain information about the rider on the motorcycle, and if the rider is not wearing any form of headgear, we simply extract the license plate data of the rider.
VII. Modules
• Importing Required Libraries
• Load the Model
• Read the Input
• Predicted Boxes
• E-Mail Alert System
Importing required libraries: The required libraries are:
• OpenCV is a huge open-source library that can be used for computer vision, machine learning, and image processing. It can analyse images and videos for objects, faces, and even handwriting.
• NumPy is a one-stop package for array processing. It includes a high-performance multidimensional array object as well as capabilities for working with such arrays. It is the most fundamental software framework for numerical computation.
• The OS module provides a portable means of using operating-system-dependent functionality.
IoT, Cloud and Data Science
• Keras is a neural network Application Programming Interface (API) for Python that is tightly integrated with TensorFlow, which is employed to build machine learning models. Keras models provide a simple, straightforward way to define a neural network, which TensorFlow then builds for you.
Load the model: We split the network into two parts:
• yolov3.weights and yolov3.cfg (pre-trained weights and configuration file): These are used to initialize the network that recognizes the rider's bike and license plate. Our model's DNN backend is set to the OpenCV library, which targets the system to the CPU. The model can instead be run on a GPU by changing the target to cv.dnn.DNN_TARGET_OPENCL.
• helmet-nonhelmet_cnn.h5: The CNN model that we trained to detect helmets is loaded in the second part.
Fig 2: Detection of Bike Riders with Helmet

Read the input: We read the video stream with a video capture object, open the video, and draw the output bounding boxes; the output frames can then be written to a video file. Given a frame, the model predicts whether there is a helmet in the image. We obtain the output layers of the network, then read the frames from the video file, resize them, and record their height and width.

Predicted boxes: Use cv2.dnn.blobFromImage() to form a blob from the image and use this blob as input to our network. Set the blob as input, forward it through the network, and collect the outputs. Obtain the indexes of the boxes discarded by Non-Maximum Suppression. Draw a green box around bikes and a red box around the license plates, and detect whether there is a helmet or not.
Fig 3: Detection of Bike Rider with No Helmet
E-mail alert system: We set an e-mail ID for the sender and the receiver. Once the respective IDs are set, an alert about the user who is not wearing a helmet is sent via e-mail. The e-mail is sent to the traffic police officials, who in turn send the user an alert message telling him that he is not wearing a helmet and fine the helmet defaulters. This alert system is created with the help of SMTP and SSL.
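A minimal sketch of such an SMTP-over-SSL alert using Python's standard library; the SMTP host, port, addresses, and password handling are assumptions for illustration (the paper does not specify them):

```python
import smtplib
import ssl
from email.message import EmailMessage

def send_alert(sender, app_password, receiver, plate_image_path):
    """E-mail a no-helmet alert with the number-plate image attached."""
    msg = EmailMessage()
    msg["Subject"] = "Helmet violation detected"
    msg["From"] = sender
    msg["To"] = receiver
    msg.set_content("A rider without a helmet was detected. "
                    "The number-plate image is attached.")
    with open(plate_image_path, "rb") as f:
        msg.add_attachment(f.read(), maintype="image",
                           subtype="jpeg", filename="plate.jpg")

    # Open an SSL-secured SMTP connection and send the alert
    context = ssl.create_default_context()
    with smtplib.SMTP_SSL("smtp.gmail.com", 465, context=context) as server:
        server.login(sender, app_password)
        server.send_message(msg)
```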
Fig 4: E-Mail alert message with number plate image
Fig 5: Source Code for E-Mail Alert System

IX. Conclusion
As evidenced by the results displayed above, the YOLOv3 model performs comparably to the CNN model for detecting bikes and license plates. The proposed end-to-end model was successfully built with all of the necessary machine-driven and compliance-ready features. Some license-plate-extraction systems account for quite different conditions, such as multiple riders without helmets, and are designed to handle the vast majority of cases. Our project's libraries and packages are all open source, making them highly customizable and cost-effective. The project was designed primarily to enforce traffic regulations: users captured in the video without a helmet are detected, the registration code of the vehicle is recognized, and an alert notification is then sent to the corresponding person. If deployed at the Department of Transport, the system would make enforcement easier and more efficient.
Advances in Science and Technology ISSN: 1662-0356, Vol. 124, pp 187-196 doi:10.4028/p-eo431o © 2023 Trans Tech Publications Ltd, Switzerland
Submitted: 2022-09-12 Accepted: 2022-09-16 Online: 2023-02-27
Classification of Retinal Images Using Self-Created Penta-Convolutional Neural Network
Narain Ramaswamy S.2,a, Siddhant R.1,b, Barath Vimanthann S.3,c, Anubha Pearline S.4,d, Muthurasu N.5,e
Department of Computer Science and Engineering, SRM Institute of Science and Technology, Vadapalani Campus, No.1, Jawaharlal Nehru Road, Vadapalani, TN, India
a[email protected], b[email protected], c[email protected], d[email protected], e[email protected]
Keywords: Retinal image classification, Medical image classification, Deep learning, Convolutional Neural Network, hyperparameter tuning.
Abstract. The primary way to classify retinal illnesses is to conduct several medical examinations, the most important of which is a visual examination. Human error is common as a result of poor decision-making, which is one of the major challenges in visual disease diagnosis. Automated image processing technologies are more useful for early disease diagnosis and evaluation than conventional digitized diagnostic imaging operations, which are confusing and time-consuming. The aim of this paper is to create a system that detects retinal abnormalities from images using a deep learning technique. The images are first pre-processed and then enhanced. The pre-processed images are fed into the Penta-Convolutional Neural Network (Penta-CNN). Penta-CNN is a five-layered architecture that includes two convolutions with max pooling and three fully connected layers. The performance of Penta-CNN is evaluated using the STARE (Structured Analysis of the Retina) database [14]. The model is also trained with several hyperparameters, which are tweaked and assessed.

I. Introduction
The retina plays a crucial part in color and picture recognition; retinal tear, macular hole, diabetic retinopathy, and other ailments arise here. The precise and automated analysis of retinal pictures has long been thought to be an efficient method for diagnosing retinal disorders. On the inner side of the eyeball, the retina is a photosensitive layer in the optic tract tissue lining. A multitude of illnesses in this region can cause irreversible vision loss by damaging the retina. Retinal diseases can manifest in a variety of ways, but the majority of them cause vision problems. The retina contains a large number of light-sensitive cells as well as other nerve cells that receive and organize visual information. The nerves transmit the perceived information to the brain and allow us to view the object.
Many previous studies have concentrated on using deep learning strategies to automate the diagnosis of retinal diseases, analysing the extremely large numbers of fundus photographs obtained from retinal screening programmes. Deep learning for interpreting medical images has emerged within the field of machine learning; nonetheless, research on multi-categorical classification for diagnosing ocular illnesses is scarce. The objective of this paper is to use publicly available data from the STARE database [14] to perform automated retinal illness categorization in multi-categorical disease scenarios, utilizing deep learning and state-of-the-art CNNs for fundus picture enhancement. The simultaneous recognition of the centers of the fovea and the optic disc from color fundus images is regarded as a regression challenge arising during retinal image classification. This paper is organized as related works, methodology, results and discussion, and conclusion. Retinal image classification papers are discussed in Section II. Section III explains our proposed Penta-Convolutional Neural Network in detail. Section IV discusses the results obtained using the STARE database [14]. Section V concludes the paper.
II. Literature Survey
Retinal image preprocessing, enhancement, and registration, by Carlos Hernandez-Matas, Antonis A. Argyros, Xenophon Zabulis, 2018 [11]: Preprocessing and enhancement are necessary for a wide range of retinal image analysis methodologies, with the goal of improving images and easing subsequent analysis. Imaging methods such as optical coherence tomography (OCT) and scanning laser ophthalmoscopy (SLO) are used to investigate the details and applications of retinal image processing. This paper discusses fundus imaging, which includes fundus photography, SLO, and OCT imaging. The most challenging task is registering axial OCT scans with frontal fundus images; this registration made it easier to obtain a geometric representation of retinal tissue under the retinal surface.

Automatic Detection of Diabetic Eye Disease Through Deep Learning Using Fundus Images: A Survey, by Sarki, Rubina; Ahmed, Khandakar; Wang, Hua; and Zhang, Yanchun, August 2020 [12]: This survey provides a thorough summary of diabetic eye disease (DED) detection methods and ground-breaking field approaches, with the goal of providing useful information to researchers, healthcare professionals, and diabetic patients. Deep learning systems collect images with and without DED. Image preparation techniques are then utilized to reduce image noise and prepare the pictures for feature extraction. The pre-processed pictures are fed into the DL architecture, which learns classification rules by extracting features and weights automatically. While deep neural networks have shown promise in medical imaging and the identification of retinal diseases, further refinement and development of more effective deep neural networks remains difficult.
Classification of Diabetic Retinopathy Images by Using Deep Learning Models, by Suvajit Dutta, Bonthala CS Manideep, Syed Muzamil Basha, 2018 [6]: This work focuses on image preprocessing with various filter techniques to enhance image attributes. Statistical features are extracted from a scaled 2000x2000 image, because high resolution allows for greater exploration. Fuzzy C-means clustering (FCM) is used to determine cluster levels in the training data, which leads to improved training accuracy. Due to a lack of computational capability in some systems, images are scaled in some models, resulting in a feature-loss factor and varying image labels.

Multimodal Retinal Image Classification With Modality-Specific Attention Network, by Xingxin He, IEEE, June 2021 [13]: This work offers a Modality-Specific Attention Network (MSAN) for multi-modal retinal image classification that efficiently classifies fundus and OCT images using modality-specific diagnostic criteria. It extracts both local and global information from fundus images using a multiscale attention module. For OCT images, a region-guided attention module is proposed to encode retinal-layer-related features while ignoring the background. A single retinal imaging modality can only reflect complicated ophthalmopathy to a limited extent and overlooks the modality-specific (complementary) data shared by multiple imaging modalities.

Artificial Intelligence Based Branch Retinal Vein Occlusion Detection, by Jecko Anto Kattampally, IEEE, April 2020 [16]: The artificial intelligence model is developed to provide a first-level diagnosis of BRVO (Branch Retinal Vein Occlusion). Feature extraction for each specific recognition task is handled by four Convolutional Neural Network (CNN) models, with one neural network per model. The prediction of the model directly determines whether the eye is occluded or not.

Automated Classification of Retinal Diseases in STARE Database, by Adeeb Shehzad, IEEE, 6 January 2020 [14]: The Structured Analysis of the Retina (STARE) project uses a publicly accessible library of retinal pictures to test automated retinal disease classification. In this work, a number of automated strategies for decision and classification problems use machine learning and reinforcement learning
as fundamental models. Support Vector Machines (SVM), Artificial Neural Networks (ANN), K-Nearest Neighbor, and Naive Bayes are among the algorithms employed. It is critical to detect such ailments early; many people in our country, however, are unaware of these illnesses.

RETOUCH - The Retinal OCT Fluid Detection and Segmentation Benchmark and Challenge, by Freerk Venhuizen, IEEE, 5 January 2020 [15]: The objective of this study is to provide a reasonable baseline for evaluating algorithms that identify and segment all three forms of retinal fluid across retinal disorders and OCT providers. Fluid detection and fluid segmentation are the two tasks the system must complete. This was the first benchmark to include annotated photos from two clinical locations for all three forms of retinal fluid. In terms of size and variability, the challenge dataset far exceeds what was previously accessible.

III. Related Works
The most extensively used techniques for preprocessing and augmentation of retinal images are Optical Coherence Tomography (OCT) [11], Scanning Laser Ophthalmoscopy (SLO) [11] and the Modality-Specific Attention Network (MSAN) [13]. Machine learning and reinforcement learning serve as essential models in these automated solutions for decision and classification problems. Image preparation techniques are utilized to reduce image noise and prepare the images for feature extraction. The pre-processed images are fed into the DL architecture, which learns classification rules by extracting features and weights automatically. For AI-based vein occlusion, four Convolutional Neural Network (CNN) models are used to build feature extraction techniques for each specific recognition challenge; the model's prediction directly determines whether or not the eye is occluded [16]. However, images are scaled in some models due to a lack of computing capability in some systems, resulting in a feature-loss factor and changing image labels.
The most difficult task is registering axial OCT scans [16] with frontal fundus images. In terms of both size and variability, most existing systems' challenge datasets far exceed what was previously offered. Deep neural networks can be utilized in medical imaging and eye disease diagnosis to overcome this.

IV. Proposed Methodology
The Penta-Convolutional Neural Network (Penta-CNN) is proposed in this research for retinal image classification. The images are pre-processed and augmented for further processing, as described in the subsections below, and the augmented set of images is fed into the proposed Penta-CNN. The outputs are acquired after the model has been trained. The block diagram of our proposed Penta-CNN is shown in Fig. 1.
Fig 1: Proposed Penta-CNN Block Diagram

A. Dataset
For our paper, we have utilized four classes of retinal images from the STARE database: CNV (Choroidal Neovascularization), DME (Diabetic Macular Edema), DRUSEN, and Normal. Since only part of the STARE database is used, our dataset is named the P-STARE database. The dataset was trained with 1600 photographs in total, with 332 test images, separated into the four classes. The number of images in each class is shown in Table 1, and sample images from our dataset are shown in Fig. 2.

Table 1: Number of images in the P-STARE dataset

Class   CNV   DME   DRUSEN   NORMAL
Train   400   400   400      400
Test    101   84    67       80
Fig 2: Different classes of Retinal Diseases
B. Data Augmentation Technique
Oversampling of photos is used to compensate for the imbalance in the number of images in the dataset; this method is termed Data Augmentation. Rescale, shear range, zoom, and horizontal flip are the data augmentation techniques used in our proposed Penta-CNN model, and the value used by each is tabulated in Table 2. The images are supplemented by modifying existing data to provide extra data for the model training phase. In the shear range technique, the images are deformed along an axis, usually to create or correct perception angles. Zoom augmentation zooms in and out of the image; we used a zoom range of 0.2. The final method is horizontal flip, which, when set to True, horizontally reverses the full rows and columns of the image pixels.

Table 2: Data-augmentation attributes

Attribute   rescale   shear_range   zoom_range   horizontal_flip
Value       1./255    0.2           0.2          True
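The augmentation settings in Table 2 map directly onto Keras's ImageDataGenerator; a minimal sketch, in which the training-directory path is a hypothetical placeholder:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation values taken from Table 2
train_datagen = ImageDataGenerator(
    rescale=1. / 255,      # scale pixel values to [0, 1]
    shear_range=0.2,       # deform images along an axis
    zoom_range=0.2,        # random zoom in/out by up to 20%
    horizontal_flip=True,  # randomly mirror images left-right
)

def make_train_generator(data_dir):
    """Yield augmented 400x400 training batches from a class-per-folder directory."""
    return train_datagen.flow_from_directory(
        data_dir, target_size=(400, 400),
        batch_size=32, class_mode="categorical")
```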
C. Image Pre-processing
Before the images are used for model training and inference, certain processing steps are applied, covering resizing, orienting, and color correction, among other things. The resulting object represents a dataset that can be used to run a deep learning model. The image processing for the dataset of retinal images is done using Keras and TensorFlow. The input image size was set to 400x400 during preprocessing to obtain better and clearer images.

D. Penta-Convolutional Neural Network
The Penta-Convolutional Neural Network is a neural network with five layers. CNN architectures have received a lot of attention because of their suitability for image classification and fast performance, and many pre-trained CNNs emerged after 2012. There are, however, few examples of self-created models for retinal image classification. As a result, we proposed Penta-CNN, a five-layered architecture: two sets of convolutional and max-pooling layers followed by fully connected layers. The convolutional layers in this architecture use the Rectified Linear Unit (ReLU) activation function, applying filters to the initial image or to other feature maps. Max pooling selects the most important component from the region of the feature map covered by the filter. In the fully connected layers, a feed-forward neural network is used, and the output dimensions from the preceding layers are flattened to a 1-D format. The back-propagation of the Penta-CNN is comparable to that of other CNNs.
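A minimal Keras sketch of the layer pattern just described (two convolution/max-pooling blocks with ReLU, a flatten step, then three fully connected layers); the filter counts (32, 64) and dense-layer widths (128, 64) are our assumptions, since the paper specifies only the pattern:

```python
from tensorflow.keras import layers, models

def build_penta_cnn(input_shape=(400, 400, 3), num_classes=4):
    """Build a Penta-CNN-style classifier: 2 conv/pool blocks + 3 dense layers."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        # Two convolution + max-pooling blocks with ReLU activation
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        # Flatten to 1-D, then three fully connected layers
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(64, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    return model
```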
Fig 3: Detailed Architecture of Penta-CNN

V. Results and Discussion
Penta-CNN is trained to identify the best hyperparameters, such as the optimizer, learning rate, and number of epochs. Hence, this section contains subsections on the implementation setup and analyses of the optimizers, learning rate, and number of epochs. The proposed Penta-CNN model is implemented in Google Colaboratory using Python 3 with packages such as TensorFlow and Keras. The train-test split ratio of the dataset is 60:40.

A. Analysis of Optimizers
Optimizers are the algorithms that train a machine/deep learning model; the right optimizer is required to increase training speed and performance. The system was tested with four distinct optimizers by trial and error: Adam, SGD, Adamax, and RMSprop. We varied the optimizers while training the proposed model for 50 epochs with a batch size of 32. Adam achieved the highest accuracy of 82.81 percent with the lowest loss. The analysis performed on the Penta-CNN by altering the optimizers is shown in Table 3.

Table 3: Analysis of optimizers

Optimizer   Accuracy %   Recall   Precision
Adam        82.81        0.8125   0.8414
RMSprop     65.62        0.625    0.6452
SGD         68.75        0.6562   0.875
Adamax      75.31        0.7975   0.7841
As indicated in Table 3, Adam outperforms the other optimizers. As a result, we employed the Adam (Adaptive Moment Estimation) optimization algorithm, an extension of stochastic gradient descent commonly used for deep learning applications in computer vision and language processing. This optimizer was run for 300 epochs, with an average epoch time of 32 seconds and a step time of 640 milliseconds, reaching a validation accuracy of 82.81 percent and a validation loss of 0.4.
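The chosen configuration can be sketched as a Keras compile step, using Adam with the best-performing learning rate reported later (0.001) and tracking the metrics listed in the tables; the loss function is an assumption consistent with four-class softmax output:

```python
from tensorflow.keras import layers, models, metrics
from tensorflow.keras.optimizers import Adam

def compile_for_training(model, learning_rate=0.001):
    """Compile a classifier with the paper's best settings (Adam, lr=0.001)."""
    model.compile(
        optimizer=Adam(learning_rate=learning_rate),
        loss="categorical_crossentropy",
        metrics=["accuracy", metrics.Precision(), metrics.Recall()],
    )
    return model
```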
C. Analysis of Learning Rate
The learning rate was gradually reduced from 0.1 to 0.01, 0.001, and so on, while the optimizer was set to Adam with 300 epochs and a batch size of 32; the best accuracy with the lowest loss was found at 0.001, with a maximum accuracy of 77.5%. The results obtained with the Adam optimizer and batch size 32 for 50 epochs are shown in Table 4.

Table 4: Analysis of learning rate

Learning rate   Val accuracy   Val loss   Val precision   Val recall
0.01            0.7188         1.5194     0.7097          0.6875
0.001           0.775          0.6587     0.7864          0.7594
0.0001          0.75           0.8108     0.7778          0.6562
0.00001         0.5969         1.0506     0.8273          0.2844
In our approach, learning rates were significant because they allowed the model to learn faster, even if the final set of weights was sub-optimal. Although a slow learning rate may theoretically allow the model to reach a more desirable or even globally optimal set of variables, training time would be substantially longer.

D. Analysis of Number of Epochs
The number of epochs was raised in increments of 50, from 50 to 500, using the Adam optimizer with a learning rate of 0.001 and a batch size of 32. This was done to ensure the epoch count was appropriate and data loss was minimal. As shown in Table 5, the run at 150 epochs gave the most accurate and effective results.

Table 5: Analysis of number of epochs

Epochs   Accuracy %   Precision   Recall
50       81.25        0.8211      0.8031
100      84.06        0.8403      0.8219
150      88.13        0.8885      0.8719
200      85.31        0.8612      0.8531
250      83.75        0.8401      0.8375
300      84.69        0.8513      0.8406
350      86.87        0.8703      0.8594
400      84.06        0.8481      0.8375
450      84.06        0.8433      0.8406
500      83.75        0.8449      0.8344
E. Analysis of Performance Metrics
A deep learning model must be evaluated to measure its performance, and a number of different metrics are used for this assessment. Classification performance measures such as accuracy and AUC (Area Under the Curve) are useful in determining the intended final outcome. Metrics such as precision and recall are primarily employed in our model to assess the learning algorithm and retrieve precise values and percentages.

Table 6: Performance metrics of the proposed Penta-CNN

Metric         Accuracy %   AUC      Precision   Recall
Final result   84.69        0.9575   0.8571      0.8438
The proposed model's findings are tabulated, and the accuracy and loss of the train and test models are presented as line graphs. The accuracy and loss functions are computed over all data items during each epoch, and the quantitative loss measure is obtained by drawing a curve through the iterations that gives the loss on a subset of the full dataset at each epoch. The accuracy and loss graphs are shown in Fig. 4 and Fig. 5.
Fig 4: Penta-CNN Model Accuracy graph
Fig 5: Penta-CNN Model Loss Graph

VI. Conclusion
In this paper, a Penta-Convolutional Neural Network (Penta-CNN) model with multiple layers was built so that the retinal pictures employed can identify multiple classes of retinal illness at the same time. For retinal image classification, the STARE dataset was obtained and pre-processed. The capacity to automatically classify images is a major advantage of the CNN classification system; as a result, a deep learning algorithm was adopted for retinal image classification. These fundus pictures have disease-specific and complementary properties. We proposed a Penta-CNN model to recognise retinal images in order to demonstrate higher efficiency; it is also employed as a starting point for feature extraction and generation. When there aren't enough competent ophthalmologists in some
distant hospitals, our categorization approach allows ophthalmologists to forecast possible systemic and ocular illnesses quickly and accurately using easily accessible fundus retinal photos, enhancing workflow and lowering clinic errors. However, our approach has limitations, such as the fact that we did not consider all of the classes in the STARE database, and the majority of the algorithm's parameters were chosen by trial and error in order to achieve good results. As we progress, the dataset will be trained using CNN and the hyper-parameters will be fine-tuned; the weights employed in the matching process, for example, can be improved to provide a higher average accuracy. In the future, we plan to detect a variety of additional diseases that cause retinal damage, as well as highlight each disease's symptoms for better comprehension.

Acknowledgment
This project was guided by Dr. S. Anubha Pearline, Assistant Professor, SRM Institute of Science and Technology.

References
[1] C. Li et al., Multi-view mammographic density classification by dilated and attention-guided residual learning, IEEE/ACM Trans. Comput. Biol. Bioinf., early access, Feb. 3, 2020, doi: 10.1109/TCBB.2020.2970713.
[2] Mithun Kumar Kar, Malaya Kumar Nath and Madhusudhan Mishra, Retinal Vessel Segmentation and Disc Detection from Color Fundus Images Using Inception Module and Residual Connection, 17 December 2021.
[3] Z. Gao, L. Wang, L. Zhou, and J. Zhang, Hep-2 cell image classification with deep convolutional neural networks, IEEE Journal of Biomedical and Health Informatics, vol. 21, no. 2, pp. 416-428, 2017.
[4] H. Fu, Y. Xu, S. Lin, D. W. K. Wong, and J. Liu, DeepVessel: Retinal vessel segmentation via deep learning and conditional random field, in International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2016, pp. 132-139.
[5] Sowmya, R., A Survey on Automatic Detection of Retinal Disorders from Fundus Images, International Journal of Research in Computer Applications and Robotics, 4(1), pp. 9-15, 2016.
[6] Suvajit Dutta, Bonthala CS Manideep, Syed Muzamil Basha, Classification of Diabetic Retinopathy Images by Using Deep Learning Models, 2018.
[7] Nasr Y. Gharaibeh, A Novel Approach for Detection of Microaneurysms in Diabetic Retinopathy Disease from Retinal Fundus Images, Computer and Information Science, vol. 10, no. 1, 2015.
[8] Wu, J., Waldstein, S. M., Montuoro, A., Gerendas, B. S., Langs, G., and Schmidt-Erfurth, U., Automated Fovea Detection in Spectral Domain Optical Coherence Tomography Scans of Exudative Macular Disease, International Journal of Biomedical Imaging, pp. 1-9, 2016.
[9] Priyanka B. Kale, Prof. Nitin Janwe, Detection and Classification of Diabetic Retinopathy in Color Fundus Image, International Journal for Research in Advanced Computer Science and Engineering, vol. 3, issue 7, 2017.
[10] Y. Dong, Q. Zhang, Z. Qiao, and J.-J. Yang, Classification of cataract fundus image based on deep learning, in 2017 IEEE International Conference on Imaging Systems and Techniques (IST), IEEE, 2017.
[11] Carlos Hernandez-Matas, Antonis A. Argyros, Xenophon Zabulis, Retinal image preprocessing, enhancement, and registration, 2018.
[12] Sarki, Rubina, Ahmed, Khandakar, Wang, Hua and Zhang, Yanchun, Automatic Detection of Diabetic Eye Disease Through Deep Learning Using Fundus Images: A Survey, August 2020.
196
IoT, Cloud and Data Science
[13] Xingxin He, Multimodal Retinal Image Classification With Modality-Specific Attention Network, IEEE JUNE 2021. [14] Adeeb Shehzad, Automated Classification of Retinal Diseases in STARE Database, IEEE 6 JAN 2020 [15] Freerk Venhuizen, RETOUCH - The Retinal OCT Fluid Detection and Segmentation Benchmark and Challenge, IEEE 05 JAN 2020 [16] Jecko Anto Kattampally, Artificial Intelligence Based Branch Retinal Vein Occlusion Detection, IEEE April 2020 [17] Ortega M, Penedo MG, Rouco J, Barreira N, Carreira MJ: Retinal verification using a feature points-based biometric pattern. EURASIP J. Adv. Signal Process. 2009, 2009: 1-13. [18] Ortega M, Marino C, Penedo MG, Blanco M, Gonzalez F: Biometric authentication using digital retinal images. In Proceedings of the 5th WSEAS International Conference on Applied Computer Science. Hangzhou; 2006:422-427. [19] Moravec HP: Towards Automatic Visual Obstacle Avoidance. In Proceedings of the Int’l Joint Conf Artificial Intelligence. Cambridge, MA; 1977:584. [20] Sukumaran S, Punithavalli M: Retina recognition based on fractal dimension. IJCSNS Int J Comput Sci and Netw Secur. 2009, 9(10):66-7.
Advances in Science and Technology ISSN: 1662-0356, Vol. 124, pp 197-202 doi:10.4028/p-2kzgh5 © 2023 Trans Tech Publications Ltd, Switzerland
Submitted: 2022-09-14 Accepted: 2022-09-16 Online: 2023-02-27
Weather Image Classification Using Convolution Neural Network
Sanjay Nambiar1,a, Arjun P.2,b, Deepak Venkateswar R.3,c, Rajavel M.4,d
1,2,3Department of Computer Science & Engineering, College of Engineering and Technology, SRM Institute of Science & Technology, Vadapalani, Chennai - 600026, India
4Assistant Professor, Department of Computer Science & Engineering, College of Engineering and Technology, SRM Institute of Science & Technology, Vadapalani, Chennai - 600026, India
[email protected], [email protected], [email protected], [email protected]
Keywords: Deep learning, image recognition, Convolution neural network, Visual cortex
I. Abstract: A real-world weather prediction system that detects and describes weather conditions in image data is becoming a prominent subject in machine vision. Such systems are designed to address the challenge of weather classification using machine vision. Advances in the fields of Artificial Intelligence and Machine Learning enable applications to take on image recognition capabilities to identify an input image. Deep learning is a vast field; we narrow the focus a little and take up the challenge of solving an image classification problem. We propose deep learning algorithms, implemented with TensorFlow and Keras, that classify weather images using a convolutional neural network (CNN). A CNN is an artificial neural network inspired by, and built upon, the animal visual cortex; the images are passed through the network, which consists of multiple layers and filters, and are then identified and classified according to the weather type.
II. Introduction: Deep learning is a part of machine learning that is entirely based on artificial neural networks, which mimic the human brain. Growth in processing power and the availability of data have made deep learning perform well. The world is represented as a hierarchy of concepts, each concept defined in relation to simpler concepts, with abstract representations computed in terms of less abstract ones. The human brain contains roughly 100 billion neurons; by analogy, an artificial neural network contains nodes, or neurons, like those in nerve tissue, connected to each other from input to output.
III. Related Work: The authors of [1] presented a new algorithm known as "multipulse processing" (MPP). It is used in weather radar applications to improve mean Doppler velocity estimation.
It is applied to both staggered and uniform pulse repetition time (PRT) sequences. In fact, MPP entails achieving a specific zero performance while also adjusting data at multiple lags; a first Doppler velocity estimate is required to select the correct zero. In this case, a novel MPP algorithm is used to improve mean velocity estimation, and MPP includes a numerical-operations configuration with data measurements at multiple lags. [1] They presented the theoretical formulation from which the algorithm steps were derived. It is worth noting that MPP functions as a secondary Doppler velocity estimation stage, requiring a seed that must be obtained using a primary method and that is improved after applying MPP. The disadvantages of this system are that it does not use a CNN as a classifier and is not focused on increasing the recognition rate and accuracy of weather severity. [5] Despite recent advances in many technological fields, such as an increase in weather forecast models and algorithms, accurate weather forecasting is becoming more difficult. Because it is based on capturing images in real time, the authors of [3] have mentioned that we will face similar challenges in detecting snow and haze.
IV. Proposed Work: Weather classification is done in a comparable manner, yielding accurate results. Technological advancements and deep learning can aid in more accurate weather prediction without supervision. The convolutional neural network (CNN) is a deep-learning algorithm that has produced significant results in image segmentation and classification. The classification was carried out using an image database containing weather images; for each class of input images, various images were collected. The article describes a Deep Learning (DL) based weather prediction method to predict the result. The success of the results obtained will increase if the CNN method is extended with some dropout layers and by splitting the train and test data. For deployment, the prediction result is served in a localhost web application.
Objective: To create a deep learning model for weather classification using a convolutional neural network algorithm, in order to classify the results with the highest accuracy possible by comparing different types of CNN architectures.
Scope: The scope of this project is to classify and recognize weather images from a collection we have gathered, training the machine to classify the types of weather. This project covers four different weather types. We train the machine to achieve high accuracy and the best possible outcome.
Fig 1: System Architecture
Fig 2: Workflow Diagram
Advances in Science and Technology Vol. 124
199
V. Methodology: 1. Collecting Data When looking for a problem to solve, you should first research and gather the data that will be fed to the machine. The quantity and quality of the information you receive are critical because they will have a significant effect on how well or how poorly your model works. You may already have the data in a database, or you may have to start from scratch. Our dataset is self-acquired and was prepared using Google Images. 2. Preparing the Dataset Here the data is checked again to see whether there is a correlation between the various features we have found. You have to choose the features carefully because the ones you choose will have a significant impact on performance and results. If necessary, you can also use PCA to reduce the dimensionality. Divide the data into two groups, training and testing, separated approximately 80/20, though the split may vary depending on the type and amount of data available.
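The approximately 80/20 split described above can be sketched in a few lines of Python. The file names below are hypothetical stand-ins for the self-acquired weather images; only the shuffling-and-cutting logic is the point:

```python
import random

def split_dataset(image_paths, train_fraction=0.8, seed=42):
    """Shuffle file paths reproducibly and split them ~80/20 into train/test."""
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)
    cut = int(len(paths) * train_fraction)
    return paths[:cut], paths[cut:]

# Hypothetical file names standing in for the self-acquired weather images.
images = [f"weather_{i:03d}.jpg" for i in range(100)]
train, test = split_dataset(images)
print(len(train), len(test))  # 80 20
```

Fixing the random seed keeps the split reproducible across runs, which matters when comparing different CNN architectures on the same data.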
Fig 3: Sky images
3. Model Building Depending on the purpose, you may use classification, prediction, regression, or clustering algorithms. There are several models to choose from depending on the type of data you will be processing, such as images, sound, text, and numerical values. As an image processing model, we employ the Convolutional Neural Network. 3.1 Convolutional layers: Convolutional layers are the layers in a deep CNN where filters are applied to the original image or to other feature maps. Most of the user-defined network parameters are located here; the most important parameters are the number of kernels and the kernel size. 3.2 Pooling layers: These are similar to convolutional layers but have a specific function. They are of two types: max pooling and average pooling. Max pooling returns the highest value in each patch of the feature map. The result aggregates image features so as to highlight the feature that is most prevalent in the patch, rather than the feature's average presence as in the case of average pooling. In practice, max pooling has been demonstrated to outperform average pooling for machine vision image classification tasks. 3.3 Dense layers: In deep learning, a dense layer is a layer fully connected to its preceding layer; chained together, such layers form a network. These layers are widely used in neural networks. The dense layer's neurons perform matrix-vector multiplication, which requires the row dimension of the matrix to match the length of the column vector. Fully connected layers are used after flattening the results; this is analogous to an MLP's output layer.
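The contrast between max and average pooling can be made concrete with a toy sketch: a pure-Python 2x2 pooling (stride 2) over a small feature map, where max pooling keeps the strongest activation per patch while average pooling smooths it:

```python
def pool2x2(feature_map, mode="max"):
    """Apply 2x2 pooling with stride 2 to a 2-D feature map (list of lists)."""
    op = max if mode == "max" else lambda vals: sum(vals) / len(vals)
    out = []
    for r in range(0, len(feature_map) - 1, 2):
        row = []
        for c in range(0, len(feature_map[0]) - 1, 2):
            patch = [feature_map[r][c], feature_map[r][c + 1],
                     feature_map[r + 1][c], feature_map[r + 1][c + 1]]
            row.append(op(patch))
        out.append(row)
    return out

fmap = [[1, 3, 2, 0],
        [4, 2, 1, 1],
        [0, 1, 5, 6],
        [2, 2, 7, 8]]
print(pool2x2(fmap, "max"))  # [[4, 2], [2, 8]]
print(pool2x2(fmap, "avg"))  # [[2.5, 1.0], [1.25, 6.5]]
```

In a real Keras model the equivalent layers would be MaxPooling2D and AveragePooling2D; the sketch just exposes the per-patch arithmetic.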
200
IoT, Cloud and Data Science
Fig 4: n*n matrix multiplication formula
3.4 LeNet: LeNet was one of the earliest convolutional neural networks and promoted the development of deep learning. After many years of compelling analysis and iteration, the result was named LeNet. LeNet has played a major role in our project and paved the way for better results. 3.4.1 Architecture of LeNet-5: The LeNet-5 CNN architecture is made up of 7 layers: 3 convolutional layers, 2 subsampling (pooling) layers and 2 fully connected layers.
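As a rough check on the seven-layer structure, the spatial sizes through LeNet-5's convolution and subsampling stages can be traced with the standard output-size formula (n − f)/s + 1. The layer sizes below follow the classic LeNet-5 on 32x32 inputs:

```python
def out_size(n, f, s):
    """Spatial output size for an n x n input, f x f filter, stride s, no padding."""
    return (n - f) // s + 1

n = 32                 # LeNet-5 input: 32x32
n = out_size(n, 5, 1)  # C1: 5x5 convolution -> 28
n = out_size(n, 2, 2)  # S2: 2x2 subsampling -> 14
n = out_size(n, 5, 1)  # C3: 5x5 convolution -> 10
n = out_size(n, 2, 2)  # S4: 2x2 subsampling -> 5
print(n)  # 5; C5/F6 and the output layer then act as fully connected layers
```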
4. Training your model To make the datasets work efficiently and to see improved predictive performance, users need to train the model. The model's weights are initialized randomly; these weights, which determine the input-output relationship, are then adjusted automatically as the model is trained. 5. Evaluation: To verify the accuracy of the trained model, the user needs to test the machine against a dataset containing inputs of the kind the model requires. An accuracy below 50% makes the model unsuitable for deciding the output; if the accuracy is 90 percent or higher, the model can be relied upon to deliver the expected output.
Model Accuracy analysis
Model loss analysis
VI. Deployment (Using Django): Django is a Python web framework that enables the rapid development of secure and maintainable websites. Designed by experienced engineers, Django takes care of much of the complexity of web development. The trained model is saved as a Hierarchical Data Format weights file (.h5 file), which is then deployed using the Django framework. The user interface is made creative for better visualization. The weather is predicted and the user is told whether the given image is CLOUDY / RAIN / SHINE / SUNRISE.
Testing on real weather data: To test on real weather data, we upload a photo to the web-based application created with the Django framework, which analyzes it and classifies the weather type.
Result (Output screenshot):
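The final step of the deployed view, mapping the network's output probabilities to one of the four labels, reduces to an argmax over the softmax output. The class order below is an assumption for illustration and must match the order used when training the model:

```python
# Assumed class order; must match the label order used during training.
CLASSES = ["CLOUDY", "RAIN", "SHINE", "SUNRISE"]

def predict_label(probabilities):
    """Return the class name with the highest softmax probability."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return CLASSES[best]

print(predict_label([0.05, 0.10, 0.70, 0.15]))  # SHINE
```

In the Django view, `probabilities` would come from calling the loaded .h5 model on the uploaded image; only the label-mapping step is shown here.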
VII. Conclusion: This article discusses a recent and simpler method of predicting weather using an image as input. For improved accuracy, the CNN model was trained with multiple images. It focused on how an image from the given dataset and an existing data set were used to anticipate weather patterns using a CNN model. We compared the accuracy of different versions of CNN and discovered that LeNet gave better results, and the .h5 file is deployed in the Django framework.
References
[1] Juan P. Pascual, Jorge Cogo, "Multipulse Processing Algorithm for Improving Mean Velocity Estimation in Weather Radar", 2021.
[2] Muhammad Qamar Raza, N. Mithulananthan, Jiaming Li, Kwang Y. Lee, Hoay Beng Gooi, "An Ensemble Framework for Day-Ahead Forecast of PV Output Power in Smart Grids", 2018.
[3] Zheng Zhang, Huadong Ma, "Multi-Class Weather Classification on Single Images", 2015.
[4] Ruggero Donida Labati, Angelo Genovese, Vincenzo Piuri, "A Decision Support System for Wind Power Production", 2018.
[5] Arti R. Naik, Prof. S.K. Pathan, "Weather Classification and Forecasting using Back Propagation Feed-forward Neural Network", 2012.
[6] Sandeep Pattnaik, "Weather Forecasting in India: Recent Developments", 2019.
[7] Vijay Kumar Didal, Brijbhooshan, Anita Todawat and Kamlesh Choudhary, "Weather Forecasting in India: A Review", 2017.
[8] R. J. Doviak and D. S. Zrnić, Doppler Radar and Weather Observations, 2nd ed. San Diego, CA, USA: Academic, 1993.
[9] Imran Maqsood, Muhammad Riaz Khan, "An ensemble of neural networks for weather forecasting", Neural Computing & Applications, vol. 13, no. 2, pp. 112-122, 2004.
[10] Shah, Urmay, et al., "Rainfall Prediction: Accuracy Enhancement Using Machine Learning and Forecasting Techniques", 2018 Fifth International Conference on Parallel, Distributed and Grid Computing (PDGC), IEEE, 2018.
Advances in Science and Technology ISSN: 1662-0356, Vol. 124, pp 203-210 doi:10.4028/p-5h1ne6 © 2023 Trans Tech Publications Ltd, Switzerland
Submitted: 2022-09-30 Accepted: 2022-10-02 Online: 2023-02-27
Facial Landmark and Mask Detection System
Nandila Bhattacharjee1,a*, Sakshi Dubey2,b and Shriya Siddhartha3,c
1,2,3Computer Science Engineering, SRM Institute of Science and Technology, Chennai, India
[email protected], [email protected], [email protected]
Keywords: Face mask detection, Facial landmark detection, masked, unmasked, incorrectly worn masked.
Abstract. COVID-19, commonly known as Coronavirus, is caused by the virus SARS-CoV-2. The pandemic can be mitigated by wearing a face mask in addition to taking the vaccines that have arrived in the market. A vast trial led by researchers at Yale University and Stanford Medicine concluded that if a person's mouth and nose are covered by a face mask, they have a lower chance of getting infected by COVID-19. To contribute towards global health, this project aims to develop an alert system for face mask detection in public places. Our system tries to achieve this using Keras, TensorFlow, MobileNet, OpenCV, PyTorch, and R-CNN. The proposed work also performs facial landmark detection, which can in the future be combined with face mask detection for biometric applications. This paper is aimed at creating a real-time and highly precise technique which efficiently identifies faces that are masked, unmasked, and incorrectly masked, and alerts in the case of anomalies to ensure that masks are worn properly.
Introduction The pandemic has changed life in many ways. Most people prefer avoiding commuting to public spaces like colleges, schools and offices, and prefer staying and working from the comfort and safety of their homes. Masks have become a national concern and the most effective preventive method, which has led many governments to make masks mandatory in public places. Steffen et al. discussed in their paper the need for mask usage in public spaces, where a portion of people may have no symptoms of the infection, in Washington and New York. The results showed that wearing masks would prevent from 17% to 45% of estimated deaths over a period of 2 months in New York and decrease the death rate by 34% to 58%. Their findings strongly recommend the use of face masks in crowded areas to control the coronavirus from spreading.
In addition, as the country emerges from the COVID-19 lockdown, government officials recommend wearing masks as an essential measure to protect us when we go out in public. In order to mandate the use of the face mask, it is essential to develop a system or technique ensuring people wear the mask once they step out of their house. Face mask detection is the process of detecting whether a person is wearing a mask. The problem is the reverse engineering of face detection, which recognizes faces using various machine learning algorithms for security and monitoring purposes. Face detection is a vital field of pattern recognition as well as computer vision. A major study of face detection was conducted in 2001 using hand-crafted features and machine learning algorithms to effectively train classifiers to be used in detection. A problem encountered with this approach is the very complex design of features and poor recognition accuracy. In recent years, face detection technology based on the deep Convolutional Neural Network (CNN) has been widely developed to improve recognition performance. To improve the accuracy of face detection, both approaches were tried, i.e., feature-based using the Viola-Jones algorithm and image-based using MTCNN (Multi-task Cascaded Convolutional Networks). Although several scholars and researchers have devoted time to creating algorithms for the detection and recognition of faces, there is an important distinction between detecting a face with or without a mask. In this proposed work we wish to develop a highly accurate three-way face
mask detector that can recognize masked, unmasked and incorrectly worn masks. We also wish to create a facial landmark detection system that can in the future be integrated with face mask detection for biometric purposes.
Literature References
Initiating a new idea requires finding existing solutions and their drawbacks, and overcoming the failures in them, so a survey of the literature is the most important step in the development process. Shilpa Sethi, in her proposed work, relied on the object recognition benchmark. In keeping with the benchmark, all the tasks associated with a detection problem were categorized into three components: backbone, neck and head. The backbone corresponds to a CNN that extracts information from images and converts it to a feature map, making use of the pre-learned attributes of a trained CNN to extract new features. The neck contains the components required for images to be classified, and the head corresponds to the detection system. In her work, deep neural networks were used to detect face masks as they were more resourceful, and transfer learning was applied; ResNet50 was used for fine-tuning. The YOLO network was used to divide images into a GxG grid which generates predictions for the bounding boxes. In their system, a deep learning method for identifying masked faces in public areas is achieved, and the technique was resourceful in handling anomalous situations. In the work proposed by Mingjie Jiang, RetinaFaceMask, a face mask detector, was built. It is a single-stage detector consisting of a feature pyramid for the fusion of semantic information from various feature maps, together with a module for face mask detection. After carefully analyzing existing works, the limitations identified were: 1. No real-time application for the face mask detection system has been included in the existing works. 2. The existing works are limited to two-way classification of masks and do not extend to incorrectly worn masks. 3. The existing systems focus only on face mask detection; no alert system or landmark detection system is included.
Architecture Diagram
Fig. 1 The dataset is split into training data and testing data, and data conversion takes place for both sets. The training data is then used for model creation, trained against the three class labels: masked, unmasked, and mask worn incorrectly. The model is evaluated, applied to the testing data for classification, and the result is projected.
Advances in Science and Technology Vol. 124
205
Materials and Methods
Algorithms Used
Viola-Jones: An object detection algorithm proposed by Paul Viola and Michael Jones. It can be trained to recognize different object classes but was motivated mainly by the problem of face detection. Viola-Jones is designed for frontal faces, so it detects frontal faces better than sideways or upside-down faces.
MobileNetV2: A CNN architecture aimed at good performance on mobile devices. It relies on an inverted residual structure wherein the bottleneck layers contain residual connections. A MobileNetV2 image classifier is used to classify images for more accurate performance.
MTCNN: The Multi-task Cascaded Convolutional Network (MTCNN) is a framework made as a solution for both face detection and alignment. It consists of a three-stage convolutional network that can recognize faces and landmarks such as the mouth, nose, etc.
InceptionV3: A convolutional neural network-based deep learning model used for image classification.
Technical Modules
Module 1: Data preprocessing
Module 2: Face detection
Module 3: Landmark Detection
Module 4: Training mask detector model
Module 5: Detecting face mask in real-time
Fig. 2
MODULE 1 Data preprocessing: All images in our dataset have their bounding boxes in the PASCAL VOC format, with the information stored as XML in the annotations directory.
MODULE 2 Face detection: The aim of this module is to determine whether there are any faces in the real-time feed. If more than one face is identified, each face is enclosed in a bounding box, so we know where the faces are. This is done using OpenCV, which supports real-time face detection via webcam in a live manner. The Haar cascades (Viola-Jones) algorithm is used as it is very effective for frontal face detection. Although it is an older framework, it remains viable for real-time face detection and works efficiently on grayscale images. It uses smaller subregions and finds faces by searching for precise features in every subregion; because an image can contain faces of various sizes, it checks different orientations and scales of the image.
MODULE 3 Landmark Detection: In our work this detects faces and facial landmarks via webcam. Landmark detection uses keypoints on a person's face; we again use the Viola-Jones (Haar cascades) algorithm. The output detects the facial landmarks of a person in real time, with or without a mask. The landmarks detected are the eyes, eyebrows, lips, nose and the outline of the face.
MODULE 4 Training mask detector model: A transfer learning methodology is used to train on the inputs. The classifier model is built with the InceptionV3 neural network architecture. The model was trained and was later loaded in the live
detection and used for classifying the real-time webcam image into the three categories. We expect an accuracy of more than 95%.
MODULE 5 Detecting face mask in real-time: MTCNN is used for face detection in real-time scenarios using web cameras. There are three categories, masked, unmasked and wrongly masked, which are identified with different coloured boxes around the face. In order for our mask detector model to work, it needs images of faces; for this, we detect the frames with faces using OpenCV.
Proposed Work
Using Viola-Jones, a face detector capable of identifying faces was developed. Facial landmarks were identified using Haar cascades and OpenCV; both models were trained and tested on a webcam. A model was developed using R-CNN (PyTorch) on a vast dataset for distinguishing masked, unmasked and incorrectly masked individuals, along with displaying the accuracy of the mask when worn correctly. Face masks are detected in a real-time scenario using TensorFlow, OpenCV and Keras, with an alerting system and the accuracy percentage of the prediction. A model which can classify images into wearing a mask, not wearing a mask and not properly wearing a mask was trained. The dataset consisted of two folders, one containing images and the other annotations. Each image contains multiple persons who are either wearing a mask, not wearing one, or not wearing one properly.
Fig. 3
The annotations file for each image contains information about where each face is and which category it falls under. The annotations files were analyzed and the information was parsed using BeautifulSoup. A list was then built containing, for each image, a dictionary with the coordinates of the faces and the label of each face. OpenCV was used to read the image; the face areas were extracted from it, preprocessing was done, and the crops were saved in a list, with the labels saved simultaneously in another list. The MobileNetV2 model was fine-tuned and results were obtained. The loss and accuracy curves on the training set were plotted for better visualization.
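The annotation-parsing step described above can be sketched with the standard library's ElementTree in place of BeautifulSoup. The tag names follow the PASCAL VOC format; the sample XML below is an illustrative fragment, not taken from the actual dataset:

```python
import xml.etree.ElementTree as ET

def parse_voc(xml_text):
    """Extract (label, (xmin, ymin, xmax, ymax)) pairs from a PASCAL VOC annotation."""
    root = ET.fromstring(xml_text)
    faces = []
    for obj in root.iter("object"):
        label = obj.find("name").text
        box = obj.find("bndbox")
        coords = tuple(int(box.find(t).text)
                       for t in ("xmin", "ymin", "xmax", "ymax"))
        faces.append((label, coords))
    return faces

sample = """<annotation>
  <object><name>with_mask</name>
    <bndbox><xmin>79</xmin><ymin>105</ymin><xmax>109</xmax><ymax>142</ymax></bndbox>
  </object>
  <object><name>without_mask</name>
    <bndbox><xmin>185</xmin><ymin>100</ymin><xmax>226</xmax><ymax>144</ymax></bndbox>
  </object>
</annotation>"""
print(parse_voc(sample))
```

The returned coordinates are then used to crop the face region with OpenCV before preprocessing, exactly as the pipeline above describes.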
Advances in Science and Technology Vol. 124
207
Fig. 4
The pictures have bounding boxes covering the faces in the PASCAL VOC format, with the information stored as XML in the annotations directory. The region inside each bounding box is taken as input to the neural network model, and the respective label is taken as the output. Transfer learning was used for training; the classifier model was formulated with the InceptionV3 neural network architecture. The data was again prepared for training and the model was trained for 20 epochs, after which the loss and accuracy curves on the training set were plotted for better visualization. Accuracy was then measured on the test set and came out to 96%. A confusion matrix was also created.
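The test-set accuracy and confusion matrix reported above can be computed from true and predicted labels with a small helper; the toy label lists below are illustrative, using the three classes of this work:

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    idx = {lab: i for i, lab in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[idx[t]][idx[p]] += 1
    return matrix

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

labels = ["masked", "unmasked", "incorrect"]
y_true = ["masked", "masked", "unmasked", "incorrect", "unmasked"]
y_pred = ["masked", "unmasked", "unmasked", "incorrect", "unmasked"]
print(accuracy(y_true, y_pred))  # 0.8
print(confusion_matrix(y_true, y_pred, labels))
```

The diagonal of the matrix counts correct predictions per class, so per-class errors (for instance, masked faces misread as unmasked) are visible at a glance.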
Fig. 5
Fig. 6
Since the accuracy of the second model was high, it was taken further for testing in real time using a webcam, with the help of the Multi-task Cascaded Convolutional Neural Network (MTCNN). The classifier model was loaded, each label was assigned a colour code, and an alarm system was installed for unmasked and incorrectly worn masks. On turning on the webcam, the system was able to detect faces. In the presence of a mask it drew a green bounding box with no alarm sound; as soon as the system detected a face without a mask, or with a mask worn incorrectly, the bounding boxes turned red and yellow respectively, with an alarm ringing until a mask on the face was detected again.
Output
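The per-label behaviour described above (box colour plus alarm) reduces to a simple lookup table. The BGR colour tuples and label strings below are assumptions for illustration; OpenCV drawing functions take colours in BGR order:

```python
# label -> (box colour in BGR for OpenCV drawing, whether the alarm should sound)
LABEL_ACTIONS = {
    "masked":    ((0, 255, 0), False),   # green box, no alarm
    "unmasked":  ((0, 0, 255), True),    # red box, alarm on
    "incorrect": ((0, 255, 255), True),  # yellow box, alarm on
}

def action_for(label):
    """Return the drawing colour and alarm flag for a predicted label."""
    return LABEL_ACTIONS[label]

colour, alarm = action_for("unmasked")
print(colour, alarm)  # (0, 0, 255) True
```

In the live loop, the colour would be passed to the rectangle-drawing call for each detected face, and the alarm flag would trigger the sound until a "masked" prediction returns.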
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Results
The facial landmarks were detected on faces with and without masks. Real-time masked, unmasked and incorrectly masked individuals were detected, along with accuracy for masked labels and an alert system for anomalies.
Summary
Using deep learning methodology, a face mask detection system was made. The proposed technique is highly accurate and gives a novel three-way classification, including the very necessary wrongly-worn-mask class. To achieve this accuracy, transfer learning was used in our model with a large, robust, unbiased dataset. This project can be further extended by integrating the system with CCTV in public outlets such as offices, schools, etc. It can also be integrated into an application for monitoring individuals, raising an alert upon detecting an unmasked or incorrectly masked person and holding them accountable for violation of safety rules. The proposed work also performs landmark detection with or without a mask, which can be integrated for biometric purposes.
References
[1] M. Jiang, X. Fan, and H. Yan, RetinaMask: A Face Mask Detector, 2020, http://arxiv.org/abs/2005.03950
[2]
Face mask detection using deep learning: An approach to reduce risk of Coronavirus spread, Journal of Biomedical Informatics, Vol. 120, No. C.
[3] World Health Organization et al., Coronavirus disease 2019 (COVID-19): situation report, 96, 2020. https://www.who.int/docs/defaultsource/coronaviruse/situationreports/20200816-covid-19-sitrep-209.pdf?sfvrsn=5dde1ca2_2
[4] Garcia Godoy L.R., et al., Facial protection for healthcare workers during pandemics: a scoping review, BMJ Glob. Heal. 5 (5) (2020), 10.1136/bmjgh-2020-002553
[5] Inamdar M., Mehendale N., Real-Time Face Mask Identification Using Facemasknet Deep Learning Network, SSRN Electron. J. (2020), 10.2139/ssrn.3663305
[6] S. Qiao, C. Liu, W. Shen, A. Yuille, Few-Shot Image Recognition by Predicting Parameters from Activations, in: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2018, https://doi.org/10.1109/CVPR.2018.00755
[7] R. Girshick, J. Donahue, T. Darrell, J. Malik, Region-based Convolutional Networks for Accurate Object Detection and Segmentation, IEEE Trans. Pattern Anal. Mach. Intell. 38 (1) (2015) 142–158, https://doi.org/10.1109/TPAMI.2015.2437384
[8] Z. Cai, Q. Fan, R.S. Feris, N. Vasconcelos, A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection, Lect. Notes Comput. Sci. (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2016)
[9] I.D. Apostolopoulos, T.A. Mpesiana, Covid-19: automatic detection from X-ray images utilizing transfer learning with convolutional neural networks, Phys. Eng. Sci. Med. (2020)
[10] B. Roy, S. Nandy, D. Ghosh, D. Dutta, P. Biswas, T. Das, MOXA: A Deep Learning Based Unmanned Approach For Real-Time Monitoring of People Wearing Medical Masks, Trans. Indian Natl. Acad. Eng. (2020), https://doi.org/10.1007/s41403-020-00157-z
Advances in Science and Technology ISSN: 1662-0356, Vol. 124, pp 211-218 doi:10.4028/p-y3z104 © 2023 Trans Tech Publications Ltd, Switzerland
Submitted: 2022-09-12 Accepted: 2022-09-16 Online: 2023-02-27
Candidate Prioritization Using Automated Behavioural Interview with Deep Learning
Hari Vilas Panjwani1,a*, Asnath Victy Phamila Y2,b
1,2School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, India
[email protected], [email protected]
Keywords: screening, automation, hiring, ranking
Abstract. The objective of the project is to build an integrated technological solution to assist companies in their hiring process. We have come up with a concept for the prioritization of candidates using an automated behavioural interview. This is a web-based portal that supports recruitment operations by prioritizing candidates from a large pool based on skills and experience relevant to the job, facilitating an informed interview process with a systematic, transparent procedure. Our project aims to apply our technical skills to assist interviewers and address the challenges faced by an organization in its hiring process. This has been addressed by recording candidates' answers to random behavioural questions with limited time to prepare; the video is then analysed to find the behavioural aspects of the candidate. Behavioural analysis of the interview is done by applying facial emotion recognition based on the VGGNet architecture.
Introduction The success of an organization comes from the quality of its people. People come and go, but their work remains with the organization. To ensure the right people, it is essential that the hiring process be robust, and ensuring fair and comprehensive hiring makes costs shoot up. Different companies have their own hiring methodology, which keeps evolving with the needs of the company and the times. In general, interviewing is a significant part of the process. Typically, 3-4 interview rounds of around 45-60 minutes each are conducted before the decision to hire a candidate is made. Every round carries the high cost of employees' time, generally that of senior software developers; a managerial round taken by the engineering manager is also part of the process. Moreover, scheduling a suitable time for such senior employees and putting their main work on hold to conduct interviews is a really hectic process.
The number of candidates applying for a job far exceeds hiring needs, and it is practically impossible to interview all of them. There is therefore a need for a platform that addresses these issues and alters the hiring process so that only suitable candidates make it to the interview, minimising the cost of interviewing. In today's digital era, conducting hiring interviews virtually has become the norm for nearly every organisation, especially after the pandemic. It is likely to remain so for reasons that benefit most companies: no cost to transport interviewees to multiple campuses across the country, hiring across geographies, and employees preferring to work from home. The motivation for this project comes from these problems in hiring: leveraging virtual interviews opens the scope for automating and smoothing parts of the process, making it convenient and cost-effective at the same time.

Literature Survey
Over recent years, several researchers have tried to make hiring more efficient, and companies providing hiring as a service have contributed actively in this area. There has also been a surge in the number of experiments companies run to put such solutions into practice and arrive at a suitable hiring process. For the initial screening of candidates, various approaches have been researched. Faliagka [19] proposed pre-screening candidates, using linguistic analysis of personality traits, so that interviews and background checks are reserved for a limited set of selected candidates. Gajanayake [22]
IoT, Cloud and Data Science
uses the candidate's GitHub profile, LinkedIn profile, CV, academic transcript, and recommendation letters for the initial screening. This approach applies NLP to extract personality traits, skills, and background from the academic transcript, adds GitHub and LinkedIn insights, and uses machine learning to find the best possible candidates. Palshikar [11] focused on filtering candidates by characteristics such as skills and years of experience, measuring their similarity to previously hired candidates; this was done with a neural network using appropriate weights and scoring functions to place candidates on a scale. Wilson [10] presented a case study of how the pymetrics software is used for hiring: assessment answers feed a machine learning model that predicts a hiring match, and backtesting checks for the presence of bias. Similarly, Amazon's standard recruitment process involves a logical debugging round, a coding round, and behavioural situation-based questions with 'agree', 'strongly agree', 'disagree', and 'mostly disagree' as answers, testing the candidate's leadership qualities; only after clearing these are further interviews scheduled. Other researchers have tried to produce hiring interview scores for candidates directly. Muralidhar [12] derived hiring scores from both the verbal and non-verbal content of a candidate's video resume, improving on earlier work that considered non-verbal behaviour alone. Eddine [20] estimated personality traits from a face video and produced interview scores with regression. Chen [18] predicted personality traits from the text of spoken words, found that prosody and facial expressions were not useful for the analysis, but stressed the need for further investigation.
Gorbova [21] combined video, audio, and lexical features to estimate personality and score interview performance. Naim [23] proposed a system that automatically analyses facial features (smiles, head gestures, tracking points), language features (word counts, topic modelling), and prosodic features (pitch, pauses, and intonation), and recommended using more unique words and fewer filler words. Other researchers have taken the approach of conducting interviews asynchronously rather than face-to-face. Suen [16] proposed an asynchronous video interview platform that uses a CNN to predict the communication skills and personality traits required for the job. Their system predicted openness, agreeableness, and neuroticism in line with human raters, but its conscientiousness and extraversion ratings did not match human perception.

Dataset
Facial emotion recognition is a crucial part of behavioural analysis. Our system uses the facial emotion recognition (FER-2013) dataset, which contains a total of 35,887 grayscale images of 48 x 48 pixels, distributed across 7 classes: happy, surprise, neutral, sad, fear, angry, and disgust.
Fig. 1. Sample images of the dataset (classes: Angry, Fear, Happy, Neutral, Surprise, Disgust, Sad)
Advances in Science and Technology Vol. 124
Table 1. Number of Images in Each Class
Class Name | Count
Happy      | 7215
Neutral    | 4965
Sad        | 4830
Fear       | 4097
Angry      | 3995
Surprise   | 3171
Disgust    | 436
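The FER-2013 dataset is commonly distributed as a single CSV (the Kaggle release). The parsing sketch below assumes that layout, with an integer emotion label and a "pixels" field of 2304 space-separated grayscale values; the column layout and label ordering are our assumptions, not stated in the paper.

```python
import numpy as np

# Conventional label order of the Kaggle FER-2013 release (an assumption here).
EMOTIONS = ["Angry", "Disgust", "Fear", "Happy", "Sad", "Surprise", "Neutral"]

def parse_pixels(pixels: str) -> np.ndarray:
    """Convert one space-separated pixel string into a 48x48 uint8 image."""
    values = np.array(pixels.split(), dtype=np.uint8)
    return values.reshape(48, 48)

# Example: an all-black 48x48 face crop.
image = parse_pixels(" ".join(["0"] * 48 * 48))
print(image.shape)  # (48, 48)
```

Each parsed image can then be stacked into the training tensor expected by the CNN architectures discussed later.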
Fig. 2. Distribution of dataset images

System Design
Despite the importance of non-verbal cues in communication, they are not widely used in industry: only a paper resume is submitted by candidates at the time of applying. Candidates spamming their resume at every job posting, often without even reading the job requirements, makes shortlisting difficult. Adding one step of answering behavioural questions raises the entry barrier so that only genuinely interested candidates proceed, while also allowing candidates' communication skills, non-verbal cues, and fit for the role to be judged.
Fig. 3. Flow Diagram

In the proposed system, the candidate reads the job description and applies for the job. Candidates then record answers to a series of questions, which are stored for further analysis. Candidates are prioritised based on their experience and skills, how relevant they are to the job requirements,
and their recorded answers to the questions. This prioritisation saves a lot of time when the pool of candidates is very large, as is common nowadays. A human interviewer conducts the interview as usual and tests the interviewee on technical aspects. After the interview, our system assists with additional data points on the behavioural aspects of the interviewee, which aid the final decision. The video and its output are stored in the system for future reference.

Implementation of System
Our methodology differs from previously existing systems: it not only helps screen candidates but also assists interviewers by judging candidates on other parameters while the interviewer judges technical aspects. Existing screening based on video resumes is less effective because every candidate has ample preparation time in which to polish answers to their advantage. In our approach, by contrast, candidates receive random behavioural questions with very limited time to prepare or manipulate answers. This rapid-fire approach elicits an original, top-of-the-head answer from the candidate, and together with standard resume scoring it makes for an effective screening process. During the interview, the interviewer asks technical questions and judges the correctness of the answers; at the same time, the video is analysed to assess the behavioural aspects of the candidate. Behavioural analysis of the interview is performed by facial emotion recognition.

A. Facial Emotion Recognition
For the candidate-screening use case we need to consider only five classes: Angry, Happy, Neutral, Sad, and Surprise. After processing we obtain the frame count for each class and the total number of frames. The score can be calculated with the following formula:

Score = (1.5*Happy + 1.25*Surprise + Neutral - 1.25*Sad - 1.5*Angry) / (1.5*Total Number of Frames) * 100 (1)

Dividing by 1.5 and by the total number of frames normalises the score, which is then expressed as a percentage. Facial emotion recognition consists of facial classification followed by emotion recognition, explained in detail as follows:
1) Facial Classification: Facial classification determines the presence of a face in an image; the face is classified based on facial features. Our system processes video frame by frame, so speed is one of our key performance indicators. In this project we use the Viola-Jones algorithm because it is very fast with reasonable accuracy, which meets the needs of our system. The algorithm uses the edge and line detection features proposed by Viola and Jones in their research paper [4]. The model was trained on many positive and negative face images and then saved as an XML file.
2) Emotion Recognition: Emotion recognition is broadly done in two ways: the conventional approach of feature extraction with pattern classifiers, and CNN-based systems, which have shown fairly good progress in metrics. Depending on whether individual frames or whole videos are processed, emotion recognition can be classified as static or dynamic FER. Although dynamic (temporal) FER provides additional temporal information, it has drawbacks: dynamic features have different transition durations and different feature characteristics for different faces. Emotions can be basic or compound. There are seven basic emotions: surprise, happiness, anger, fear, disgust, sadness, and neutral.
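Equation (1) can be implemented directly from the per-class frame counts. A minimal sketch (the function and argument names are ours):

```python
def behaviour_score(happy, surprise, neutral, sad, angry, total_frames):
    """Weighted emotion score from Eq. (1), normalised to a percentage.

    Positive emotions add to the score, negative ones subtract; dividing by
    1.5 * total_frames keeps the result within [-100, 100].
    """
    weighted = 1.5 * happy + 1.25 * surprise + neutral - 1.25 * sad - 1.5 * angry
    return weighted / (1.5 * total_frames) * 100

# A clip in which every frame was classified as Happy scores the maximum 100.
print(behaviour_score(happy=120, surprise=0, neutral=0, sad=0,
                      angry=0, total_frames=120))  # 100.0
```

An all-Angry clip symmetrically yields the minimum score of -100.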
There are twelve compound emotions, defined as combinations of basic emotions. FACS encodes the movements of specific facial muscles, called action units (AUs), which reflect distinct momentary changes in facial appearance.
Facial landmark (FL) detection approaches can be categorised into three types according to the generation of models: the active shape model (ASM) and appearance-based model (AAM); regression-based models combining local and global models; and CNN-based methods. In our work we researched and applied CNN-based architectures for better results. By reducing layers from the VGG [9] architecture and training on 5 classes, we obtained a simpler version of VGGNet; similarly, we obtained Mini Xception from the Xception [13] architecture, Mini DCNN from the DCNN [15] architecture, and Mini ResNet from the ResNet [14] architecture.

Result and Discussion
Facial classification with the Viola-Jones algorithm, given an input image, returns the coordinates of the faces present in the image. For emotion classification, models trained with the different architectures achieved the following accuracies:

Table 2. Comparison of Accuracies of Models
S. No. | Model                      | Accuracy
1.     | Little VGGNet Architecture | 54%
2.     | Mini Xception Architecture | 50.4%
3.     | DCNN Architecture          | 34%
4.     | Mini ResNet Architecture   | 29.8%
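A reduced VGG-style network of the kind compared above might be expressed in Keras as follows; the block depth, filter counts, and dense width are our illustrative assumptions, not the authors' exact configuration.

```python
from tensorflow.keras import layers, models

def little_vggnet(num_classes: int = 5):
    """Illustrative 'little VGGNet': stacked 3x3 conv blocks, as in VGG,
    reduced in depth and ending in a 5-way softmax for
    Angry/Happy/Neutral/Sad/Surprise."""
    return models.Sequential([
        layers.Input(shape=(48, 48, 1)),           # 48x48 grayscale face crop
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])

model = little_vggnet()
print(model.output_shape)  # (None, 5)
```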
Fig. 4. Recording of candidate attached to the profile
Fig. 5. Sample output for a frame
Fig. 6. Processing frame by frame

The score of each answer can be used for ranking, and thus prioritising, candidates. The work has also been analysed on various aspects such as performance, technology, economics, and politics, discussed in the following paragraphs. 1) Performance analysis: The application is fully responsive and scalable, and can be scaled further as users increase; analytics will also be used to improve performance. The backend is highly scalable due to the MVC architecture pattern used in building the application. 2) Technical analysis: All technology used in this project is open-source. The stack comprises Flask, React, MongoDB, JavaScript, and Python, which are highly trusted by developers worldwide and well suited to building dynamic web applications. The application is secure and scalable, and the front end is responsive, so the application can be accessed from any device, be it mobile or laptop. The website is optimised to render content even in times of poor network connectivity. 3) Economic analysis: Only a minimal one-time cost is involved, which will save the company hiring time and cost in finding the desired candidate.
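Prioritising candidates once every answer has been scored amounts to a sort over combined scores. The record layout and field names below are our illustrative assumptions:

```python
def prioritise(candidates):
    """Sort candidates by combined resume and behaviour score, best first."""
    return sorted(candidates,
                  key=lambda c: c["resume_score"] + c["behaviour_score"],
                  reverse=True)

pool = [
    {"name": "A", "resume_score": 70, "behaviour_score": 55},
    {"name": "B", "resume_score": 80, "behaviour_score": 60},
    {"name": "C", "resume_score": 75, "behaviour_score": 40},
]
print([c["name"] for c in prioritise(pool)])  # ['B', 'A', 'C']
```

The top of the sorted list is then handed to human interviewers for the technical rounds.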
4) Social analysis: Candidates hired after this screening will have better social and behavioural skills, helping improve social interactions and creating an overall conducive, constructive, and productive environment. 5) Environmental analysis: Simplifying the hiring process not only reduces costs but also saves unnecessary commuting for hiring; it is thus designed to minimise the carbon footprint. 6) Political and demographic feasibility: The language translation facility of this application makes it usable and accessible to any population worldwide; users can view the web page content in their preferred language. The application's sole interest is simplifying the hiring process.

Conclusion and Future Work
The proposed work meets the defined objectives of simplifying and smoothing a cumbersome process. With the help of the platform, only the most suitable candidates are interviewed, which saves time and cost for companies and gives candidates timely feedback. This makes the hiring decision process transparent, efficient, and data-driven. In future, at a low level, more complex deep learning algorithms can be applied for higher-accuracy emotion recognition, and better semantic analysis techniques could incorporate highly specialised domain knowledge for resume similarity. The communication module can be refined further to gauge the confidence and overall personality of the candidate during the interview. At a higher level, the system architecture could be redefined as a microservice architecture to support scaling out for a large user base. Customisation of the hiring process can be offered to suit company requirements across domains.
Subdomains can be provided to companies to give a personalised experience to both company and interviewee. The platform can also be adapted so that candidates themselves can practise mock interviews, alone or with friends, to build confidence for real interviews and to identify areas of improvement.

References
[1] Bradski, Gary. "The OpenCV library." Dr. Dobb's Journal: Software Tools for the Professional Programmer 25.11 (2000): 120-123.
[2] Barrick, Murray R., Gregory K. Patton, and Shanna N. Haugland. "Accuracy of interviewer judgments of job applicant personality traits." Personnel Psychology 53.4 (2000): 925-951.
[3] Huffcutt, Allen I., et al. "Identification and meta-analytic assessment of psychological constructs measured in employment interviews." Journal of Applied Psychology 86.5 (2001): 897.
[4] Viola, Paul, and Michael Jones. "Rapid object detection using a boosted cascade of simple features." Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001). Vol. 1. IEEE, 2001.
[5] DeGroot, Timothy, and Janaki Gooty. "Can nonverbal cues be used to make meaningful personality attributions in employment interviews?" Journal of Business and Psychology 24.2 (2009): 179-192.
[6] Oostrom, Janneke K., et al. "A multimedia situational test with a constructed-response format." Journal of Personnel Psychology (2011).
[7] Friedman, Carol, Thomas C. Rindflesch, and Milton Corn. "Natural language processing: state of the art and prospects for significant progress, a workshop sponsored by the National Library of Medicine." Journal of Biomedical Informatics 46.5 (2013): 765-773.
[8] Nikolaou, Ioannis, and Janneke K. Oostrom. "Employee recruitment, selection, and assessment." Contemporary Issues for Theory and Practice. Hove, East Sussex: Routledge (2015).
[9] Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).
[10] Wilson, Christo, et al. "Building and auditing fair algorithms: A case study in candidate screening." Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. 2021.
[11] Palshikar, Girish Keshav, et al. "Automatic shortlisting of candidates in recruitment." ProfS/KG4IR/Data:Search@SIGIR. 2018.
[12] Muralidhar, Skanda, Laurent Nguyen, and Daniel Gatica-Perez. "Words Worth: Verbal content and hirability impressions in YouTube video resumes." Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis. 2018.
[13] Chollet, François. "Xception: Deep learning with depthwise separable convolutions." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
[14] He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
[15] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet classification with deep convolutional neural networks." Advances in Neural Information Processing Systems 25 (2012).
[16] Suen, Hung-Yue, Kuo-En Hung, and Chien-Liang Lin. "Intelligent video interview agent used to predict communication skill and perceived personality traits." Human-centric Computing and Information Sciences 10.1 (2020): 1-12.
[17] Naim, Iftekhar, et al. "Automated analysis and prediction of job interview performance." IEEE Transactions on Affective Computing 9.2 (2016): 191-204.
[18] Chen, Lei, et al. "Automated video interview judgment on a large-sized corpus collected online." 2017 Seventh International Conference on Affective Computing and Intelligent Interaction (ACII). IEEE, 2017.
[19] Faliagka, Evanthia, Athanasios Tsakalidis, and Giannis Tzimas. "An integrated e-recruitment system for automated personality mining and applicant ranking." Internet Research (2012).
[20] Bekhouche, Salah Eddine, et al. "Personality traits and job candidate screening via analyzing facial videos." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2017.
[21] Gorbova, Jelena, et al. "Automated screening of job candidate based on multimodal video processing." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2017.
[22] Gajanayake, R. G. U. S., et al. "Candidate selection for the interview using GitHub profile and user analysis for the position of software engineer." 2020 2nd International Conference on Advancements in Computing (ICAC). Vol. 1. IEEE, 2020.
[23] Naim, Iftekhar, et al. "Automated analysis and prediction of job interview performance." IEEE Transactions on Affective Computing 9.2 (2016): 191-204.
Advances in Science and Technology ISSN: 1662-0356, Vol. 124, pp 219-227 doi:10.4028/p-7z8z0v © 2023 Trans Tech Publications Ltd, Switzerland
Submitted: 2022-09-12 Accepted: 2022-09-16 Online: 2023-02-27
Book Recommendation Based on Emotion in the Online Library Using CNN Srinath R1,a*, Sai Siddharth Ravindran Gunnapudi2,b and Laxmi Narasimha Kokkalla3,c Department of Computer Science Engineering, SRM Institute of Science and Technology, Vadapalani Campus, Chennai, India
[email protected], [email protected], [email protected]
Keywords: Expression, Recommendation, Books, Facial, User, CNN, Model, Database.
Abstract. The present pandemic situation keeps people at home, making it difficult for both lecturers and students to teach and learn concepts thoroughly. Concepts can be learned through videos, but reading is also an important aspect of learning. This paper describes providing books and notes to read online and recommending books based on facial expressions captured from the user. It aims to extract faces from an image, extract the expression (eyes and lips) from it, and classify it into six types of emotion: Happy, Fear, Anger, Surprise, Neutral, and Sad. The algorithm used for facial expression recognition is the Convolutional Neural Network (CNN).

Introduction
A facial expression based book recommender for a digital library using machine learning is an online site that recommends books by taking the user's mood as input. Besides recommendations based on facial expression, users can also order and read books online, and the latest news in the educational field is shown on the website. The site presents the most suitable book as recommended by lecturers: each lecturer can state their preferences, and the book with the largest number of recommendations is shown as most recommended. Facial expressions are used by humans to convey a range of messages in a number of contexts. The variety of interpretations is extensive, ranging from inherent socio-emotional conceptions like "surprise" to sophisticated, culture-specific terms like "carelessly". Facial expressions are used in a wide range of scenarios, including responses to external events and linguistic structures in sign languages. They result from the movement of muscles that attach to the skin and fascia of the face; these muscles move the skin, producing lines and folds as well as movements of features such as the mouth and eyebrows.
Fig.1. Emotions expressed by humans
Table 1. Facial expression identification.
Expression | Definition | Motion cues
Happy | Happiness is defined by emotions such as joy, pleasure, satisfaction, and fulfillment. | Raising and lowering of the mouth corners.
Fear | Fear is an instinctive human emotion that is both normal and potent. It comprises a powerful individual emotional reaction as well as a universal physiological response. | Brows raised, eyes open, mouth opens slightly.
Anger | Anger is a feeling of hostility toward someone or something you believe has wronged you on purpose. | Brows lowered, lips pressed firmly, eyes bulging.
Surprise | Surprise arises after we encounter sudden and unexpected sounds or movements. | Eyes widen to display more white, mouth lowers slightly.
Neutral | We describe neutral affect as a lack of liking in one direction or the other, as well as a sensation of indifference. | A blank look conveying a lack of feeling.
Sad | Sadness is a type of emotional distress that is linked to or defined by feelings of loss, helplessness, despair, disappointment, and sorrow. | Lowering of the mouth corners, raising of the inner portion of the brows.
Table 2. Summary of Literature Survey
S.No | Author | Title of the Paper | Methodology | Inferences
1 | Yingruo Fan, Victor O.K. Li, Jacqueline C.K. Lam, 2020 [1] | Recognition of facial expression using Deeply-Supervised Attention Network | 1) Deeply-supervised CNN; 2) Attention block; 3) Two-stage training strategy | The DSAN-VGG architecture has the highest complexity (37.91), Accuracy (RAF) 85.37, and Accuracy (AFEW) 52.74.
2 | K. Puritat and K. Intawong, 2020 [8] | Development of an Open Source Automated Library System for Small Libraries with a Book Recommendation System | 1) Damerau-Levenshtein distance based title similarity; 2) Dewey Decimal Classification for the book classification system; 3) Book similarity based on a range of bibliographic data | The recommendation module, using multiple sources, predicted the minimum cost criteria for title similarity of books with no lending record, balanced the weight of bibliographic information, and demonstrated its effectiveness.
3 | S. Kim, G. H. An and S.-J. Kang, 2017 [10] | Facial Expression Recognition System Using Machine Learning | 1) Face detection; 2) Facial ROI rearrangement; 3) HOG feature extraction; 4) Facial expression recognition | The F1 score of the proposed system was 0.8759, which was 0.0316 greater than the conventional system's F1 score.
4 | Chandra Bhushan Singh, Babu Sarkar, Pushpendra Yadav, 2021 [4] | Facial Expression Recognition | 1) Acquiring the image; 2) Pre-processing; 3) Segmentation; 4) Feature extraction and categorization | Few frames (3 frames) are necessary to detect facial expression.
5 | K. Liu, C. Hsu, W. Wang and H. Chiang, 2019 [9] | Real-Time Facial Expression Recognition using a CNN model | 1) Face detection; 2) The CNN architecture; 3) Average weighting method | Compared with results produced using only the convolutional neural network, the overall robustness and accuracy of facial emotion detection improved after using the average weighting method.
Facial Expression Recognition System in the Web
FER2013 is the dataset utilised in this research. The collection contains around 30,000 grayscale face photos of various expressions, all 48 x 48 pixels in size. Angry, Fear, Happy, Sad, Surprise, and Neutral are the six categories of emotion we use.
Fig.2. Sample images of the FER2013 dataset

Convolutional Neural Network
A convolutional neural network (CNN) takes an image as input and processes it with a deep learning algorithm. A CNN consists of multiple layers of artificial neurons and has three kinds of layer in its architecture: the convolutional layer, the pooling layer, and the fully connected layer. Basic characteristics such as horizontal and diagonal edges are extracted by the first layer; its output is passed to the next layer, which recognises more complex features such as corners and combinations of edges. The pooling layer is used to shrink the spatial size of the convolved features, reducing the processing resources required by decreasing the dimensions. The neurons in the last, fully connected layer are completely coupled to all the neurons in the previous and subsequent levels; this layer also performs the representation mapping between input and output.

Building Network
The model type used here is Sequential, a basic model construction strategy in Keras that lets us build a model layer by layer; the add() method adds layers to the model. Our initial two layers are Conv2D layers, convolution layers that transform our input images. The next layer is Batch
normalization, a network layer that allows individual layers to learn at their own speed; it is used to normalise the output of the prior layers. The ReLU layer is added next: ReLU (rectified linear unit) is a piecewise linear function that outputs the input directly if it is positive and zero otherwise. The next layer is the pooling layer, for which we use max pooling, a technique that picks the element with the highest value from the region of the feature map covered by the filter; the output is a map of the most significant features of the previous feature map. We use max pooling here because it downscales the image after the convolution to retain its most important features, extracting the maximum value from the feature map according to the filter size and strides. A Flatten layer follows, serving as the bridge between the convolutional layers and the dense layers. Dense is the layer type we use for our output layer; it is a typical neural network layer used in a variety of contexts. Softmax, the activation function used in the Dense layer, predicts the multinomial probability distribution over the classes. Fig. 3 depicts the flow of the layers of the network created.
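The layer sequence described above might be expressed with Keras as below; the filter counts and the six-class output width follow the text, while the exact sizes are our illustrative assumptions.

```python
from tensorflow.keras import layers, models

# The described layer sequence, built with Keras' Sequential API.
model = models.Sequential()
model.add(layers.Input(shape=(48, 48, 1)))          # 48x48 grayscale input
model.add(layers.Conv2D(64, (3, 3), padding="same"))  # first convolution
model.add(layers.Conv2D(64, (3, 3), padding="same"))  # second convolution
model.add(layers.BatchNormalization())              # normalise prior output
model.add(layers.ReLU())                            # piecewise-linear activation
model.add(layers.MaxPooling2D(pool_size=(2, 2)))    # keep strongest features
model.add(layers.Flatten())                         # bridge conv -> dense
model.add(layers.Dense(6, activation="softmax"))    # six emotion classes
print(model.output_shape)  # (None, 6)
```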
Fig.3. Flow Diagram of Building Network

Compiling and Training the Network
The model is compiled with optimizer, loss, and metrics parameters. For our loss function we employ categorical cross-entropy, the most common choice for multi-class classification; a lower score means the model is doing better. The Adam optimizer is used here for its faster computation time, and it also requires fewer parameters for tuning.
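The compilation step can be sketched as follows; the tiny stand-in model and the choice of accuracy as the tracked metric are our assumptions, not the paper's.

```python
from tensorflow.keras import layers, models

# Minimal stand-in model so the compile call is self-contained.
model = models.Sequential([
    layers.Input(shape=(48, 48, 1)),
    layers.Flatten(),
    layers.Dense(6, activation="softmax"),
])

# Compile as described: categorical cross-entropy loss with the Adam optimizer.
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
print(model.optimizer.__class__.__name__)  # Adam
```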
Methodology
Fig.4. Use Case Diagram for User

Fig. 4 represents the activities a user can perform on the website. The user can upload or delete their own notes, create or cancel book orders, return previously ordered books, view the latest news, view, rate, or like a book, update their profile details, search for a book, and see books recommended by the facial expression recognition system according to the mood captured from the user's face. The captured photo of the user is converted and stored in the database as binary data. The API acts as an intermediary between the database and the facial expression prediction system, providing the user's image (binary data) in JSON format. The Flask application in the backend decodes the binary data back to the user's image, and the user's facial expression is then predicted from it with the facial expression recognition model created.
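The binary-image handoff described above can be sketched as a small Flask endpoint. Everything here is illustrative: the route name, the JSON field name, and the use of base64 as the binary encoding are our assumptions, not the paper's.

```python
import base64
import io

from flask import Flask, jsonify, request
from PIL import Image

app = Flask(__name__)

@app.route("/predict-emotion", methods=["POST"])
def predict_emotion():
    """Decode the base64 image from the JSON payload back to a PIL image."""
    payload = request.get_json()
    raw = base64.b64decode(payload["image"])          # binary data from the DB
    image = Image.open(io.BytesIO(raw)).convert("L")  # grayscale face image
    # ... run the facial expression recognition model on `image` here ...
    return jsonify({"width": image.width, "height": image.height})
```

A client would POST `{"image": "<base64 bytes>"}` and receive the prediction (here, just the decoded dimensions) back as JSON.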
Fig.5. Use Case Diagram for Admin
Fig. 5 represents the responsibilities of an admin. The admin can add or delete a book, update an existing book's details, search for a book, update their profile details, and view or update the existing orders they receive.

Results
Fig.6. Training & Validation Loss of the model

Fig.7. Training & Validation Accuracy of the model

Fig. 6 depicts the training and validation loss of the model, and Fig. 7 its training and validation accuracy.

Table 3. Hyperparameters and their values.
Hyperparameter   | Value
Epochs           | 100
Batch size       | 64
Validation split | 0.2

Table 3 lists the hyperparameters used and their respective values. An epoch signifies one training cycle of the neural network over all of the training data. Batch size is another hyperparameter, defining the number of samples propagated through the network at a time. The last hyperparameter is the validation split, by which 20% of the dataset is held out for testing. The training accuracy of the model is nearly 99%, and the testing accuracy is nearly 60%.
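Training with the hyperparameters of Table 3 can be sketched as below. Tiny random data stands in for FER2013, and the epoch count is reduced to 1 so the sketch runs quickly; the paper trains for 100 epochs.

```python
import numpy as np
from tensorflow.keras import layers, models

# Minimal stand-in model so the fit call is self-contained.
model = models.Sequential([
    layers.Input(shape=(48, 48, 1)),
    layers.Flatten(),
    layers.Dense(6, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])

# Random 48x48 grayscale images with one-hot labels over 6 emotions.
x = np.random.rand(32, 48, 48, 1).astype("float32")
y = np.eye(6)[np.random.randint(0, 6, size=32)]

# Table 3 settings: batch_size=64, validation_split=0.2 (epochs=100 in the paper).
history = model.fit(x, y, epochs=1, batch_size=64,
                    validation_split=0.2, verbose=0)
print(sorted(history.history)[:2])
```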
Fig.8. Heatmap of the Confusion matrix

Fig. 8 depicts the heatmap of the confusion matrix after testing, calculated by comparing the actual target values with the machine learning model's predictions. The different colours in the heatmap indicate different result states: green represents positive results and red negative results, with darker green indicating more positive results.

Requirements Analysis
Table 4. Details of Software
Operating system      | Windows 10
Programming languages | Python, PHP
Web framework         | Laravel 8 [7]
Database              | PostgreSQL

Table 5. Details of Hardware
Processor | Intel i7
HDD       | 1 TB
RAM       | 8 GB
Conclusion
The facial expression based book recommendation system for digital libraries is built with machine learning using the convolutional neural network algorithm, with facial expression recognition trained on the FER2013 dataset. A novel aspect of the approach is finding the emotion by decoding the image from its binary data. The user is redirected to a page where books are displayed according to the user's emotion. The proposed system also has additional features: a student can upload notes for every other student to view and can add notes of their own, and daily news is provided to keep users informed. Admins are redirected to the admin page once they log in with their admin user ID, where they can manage orders and books.
Advances in Science and Technology Vol. 124
227
References
[1] Yingruo Fan, Victor O.K. Li, and Jacqueline C.K. Lam, "Facial Expression Recognition with Deeply-Supervised Attention Network," in IEEE Transactions on Affective Computing, 2020.
[2] Yingjian Li, Guangming Lu, Jinxing Li, Zheng Zhang, and David Zhang, "Facial Expression Recognition in the Wild Using Multi-level Features and Attention Mechanisms," in IEEE Transactions on Affective Computing, 2020.
[3] S. Florence and M. Uma, "Emotional Detection and Music Recommendation System Based on User Facial Expression," IOP Conference Series: Materials Science and Engineering, 2020.
[4] Chandra Bhushan Singh, Babu Sarkar, and Pushpendra Yadav, "Facial Expression Recognition," 2021.
[5] K. Kulkarni, "Automatic Recognition of Facial Displays of Unfelt Emotions," in IEEE Transactions on Affective Computing, 2020.
[6] Hao Meng, Fei Yuan, Yue Wu, and Tianhao Yan, "Facial Expression Recognition Algorithm Based on Fusion of Transformed Multilevel Features and Improved Weighted Voting SVM," Hindawi Mathematical Problems in Engineering, vol. 2021, 2021.
[7] N. Yadav, D. S. Rajpoot, and S. K. Dhakad, "LARAVEL: A PHP Framework for E-Commerce Website," Fifth International Conference on Image Information Processing (ICIIP), 2019.
[8] K. Puritat and K. Intawong, "Development of an Open Source Automated Library System with Book Recommendation System for Small Libraries," 2020.
[9] K. Liu, C. Hsu, W. Wang, and H. Chiang, "Real-Time Facial Expression Recognition Based on CNN," International Conference on System Science and Engineering (ICSSE), 2019.
[10] S. Kim, G. H. An, and S.-J. Kang, "Facial Expression Recognition System Using Machine Learning," 2017.
Advances in Science and Technology ISSN: 1662-0356, Vol. 124, pp 228-235 doi:10.4028/p-3g2p6g © 2023 Trans Tech Publications Ltd, Switzerland
Submitted: 2022-09-12 Accepted: 2022-09-16 Online: 2023-02-27
Armament Detection Using Deep Learning
Rohan George Jose1,a* and Rithvik Rajmohan2,b*
1,2Computer Science Engineering, SRM Institute of Science and Technology, Chennai, India
[email protected], [email protected]
Keywords - Computer Vision, Weapon detection, YOLO-V4, Artificial Intelligence (AI)
Abstract- In the recent past there has been an increase in violent incidents involving dangerous objects such as arms and knives. Being able to quickly identify and defuse such situations is of the utmost importance in order to preserve peace and to avoid human casualties. One of the most important and commonly used methods of increasing security is the near-ubiquitous use of surveillance cameras. Object detection techniques can be applied in this field to help improve security: detecting objects of interest in surveillance footage is one way to identify dangerous situations and take the steps necessary to minimise any damage. This paper uses the convolutional neural network (CNN) based YOLO algorithm to detect weapons such as knives and pistols.

I. Introduction
Armament detection is the detection and identification of objects that can be considered dangerous. The proposed project recognizes objects by making use of feature extraction and learning algorithms. This implementation focuses on accurate weapon detection while avoiding false detections, as these can cause unnecessary panic. The flow of work during implementation is shown in figure 1. The YOLO-V4 algorithm possesses the right balance between speed and accuracy, which allows it to detect a weapon quickly with high accuracy.
Fig 1. Flow of work
II. Implementation
A. Resources or components used for implementation
● OpenCV 4.5.5 - Open source computer vision library, version 4.5.5.
● Python 3.10.2 - High-level programming language used for various image-processing applications.
● Guns and knives dataset - A dataset consisting of pistols and knives with respective labels.
● Colab and CSPDarknet53
● GPU: NVIDIA Tesla P100-PCIE-16GB

B. Dataset Specifications
Case 1: Dataset with 1000 images and a single class
● GPU: NVIDIA Tesla P100-PCIE-16GB
● Size of input image - 416 x 416 px
● Format of images used - .jpg
● Self-obtained dataset
● Classes contained - 1 (Pistol)
● Number of images - 1000

Case 2: Dataset with 1000 images and two classes
● GPU: NVIDIA Tesla P100-PCIE-16GB
● Size of input image - 416 x 416 px
● Format of images used - .jpg
● Self-obtained dataset
● Classes contained - 2 (Pistol, Knife)
● Number of images - 1000

Case 3: Dataset with 4000 images and two classes
● GPU: NVIDIA Tesla P100-PCIE-16GB
● Size of input image - 416 x 416 px
● Format of images used - .jpg
● Self-obtained dataset
● Classes contained - 2 (Pistol, Knife)
● Number of images - 4000

C. Assumptions made during implementation of model
● The knife or pistol should be exposed and within line of sight of the camera.
● There should be enough lighting to be able to detect the pistol and knife.
● A decent GPU is used in order to prevent any lag during weapon detection.
● The model will not be fully automated; it will require a person in charge to verify each weapon detection.

D. YOLO-V4
Fig 2. YOLO-V4 layers
YOLO-V4 is built with CSPDarknet53 as its backbone, SPP (Spatial Pyramid Pooling) and PAN (Path Aggregation Network) as its "neck", and the head of its predecessor YOLO-V3 as its "head". It boasts increased performance compared to its predecessor and a significantly higher mAP (mean average precision).
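For reference, adapting YOLO-V4 to a custom dataset is typically done by editing its Darknet configuration file. The fragment below sketches the kind of edits involved for the two-class case described here; the specific values are assumptions based on common YOLO-V4 practice, not settings taken from the paper:

```ini
# yolov4-custom.cfg (fragment) -- illustrative values only
[net]
width=416          # input resolution used in this paper
height=416
batch=64
subdivisions=16
max_batches=4000   # the paper trains for 4000 iterations
steps=3200,3600    # commonly 80% and 90% of max_batches

# ... each of the three detection heads is edited the same way:
[convolutional]
filters=21         # (classes + 5) * 3 = (2 + 5) * 3

[yolo]
classes=2          # Pistol, Knife
```
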
Fig 3. Block diagram of YOLO-V4 Object detection

Dataset Creation and Training: Images were collected from different datasets and merged to form a new dataset containing a total of 4000 images, with an equal number of images for each of the two classes. All images were uploaded to Roboflow and annotated using Roboflow Annotate. The dataset was then split so that 70% of the images formed the training set, 20% the validation set and 10% the testing set. The newly created guns and knives dataset was trained on Colab using the You Only Look Once (YOLO-V4) model for 4000 iterations to increase accuracy and precision. Figure 4 shows the folder containing the train, valid and test images.
Fig.4. Folder with valid, train and test images
Fig.5.Images along with its labels
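The 70/20/10 split described above can be sketched in plain Python (file names and the random seed are illustrative):

```python
import random

def split_dataset(items, train_pct=70, valid_pct=20, seed=42):
    """Shuffle and split into train/valid/test; the remainder is the test set."""
    items = list(items)
    random.Random(seed).shuffle(items)          # seeded for reproducibility
    n_train = len(items) * train_pct // 100
    n_valid = len(items) * valid_pct // 100
    return (items[:n_train],
            items[n_train:n_train + n_valid],
            items[n_train + n_valid:])

images = [f"img_{i:04d}.jpg" for i in range(4000)]
train_set, valid_set, test_set = split_dataset(images)
```
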
Fig 6. Training of YOLO-V4

After labelling the dataset and choosing the desired preprocessing and augmentation options, we can generate a new dataset version and use the curl link to bring it into the Colab notebook for training.
Fig 7. Weights files obtained after training

After every 1000 iterations a new weights file is saved, and whenever a new highest mAP is reached the model generates a new YOLO-V4 'best.weights' file.
Fig 8. Results after training containing mAP, recall and F1 score

Training of the model took around 7 hours to complete, and the final results of the training are shown above in figure 8. After training, the model achieved a mAP of 0.93093 (93.09%) along with a recall of 0.92 and an F1 score of 0.92. The F1 score is obtained by taking the harmonic mean of the precision and recall of the model. An average precision of 94.40% was seen for the knife class and 91.79% for the pistol class.
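These metrics follow directly from their definitions; a short sketch reproducing the reported numbers (the precision value is inferred from the reported recall and F1 score, so treat it as approximate):

```python
def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def mean_average_precision(per_class_ap):
    """mAP is the mean of the per-class average precisions."""
    return sum(per_class_ap) / len(per_class_ap)

# Per-class APs reported in the paper: knife 94.40%, pistol 91.79%.
map_value = mean_average_precision([0.9440, 0.9179])   # ~0.9309
f1 = f1_score(0.92, 0.92)                              # 0.92
```
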
Pseudo code for YOLO
Fig 9. Pseudo code for YOLO-V4 implementation
Fig 10. Graph for plotting mAP and average loss

Figure 10 shows a graph plotting the increase in mAP and the decrease in average loss over each training iteration.

III. Results and Analysis
The model was trained with different inputs in order to experiment and find the best accuracy.

A. Weapon detection using the YOLO-V4 algorithm
Case 1: Dataset with 1000 images and a single class
The model was trained using a dataset containing only a single class: 1000 annotated images of pistols. It was trained for 2000 iterations, which took around 3 hours to complete, and achieved a mAP of 98.24%.

Case 2: Dataset with 1000 images and two classes
The model was then trained using a dataset containing two classes, knife and pistol. It contained 1000 images split evenly between the two classes, i.e. 500 images of knives and 500 images of pistols. It was trained for 4000 iterations, which took around 4 hours to complete. A mAP of only 77.58% was obtained with this set of inputs; the low accuracy was surmised to be due to the lack of sufficient data in each class of the dataset.
Case 3: Dataset with 4000 images and two classes
The model was finally trained using a dataset containing two classes and an increased number of images: 4000, split evenly into 2000 images of knives and 2000 images of pistols. It was trained for 4000 iterations and took close to 7 hours. This batch of training gave the best results, as it contained sufficient data for each class, and achieved an accuracy of 93.09%.

B. Output after detection using YOLO-V4
IV. Conclusion
The YOLO-V4 algorithm was simulated for three versions of the dataset: 1000 images with a single class, 1000 images with two classes, and 4000 images with two classes. The YOLO-V4 algorithm has a good balance between speed and accuracy, making it ideal for real-time detection. In terms of accuracy and results, a mAP of 93.09% was obtained along with a recall of 0.92 and an F1 score of 0.92. This model can be applied to larger datasets by using more powerful GPUs during the training process.
Advances in Science and Technology ISSN: 1662-0356, Vol. 124, pp 236-245 doi:10.4028/p-vph2n1 © 2023 Trans Tech Publications Ltd, Switzerland
Submitted: 2022-09-13 Accepted: 2022-09-16 Online: 2023-02-27
Tomato Leaf Disease Detection Using Deep Convolution Neural Network
Muhammad Arafath1,a, A. Alice Nithya2,b* and Sanyam Gijwani3,c
1,3Department of Computer Science and Engineering, SRM Institute of Science and Technology, Chengalpattu, India
2Department of Computational Intelligence, SRM Institute of Science and Technology, Chengalpattu, India
b*[email protected]
Keywords: Leaf disease detection, Deep Convolution Neural Network, Drop out, Batch Normalization, Tomato Leaf Disease.
Abstract. Plants are a major and important food source for the world's population. Smart and sustainable agriculture should be capable of providing a precise and timely diagnosis of plant diseases, which helps prevent financial and other resource losses to farmers. Plant diseases show visible symptoms that a plant pathologist can diagnose through optical observation, but this process is slow and requires continuous monitoring as well as the availability and diagnostic skill of the pathologist. To overcome this, smart agriculture uses computer-aided plant disease detection models to help increase crop yield. Common diseases found in tomato, potato and pepper plants include bacterial spot and early blight. If a farmer can detect these diseases early and apply an appropriate treatment, crop yield improves and economic loss is prevented. In this work, we train the dataset on three different deep convolution neural network architectures and identify the model best suited to detecting tomato leaf diseases. In order to avoid overfitting of the model, a batch normalization layer and a dropout layer have been included. The proposed deep CNN is trained with various dropout values and a suitable dropout value is identified to regularize the model. The experimental methodology, tested on the Plant Village dataset, showed an improved accuracy of 96% even without performing pre-processing steps like noise removal. By introducing batch normalization and a dropout layer, training accuracy improved to 99%, while validation and testing accuracy is found to be 98%.

Introduction
In India, agriculture is the backbone of the economy and 58% of the Indian population depends on agriculture. Most living beings are directly or indirectly dependent on plants for their survival. Diseases in plants lead to loss of crops, which could disrupt the food chain.
This can only be addressed by physically monitoring the plants, but manual examination is a labour-intensive process and the chances of error are very high. Huge losses occur when plants are affected by diseases that obstruct the growth of crops; pests and pathogens such as germs, fungi and microorganisms, which are the main causes of crop failure, also result in huge losses to the farmer. Currently, image processing-based techniques are used to detect plant or crop diseases, and in the COVID-19 pandemic AI was widely used for diagnosing lung-associated diseases along with other analytical applications. Likewise, advanced technologies could be used to diminish the negative effects of plant diseases by identifying and diagnosing them at a primary level. Since monitoring plant diseases manually is labor-intensive and time-consuming, the use of AI-based methods to diagnose plant diseases is currently being extensively studied. It is necessary to detect disease and recognize its effect on the crop. There are various image-processing algorithms, such as SVM, Random Forest, KNN, CNN and artificial neural networks, to recognize diseases by image classification [1]. CNN solved a major problem previously faced by image classification tasks such as face recognition, which needed explicit attention to where the face is placed in an image. In a CNN, features of an image are deeply processed at each layer. Each
layer of the convolution network extracts different features of each diseased image. The main aim of this application is to develop a system for farmers and agriculture experts to recognize crop diseases at an earlier stage [2,3]. The user just needs to click an image and upload it to the system; image processing begins with the digitalized color image of the unhealthy leaf. Finally, diseases in plant leaves are detected using CNN.

Literature Review
Leaf disease in a plant could damage an entire batch of crops. Though experienced farmers can identify these diseases, diagnosing them at an early stage helps stop further spread of the disease. Machine learning and deep learning techniques attempt to mitigate this issue and help in early diagnosis of the disease. Much research has focused on developing early disease diagnosis systems. Muhammad E. H. Chowdhury [2] developed a deep learning architecture based on EfficientNet, a recent convolutional neural network method, able to identify and classify leaf diseases. The model was trained on 18,161 plain and segmented tomato leaf images to categorize diseases in the tomato plant, and the results obtained after training showed improvement. The proposed solution could be used in a real-time application where camera-enabled microcontrollers check performance. DeepaLakshmi et al. [3] discussed precision technologies that help farmers increase production; detection of pests, weeds and leaf diseases were some of the ideas provided to farmers in this work. Their model took an average of 3.8 seconds to identify the image class, with an accuracy of 94.5%. Umit Atila et al. [4] discussed the use of many other techniques based on visual computing.
In this study the model was trained with the Plant Village dataset containing 54,305 images and 38 classes across 14 different plant species. Many different visual computing techniques were introduced to predict plant leaf diseases at an early stage; to control the issue, they used a novel hybrid approach carried out in three forms. Md. Tariqul Islam [5] discussed the use of non-scientific cultivation techniques for detecting diseases and harvesting crops. The test set contained 103 images of foliar fungal diseases, 239 images of rust disease, 239 images of gray leaf spot, and 233 images of common rust of corn. They used image processing-based methods and a CNN model for training, and the model achieved 94.20% accuracy. Parul Sharma et al. [6] discussed the use of convolutional neural network (CNN) models for plant disease detection. The dataset involved up to 3000 training images; an F-CNN model was trained on full images and an S-CNN model on segmented images, achieving an improved accuracy of 98.6%. Sunku Rohan et al. [7] deliberated the use of transfer learning, which applies knowledge gained on one problem to training in another field or task. The Plant Village dataset of 54,305 images and 38 classes of 14 dissimilar plant species, of which 12 classes are healthy and 26 are unhealthy and diseased, was used in model training. The principal aim of this study was to determine the efficiency of the EfficientNet deep learning architecture in detecting different plant leaf diseases and to compare it with the performance of advanced CNN models in the literature. Abeer A et al. [8] developed a CNN architecture to classify four different types of potato leaf diseases by training on a publicly available dataset of 2400 images of potato leaves.
The authors used deep convolutional neural network techniques to identify four different types (Red, Sweet, Red Washed and White) of potato diseases. The model achieved an accuracy of 99.5%, demonstrating the viability of the method. Shantanu Kumbhar et al. [9] compared different classification algorithms and, as a result of the comparison study, observed that CNN could provide higher accuracy for predicting plant diseases. The models trained for cotton leaf diseases focus mainly on two diseases, Alternaria Macrospora and Bacterial Blight. The main objective of that paper was to provide an easy-to-use web
system for farmers and experts to recognize crop disease. The user just needs to upload an image to the system, and the CNN detects the cotton plant disease and its type. Santhanalakshmi et al. [10] discoursed on the many machine learning techniques that can be used to diagnose diseased plants. The proposed model estimates the health of potato plants from leaf-image datasets using a neural network. Production is reduced by food loss due to crop infection, and one of the challenges in farming is to reduce pesticide use in agriculture. Karthik et al. [11] developed an enhanced deep learning architecture for automatic detection of diseases in tomato leaves; 24,001 images were used for validation and 95,999 images for training the model. The work presents two different deep architectures for detecting the type of disease in tomato leaves; the first architecture applies residual learning to learn significant features for classification. Hari Krishnan et al. [12] discussed efficient estimation of the market and of major losses across whole crop yields. They identified that tremendous work was required to train a model to recognize leaf infection and plant sickness data, and that more time was needed to prepare the data. Identification follows steps such as loading the picture, contrast enhancement, feature extraction, converting RGB to HSI, and SVM classification. Serawork Wallelegin et al. [13] designed a LeNet architecture-based model for soybean plant disease classification; 12,673 samples of leaf images, classified into four classes, were obtained from the Plant Village database to train the model. Convolutional neural networks (CNNs) have shown tremendous ability in object recognition and image classification problems, and that paper shows the viability of CNNs for finding disease in plants from leaf images taken in normal conditions. Melike Sardogan et al.
[14] proposed a CNN model with the Learning Vector Quantization (LVQ) algorithm to detect and identify plant diseases. 500 images of tomato leaves, covering four disease indicators, were used as the model dataset, with a CNN model for automatic feature extraction and classification. Filters were applied to the three RGB channels, and the LVQ was fed the output feature vector of the convolution part to train the network. Xihai Zhang et al. [15] proposed a method using the GoogLeNet model to recognize maize leaf diseases. A total of 500 images, divided into nine categories, were taken from the Plant Village and Google websites to train the model. High accuracies of 98.9% and 98.8% were achieved by two improved deep CNN models. Experiments showed that recognition accuracy could be enhanced by increasing the diversity of pooling, varying the model parameters, and adding a ReLU function. Dor Oppenheim et al. [16] developed a deep convolutional neural network trained to classify potato tubers into five categories: a healthy class and four disease classes. The model was trained on a database of images of potato tubers of dissimilar cultivars, diseases and sizes, developed, classified and labeled manually by professionals. Standard low-cost RGB images were used, and the models were tested on a data set of images marked by specialists, showing high classification accuracy and the successful application of deep convolutional networks. From the survey, it has been observed that even though several models have been proposed to detect plant diseases, a farmer-friendly intelligent mobile or web app is still needed. Hence, in this work a mobile-based farmer-friendly tomato leaf disease detection model is proposed.
Experimental Methodology
In this work, a mobile-based tomato leaf disease detection application is proposed using a deep convolution neural network architecture. As shown in fig. 1, a database consists of images of different types of plant leaf diseases, from which diseased and healthy tomato leaves have been used. The model is trained repetitively to attain maximum accuracy. When a new image is given to the model, its features are compared with the features already trained in the database, and the appropriate result is provided. The whole process is carried out in four phases.
Fig. 1 Proposed System Architecture Diagram

1) Phase 1: Image acquisition.
2) Phase 2: Pre-processing of the raw data. This step is essential to the accuracy and reliability of the model; for a deep learning paradigm to work well, increased availability of input data is vital, so data augmentation is done in this step.
3) Phase 3: Deployment to Google Cloud Platform / backend server.
4) Phase 4: Design of a smart, farmer-friendly mobile application that uses the CNN model from the cloud function.

In this study, 3900 images belonging to 6 different classes, available in the Plant Village open dataset [17], were taken: 5 classes are different types of tomato leaf diseases and 1 class belongs to the healthy tomato plant. The 3900 images were split into training, validation and testing datasets, where 70% of the dataset goes to training and the remaining 30% is split between validation and testing. Fig. 2 shows sample images taken from the dataset.
Fig. 2 Sample Image Taken from Plant Village Dataset
After splitting the dataset and before building the model, the images should be resized; moreover, to improve model performance the image pixels are normalized to keep them in the range 0 to 1. The input images were resized to (256, 256), which is efficient for the model while predicting. Data augmentation was performed to improve the performance of the resulting model by increasing the size of the dataset through methods like random flips and random rotations. This is then given to a deep convolutional neural network architecture. In the existing methodology, a simple convolutional neural network architecture was used to classify the input images, which required more processing time, and the efficiency of the model on test data was found to be low. The proposed model is trained using a deep convolution neural network from scratch, fine-tuning suitable hyperparameters to improve accuracy. A batch normalization layer and dropout-based regularization techniques have been introduced in this architecture to achieve model generalization. The batch normalization layer allows the other layers of the network to learn more independently; it normalizes the output of the incoming layers and is generally used when the testing accuracy of the trained model is much lower than the training accuracy. Dropout is a regularization method used when the model overfits: it randomly blocks some percentage of neurons in the network by severing the input and output connections to those neurons, which enhances the generalization ability of the model. Fig. 3 shows the proposed deep convolution neural network architecture.
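The dropout mechanism described above can be illustrated with a minimal plain-Python sketch of "inverted dropout". This is an illustration only: frameworks such as Keras provide it as a ready-made layer (e.g. Dropout(0.25)), and the rate of 0.25 used below mirrors the value the paper ultimately selects:

```python
import random

def dropout(activations, rate, seed=0):
    """Inverted dropout: zero each activation with probability `rate`,
    scaling survivors by 1/(1 - rate) so the expected sum is unchanged."""
    rng = random.Random(seed)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

# Roughly a quarter of the units are silenced at rate=0.25;
# the surviving activations are scaled up to compensate.
layer_output = [0.5] * 1000
dropped = dropout(layer_output, rate=0.25)
```

At inference time no units are dropped; because of the inverted scaling during training, the layer then behaves as the identity.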
Fig. 3 Proposed Deep Convolution Neural Network Architecture

Results and Discussion
The proposed system will be beneficial for farmers and for people enthusiastic to get into farming or gardening, supporting early leaf disease detection so that necessary safety measures can be taken to stop the disease from spreading and to achieve the anticipated yield. The best-performing model was identified by training three different deep CNN architectures, named B0, B1 and B2. In B0, 6 convolution layers were defined; in B1, 4 convolution layers; and in B2, 3 convolution layers. Among the three models, B2 with 3 convolution layers achieved a model performance of 96%. Table 1 shows the comparison of the three deep CNN architectures. However, while testing the model the accuracy was not as expected, indicating that training could have resulted in overfitting.
Table 1 Comparison of Three Different Deep CNN Architectures

Model   Layers            Training Accuracy
B0      6 conv2D layers   91%
B1      4 conv2D layers   94%
B2      3 conv2D layers   96%
In order to avoid overfitting of the developed model B2, a batch normalization layer and a dropout layer were added to the proposed 3-layer CNN architecture. These layers aid in regularizing the proposed model. The model was tested with different dropout values and a suitable dropout value was identified. Table 2 shows the results of the model tested with various dropout values and the resulting accuracies.

Table 2 Model B2 Tested with Various Dropout Values

Dropout   Training Accuracy   Validation Accuracy   Testing Accuracy
0.2       97%                 98%                   97%
0.25      99%                 98%                   98%
0.3       98%                 97%                   97%
0.35      98%                 96%                   95%
0.4       97%                 96%                   96%
0.45      96%                 95%                   93%
0.5       96%                 94%                   91%
From Table 2 it was found that, after applying different dropout values to model B2, a dropout value of 0.25 gives the maximum training, validation and testing accuracy. Table 3 compares the existing CNN model (B2) and the proposed deep CNN architecture with batch normalization and dropout.

Table 3 Comparison of the Existing CNN Model (B2) and the Proposed Deep CNN Architecture

Model                                                      Training Accuracy   Validation Accuracy   Testing Accuracy
Existing CNN Architecture (B2)                             96%                 81%                   78%
Proposed Deep CNN (with Batch Normalization and Dropout)   99%                 98%                   98%
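To illustrate the dropout technique behind these results, here is a minimal numpy sketch of inverted dropout at the best-performing rate of 0.25; the activation shape and values are invented for the example:

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: randomly zero `rate` of the units and rescale the
    survivors so the expected activation is unchanged. A sketch of the
    regularization technique added to model B2 (at inference time the
    layer is a no-op)."""
    if not training:
        return activations
    keep = rng.random(activations.shape) >= rate
    return activations * keep / (1.0 - rate)

rng = np.random.default_rng(42)
acts = np.ones((4, 8))                  # stand-in for one layer's activations
out = dropout(acts, rate=0.25, rng=rng)
print(np.mean(out == 0.0))              # fraction of blocked units, roughly 0.25
```

Because roughly a quarter of the connections are severed on every forward pass, no single neuron can dominate, which is why the technique counters overfitting.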
Fig. 4 shows the summary of model B2. The main advantage of using a CNN is that it can extract spatial features from the data using its kernels, which other networks cannot do. For example, a CNN can detect edges, color distributions and similar patterns in an image, which makes these networks very robust for image classification and for other data with spatial structure.
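The edge-detection behaviour mentioned above can be illustrated with a hand-rolled convolution; the 5 x 5 test image and Sobel-style kernel are invented for the example:

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2D cross-correlation (no padding), applying the kernel at every
    position, the way a CNN layer slides its filter over the input."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical step edge (dark left half, bright right half) and a
# Sobel-style vertical-edge kernel:
img = np.zeros((5, 5))
img[:, 3:] = 1.0
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
resp = conv2d(img, sobel_x)
print(resp)   # strong response only where the edge crosses the window
```

The response is zero over the flat regions and large where the intensity changes, which is exactly the kind of spatial feature a trained convolution filter picks up.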
Fig. 4 Model B2 Summary

A pooling layer is used to reduce the spatial volume of the input image after convolution, making the model tolerant to variations and distortions. It is used between two convolution layers. Applying a fully connected layer directly after a convolution layer, without pooling or max pooling, would be computationally expensive, so max pooling is used to reduce the spatial volume of the input image. Max pooling is applied on a single depth slice with a stride of 2, reducing a 4 x 4 input to 2 x 2. CNN performance is evaluated on the validation data set. The model is improved by fine-tuning the activation functions and optimizers to minimize the loss; considerable effort was spent on this step to develop the model best suited to this problem. The proposed model also reduces data consumption: data samples (images of the diseased plant) need not be stored on the device. Training and validation accuracy reached their maximum as the number of epochs increased, while training and validation loss dropped to their minimum. Hence the CNN model was successfully built and trained. Fig. 5 shows the training vs. validation accuracy and loss of the proposed model B2.
Fig. 5 Training vs Validation Accuracy and Loss
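The max-pooling operation described above, reducing a 4 x 4 slice to 2 x 2 with a stride of 2, can be sketched in numpy (the input values are invented):

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 over a single depth slice:
    each non-overlapping 2x2 block collapses to its maximum."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.array([[1, 3, 2, 1],
              [4, 6, 5, 2],
              [7, 2, 9, 4],
              [1, 8, 3, 6]])
print(max_pool_2x2(x))
# → [[6 5]
#    [8 9]]
```

The spatial volume is quartered while the strongest activation in each region survives, which is what makes the model tolerant to small shifts and distortions.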
CNN models with and without batch normalization and dropout were implemented, and the most efficient model among them, with batch normalization, was tested under various dropout values (0.25, 0.5, etc.) and the testing accuracies were recorded. This was done to compare the efficiency of the trained model under the dropout technique. Fig. 6 shows the precision, recall and F1-score values for every class in the test dataset. Fig. 7 shows the confusion matrix plotted against the true and predicted labels.
Fig. 6 Precision, Recall and F1-Score Values for Every Class
Fig. 7 Confusion Matrix Plotted for Various Tomato Leaf Diseases

Conclusion

The proposed methodology uses a deep convolutional neural network architecture to detect and predict the type of tomato leaf disease. To prevent economic loss and to make the technology simple and user friendly for farmers, an online app will be made; this requires only a smartphone and an internet connection. With this app, farmers can detect leaf disease early with maximum accuracy. Different deep convolutional neural network architectures were developed from scratch with 6, 4 and 3 convolution layers and were found to give accuracies of 91%, 94% and 96%
respectively. This showed that the model with 3 layers performs well for detecting tomato leaf diseases. But during testing it was observed that the model overfits. Hence different dropout values were tried on the proposed model, and the best dropout value of 0.25 was identified. Thus, the proposed architecture provides a regularized model with an improved accuracy of 99%. Finally, this study shows how CNNs can play a vital role in benefiting smallholder farmers.

References

[1] Nithya, A. Alice, and C. Lakshmi. "On the performance improvement of non-cooperative iris biometrics using segmentation and feature selection techniques." International Journal of Biometrics 11, no. 1 (2019): 1-21.
[2] Chowdhury, M. E., Rahman, T., Khandakar, A., Ayari, M. A., Khan, A. U., Khan, M. S., ... & Ali, S. H. M. (2021). Automatic and Reliable Leaf Disease Detection Using Deep Learning Techniques. AgriEngineering, 3(2), 294-312.
[3] Deepalakshmi, P., Lavanya, K., & Srinivasu, P. N. (2021). Plant Leaf Disease Detection Using CNN Algorithm. International Journal of Information System Modeling and Design (IJISMD), 12(1), 1-21.
[4] Atila, Ü., Uçar, M., Akyol, K., & Uçar, E. (2021). Plant leaf disease classification using EfficientNet deep learning model. Ecological Informatics, 61, 101182.
[5] Islam, M. T. (2020). Plant Disease Detection using CNN Model and Image Processing. International Journal of Engineering Research & Technology (IJERT), 9.
[6] Sharma, P., Berwal, Y. P. S., & Ghai, W. (2020). Performance analysis of deep learning CNN models for disease detection in plants using image segmentation. Information Processing in Agriculture, 7(4), 566-574.
[7] Suma, V., Shetty, R. A., Tated, R. F., Rohan, S., & Pujar, T. S. (2019, June). CNN based leaf disease identification and remedy recommendation system. In 2019 3rd International Conference on Electronics, Communication and Aerospace Technology (ICECA) (pp. 395-399). IEEE.
[8] Elsharif, A. A., Dheir, I. M., Mettleq, A. S.
A., & Abu-Naser, S. S. (2020). Potato Classification Using Deep Learning. International Journal of Academic Pedagogical Research (IJAPR), 3(12).
[9] Kumbhar, S., Nilawar, A., Patil, S., Mahalakshmi, B., & Nipane, M. (2019). Farmer buddy-web based cotton leaf disease detection using CNN. Int. J. Appl. Eng. Res., 14(11), 2662-2666.
[10] Santhanalakshmi, S. T., Rohini, S., & Padmashree, M. (2014). Plants Disease Detection Using CNN. Journal of Emerging Technologies and Innovative Research (JETIR).
[11] Karthik, R., Hariharan, M., Anand, S., Mathikshara, P., Johnson, A., & Menaka, R. (2020). Attention embedded residual CNN for disease detection in tomato leaves. Applied Soft Computing, 86, 105933.
[12] Krishnan, H., Priyadharshini, K., Gowsic, M., Mahesh, N., Vijayananth, S., & Sudhakar, P. (2019, March). Plant disease analysis using image processing in MATLAB. In 2019 IEEE International Conference on System, Computation, Automation and Networking (ICSCAN) (pp. 13). IEEE.
[13] Wallelign, S., Polceanu, M., & Buche, C. (2018, May). Soybean plant disease identification using convolutional neural network. In The Thirty-First International FLAIRS Conference.
[14] Sardogan, M., Tuncer, A., & Ozen, Y. (2018, September). Plant leaf disease detection and classification based on CNN with LVQ algorithm. In 2018 3rd International Conference on Computer Science and Engineering (UBMK) (pp. 382-385). IEEE. [15] Zhang, X., Qiao, Y., Meng, F., Fan, C., & Zhang, M. (2018). Identification of maize leaf diseases using improved deep convolutional neural networks. IEEE Access, 6, 30370-30377. [16] Oppenheim, D., Shani, G., Erlich, O., & Tsror, L. (2019). Using deep learning for image-based potato tuber disease detection. Phytopathology, 109(6), 1083-1087. [17] Mohanty, S. P., Hughes, D. P., & Salathé, M. (2016). Using deep learning for image-based plant disease detection. Frontiers in plant science, 7, 1419.
Advances in Science and Technology ISSN: 1662-0356, Vol. 124, pp 246-249 doi:10.4028/p-5vztxi © 2023 Trans Tech Publications Ltd, Switzerland
Submitted: 2022-09-15 Accepted: 2022-09-16 Online: 2023-02-27
Self-Driving Car Using Supervised Learning

Ryan Collins1,a*, Himanshu Kumar Maurya1,b and Ragul Raj S.R.2,c

1,2Computer Science Engineering [CSE], SRM Institute of Science and Technology, Vadapalani, Chennai, India

[email protected], [email protected], [email protected]
Keywords: self driving car, autonomous mode, car driving, computer vision, vehicle routing
Abstract. In the growing field of autonomous driving, the aim is to minimize the human errors that lead to fatal accidents. Many companies and research groups have been working for years to achieve a fully autonomous vehicle; self-driving cars are inevitably the future. The system to be implemented is trained to translate images from three cameras directly into steering commands. The proposed method is expected to work on local roads with or without lane markings. With just the generated images and the human steering angle as the training signal, the proposed method is expected to automatically recognize important road characteristics. A simulator (Udacity) is used to generate data for the proposed model.

Introduction

Automobiles are one of the most rapidly developing fields in an ever-growing world where autonomous cars are being produced. This allows a broader range of people to collaborate on, analyze, and enhance the technologies utilized in today's autonomous vehicles. An autonomous car, for example, must use a variety of sensors to model the surrounding world in real time. On a large scale, self-driving automobiles will reduce carbon dioxide emissions and improve fuel efficiency. They will also help prevent vehicle accidents caused by human mistakes. As a result, a self-driving automobile will have a significant impact in the real world, with less traffic and more mobility. Data is generated from a simulator, and a convolutional neural network is used to predict the steering angles of a car while driving down the road. To forecast steering values, the proposed model employs a supervised learning approach.

Proposed System

Generation of Data

The test data was generated using the open-source Udacity simulator by driving the car for 30 minutes in training mode.
The dataset is stored as a CSV file, which contains the vehicle's left, right, and center camera views and the directories of the images generated while driving the car in training mode. The data is read from the CSV file into a Python notebook, where the left, right, and center columns are kept, as they hold the image directories, and the other columns are dropped. A batch generator is used to produce batches of image data so that images can be fed into the neural network for training.

Image Processing

The image is scaled up or down, and the steering angle is adjusted accordingly. The image channel is converted from RGB to YUV. Unwanted details like the sky are cropped out, and the image is resized. The image may be flipped, with the steering angle adjusted to suit.
Some shadow is added or removed according to the image, and brightness is adjusted in the same way. The image is shifted vertically or horizontally so that the machine produces better results when the image is fed into the neural network.

Neural Network Architecture

The neural network model is sequential. It contains nine 2D convolution layers with ReLU as the activation function, four max-pooling layers, a 20% dropout layer, a flatten layer, four dense layers with ReLU activation, and a lambda layer to wrap arbitrary expressions. The output layer is a dense layer with one node, which predicts the steering angle. The neural network architecture is inspired by the VGG16 architecture.

System Flow Architecture

The data set used was generated from the Udacity car simulator. It contains a CSV file and an image folder, where the image destinations are saved in the CSV file. The images are imported using the CSV file and loaded in batches using a batch generator, as the input images have three perspective views (left, right, center). The images are then preprocessed and split into training and validation data sets. The training data set is used to train the neural network; the validation data set is used to check whether the model is overfitted. The model is saved after training for ten epochs.
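A few of the image-processing steps described above (RGB-to-YUV conversion, cropping away the sky, mirroring with a negated steering angle) can be sketched with numpy; the crop height, frame size and BT.601 conversion matrix are assumptions, not taken from the paper:

```python
import numpy as np

CROP_TOP = 60   # hypothetical number of sky rows to remove

def rgb_to_yuv(img):
    """Approximate BT.601 RGB -> YUV conversion on a float image in [0, 1]."""
    m = np.array([[ 0.299,  0.587,  0.114],
                  [-0.147, -0.289,  0.436],
                  [ 0.615, -0.515, -0.100]])
    return img @ m.T

def prepare(img, steering):
    """Crop the sky, convert to YUV, and mirror the frame (done randomly in
    practice), negating the steering angle to match the flip."""
    img = img[CROP_TOP:, :, :]
    img = rgb_to_yuv(img / 255.0)
    img = np.fliplr(img)
    return img, -steering

frame = np.zeros((160, 320, 3))              # stand-in for a simulator frame
out, angle = prepare(frame, steering=0.2)
print(out.shape, angle)                      # (100, 320, 3) -0.2
```

Flipping with a negated steering angle doubles the effective dataset and balances left and right turns, which the simulator track alone would not provide.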
Fig. 1 System Flow Architecture

Results
Fig. 2 Representation of loss of 9th epoch
Fig. 3 Graph Representing Best Result

The best result was obtained at the 9th epoch, with training loss = 0.0132 and validation loss = 0.0144. The loss is the mean squared error:

MSE (loss) = (1/n) Σ_{i=1}^{n} (Y_i − Ŷ_i)²

Literature References
[1] They developed a model that captures known correlations between many variables to identify the system-level consequences of vehicle automation. They investigated how this model changes in several scenarios of autonomous vehicle deployment using qualitative interviews and workshops. The authors' models and scenarios showed that many variables can impact the transportation system. Automated vehicles are a promising technology and are still in the early stages of development. [2] The author finds that accurate, high-resolution lidar measurements are instrumental in coastal environments, where the topography is often flat and slight elevation changes are frequently significant. According to the author, a LIDAR is a device that sends out timed pulses of laser light and measures the time between the emission and reception of the pulses to find an object's distance from the sensor. [3] They proposed a model that uses reinforcement learning to solve routing problems. The algorithms are used to juxtapose the outcomes of the reinforcement learning method with two other approaches: one using a mixed-integer linear program formulation, the other using genetic algorithms. They presented an RL technique based on learned value estimates and centralized vehicle mapping, which is well suited for capacity constraints. The solution is built by mapping automobiles to clients.
[4] Using deep learning methods, the authors converted pixels from a single front-facing camera to steering angle input. The algorithm works flawlessly with minimal human training data, as it uses convolutional neural networks. The robustness of this end-to-end method was surprising. For training, they used an NVIDIA DevBox with Torch 7, and for selecting where to drive, they used an NVIDIA DRIVE PX physical module with Torch 7. The technology works at a frame rate of 30 frames per second (FPS). [5] In the paper, a three-wheel robot prototype was created to communicate with the Google Maps API on a modest scale, taking into account the various characteristics and costs, reaching the destination, and identifying impediments so as to travel steadily and automatically. The physical module of the robot is equipped with several sensors that allow it to communicate with the API. The mobile robot uses a GPRS module to connect directly to the Google Maps API, acquire directions, and move in that direction. [6] They constructed an RC car that can drive around a track autonomously. An RGBD camera is used for fetching images, and OpenCV, an open-source software library for computer vision, is used for processing them. YOLOv3 handles object recognition and classification. The real-time coordinates of the RC car are fetched using an IPS (Indoor Positioning System). They built five almost identical vehicles, which let them evaluate a car's capacity to interact with other automobiles while traveling around the course. [7] The authors created a 3D virtual metropolis that depicts the real world. They used the YOLOv3 object detection algorithm to locate things in the environment. Their self-driving car's primary goal is to traverse the city safely and quickly. [8] In this study, they used a genetic algorithm. Their method enables mobile robots to navigate on their own. The model identifies the estimation devices and records the information generated at that time.
A genetic algorithm is used to create a path planner that takes accuracy and speed into account.

Summary

The proposed model performed formidably, with a training loss of 0.0132 and a validation loss of 0.0144. The proposed model's results give a pathway for supervised learning in autonomous self-driving cars. There are other models that outperform the proposed model, for example those using reinforcement learning. Still, the proposed model shines a light on supervised learning, showing that it can also perform competently in these scenarios.

References

[1] Mariusz Bojarski, Ben Firner, Beat Flepp, Larry Jackel, Urs Muller, Karol Zieba and Davide Del Testa. 2016. "End-to-end deep learning for self-driving cars".
[2] Nakul Audeechya, Mahesh Kumar Porwal. 2014. "LIDAR Technology and Applications".
[3] Alexis C. Madrigal. 2012. "Driverless cars would reshape automobiles *and* the transit system".
[4] Nazneen N. Sultana, Vinita Baniwal, Ansuma Basumatary, Piyush Mittal, Supratim Ghosh, Harshad Khadilkar. 2021. "Fast approximate solutions using reinforcement learning for dynamic capacitated vehicle routing with time windows".
[5] Qudsia Memon, Muzamil Ahmed, Shahzeb Ali, Azam Rafique Memon, Wajiha Shah. 2016. "Self-driving and driver relaxing vehicle". In Proc. IEEE Conf.
[6] Jacob Newman, Zheng Sun, Dah-Jye Lee. 2020. "Self-Driving Cars: A Platform for Learning and Research". In Proc. IEEE Conf.
[7] Bhaskar Barua, Clarence Gomes, Shubham Baghe, Jignesh Sisodia. 2016. "A Self-Driving Car Implementation using Computer Vision for Detection and Navigation". In Proc. IEEE Conf.
[8] Harivansh Prasad Sharma, Reshu Agarwal, Manisha Pant, Shylaja VinayKumar Karatangi. 2021. "Self-directed Robot for Car Driving using Genetic Algorithm". In Proc. IEEE Conf.
Advances in Science and Technology ISSN: 1662-0356, Vol. 124, pp 250-258 doi:10.4028/p-1ol72g © 2023 Trans Tech Publications Ltd, Switzerland
Submitted: 2022-09-16 Accepted: 2022-09-19 Online: 2023-02-27
Deep Learning Approach to New Age Cinematic Video Editing

Disha Duttaa, Shrunkhala Jawadeb, Niveditha S.c, Syed Mateen Hussaind

Department of Computer Science and Engineering, SRM Institute of Science and Technology, Vadapalani, Chennai, India

[email protected], [email protected], [email protected], [email protected]
Keywords: video editing, sky background, background replacement, weather change, deep learning, background enhancer, cinematic sky lighting adjustment
Abstract: This paper presents a new technique for video editing, introducing modules trained to create realistic and dramatic sky backgrounds in videos. This project differs from what is currently used in the video editing ecosystem, as it does not require static photos or inertial measurements. The module can be used on any device without any prerequisites, which is a game-changer when it comes to capturing cinematic sky videos. The project is divided into three modules that handle the different tasks: sky routing, flow reckoner, and image emulsifier. These methods run in real time and are user friendly. The project can generate high-fidelity videos with different lighting and dramatic effects in outdoor environments, and can also easily synthesize different weather conditions. This editing technique is much simpler and easier, giving a more aesthetic image for cinematic shots.

I. Introduction
The sky has always been a major component of outdoor photography due to the various lighting issues that present themselves. Movie directors and film crews face unexpected weather conditions while capturing outdoor footage, and most of the captured footage has an overexposed background or a white-looking sky. At the same time, many influencers and vloggers have started to capture videos of their day-to-day life, as short videos are the trend on major social media platforms. With the help of our video editing technique, vloggers can improve their video content by getting perfect sunrise or sunset footage in their videos, and film crews no longer need to worry about overexposed lighting or weather conditions. Currently, the editing department in film making edits the background of each shot frame by frame, trying to matte the background with the sky region. With the help of our video editing technique, the workload for video editors will reduce drastically, improving overall efficiency in the filmmaking process. Other apps have focused on sky augmentation, but they come with limitations: they have specific requirements on which camera can be used and are not very user friendly. In our proposed project, there are no limitations on which camera can be used to capture footage, and it is suitable for both online and offline video editing. Our project has three modules:
● Sky routing: the sky region is detected in the video footage. It differs from existing techniques in that it performs a binary pixel-wise classification for frames.
● Flow reckoner: we estimate the motion of the objects that are present in the background.
● Image emulsifier: the image emulsifier module bends or warps the sky background based on the image motion and finally blends it with the foreground.
We evaluate our model on both low- and high-resolution video footage, and in both cases it produces satisfactory videos with different kinds of lighting and weather conditions. Our main objective in building this model is to propose a new framework for sky augmentation in outdoor videos, as previous works focus only on sky editing in static images and not in videos. Also, previous works address the whole background of an input image, whereas our model focuses only on the sky background of a video and changes its weather, lighting, or any objects in the sky.
We have organized the rest of the paper into nine sections, namely: introduction, related work, methodology, model configuration, implementation details, quantitative study, conclusion, limitations and future scope, and references.

II. Related Work
Many studies and models have been built for editing static images or only for understanding the flow of objects present in a video. Yanting Pei and his group mainly studied the problems faced by existing models when synthesizing more complex and different forms of noisy images using CNNs [1], built a model for this, and implemented it on Caffe using ResNet, AlexNet and VGGNet. A multiple-image reflection removal module [6] was made by Daniel P. K. Lun and his fellow members to solve reflection problems in given images. They estimated depth using an edge-depth estimator network based on a CNN and a GAN with WGAN. For evaluation, they compared their model with different state-of-the-art methods, both quantitatively and qualitatively, and their algorithm outperformed pre-existing algorithms. Jianming Zhang and his group proposed an end-to-end multi-stream fusion network [8] to generate a new image when given a pair of background and foreground. For multi-level fusion they used the Laplacian of the images to blend the different levels along the masked boundaries. They used the DUTS and MSRA-10K datasets to train their model, SynTest for quantitative tests, and PSNR to measure the final compositing quality. For single-tone background removal [7], an Indian group used the chroma keying technique usually used in motion pictures and video gaming, dynamically using a new dual-threshold selection scheme for hue values at different points in captured images. They processed the given image using MATLAB software and obtained a black processed image as output, confirming that it is possible to remove the illuminated background using this algorithm. Ming Fang [13] proposed a system to obtain a better denoising effect in videos. The proposed system combines temporal and spatial information: two adjacent frames are combined to form a new reference frame.
Then a bilateral filter algorithm is applied on the reference frame to denoise the video frame, and this proposed system increases the video quality of low-light video frames. Amir Kolaman and Dan Malowany [14] proposed a system for light variations using a CNN. The proposed system is a convolutional-neural-network-based object detection algorithm using Light Invariant Video Imaging (LIVI). LIVI works by reducing the influence of radiant lighting conditions and processes the object's image independently of the lighting conditions. A double-channel object tracking model was built by Jun Chu and his group to detect objects from the background [3]. They calculated MDNet's position deviation using a discriminative correlation filter, trained their model on the OTB100 dataset, divided the sequences into two groups by resolution and long-term target, and evaluated their results based on the Euclidean distance between test and real videos. Wenguan Wang and his colleagues studied the human perspective of eye tracking on the 3D objects [4] of a video pattern using unsupervised learning. They built an in-depth algorithm for computing the alignment between video objects and the human eye perspective, using an SGD optimizer and a modified version of Caffe. Tao Zhuo and his fellow members aimed at building a segmentation model for the objects present [5] in a video. They observe the region and the optical flow between objects using a CRF model and forward propagation, and built their model on Mask R-CNN pretrained on the MS-COCO dataset. They evaluated their model against several state-of-the-art models, and it outperformed them by a large margin. Students from Computer Science and Engineering, Qatar University, have proposed a motion detection and video summarizing technique [9] to count the moving objects in a given video and to summarize just the period of time for those objects.
They used the subtraction method, block-based morphological operations and the Otsu method. From the Gaussian histogram formed, it can be concluded that this method is robust, effective and accurate.
Xinguang Xiang and his group built a model to produce blur-free videos from very hazy or blurry videos [2], using sharp exemplars based on sharpness features. They calculated optical flow using batch normalization, without a pre-trained FlowNet, to find each frame's neighbour information. Finally, they compared their model with state-of-the-art video models (SRN, DVD, OVD, EDVR, STFAN) to check accuracy, and it outperformed them all. A group of scholars from Zhejiang Normal University, Jinhua, China, have proposed a target detection method using a Gaussian mixture background and cuckoo search [10]. The cuckoo algorithm is derived from the mechanism of the cuckoo's nest and the principle of Lévy flight. They used grayscale conversion and fixed frames to further reduce the background, and propose that this method improves the detection of moving targets and enhances the accuracy of background detection. Minda Zhao and her colleagues proposed a system for video stabilization [11]. The proposed video frame stabilization network, called PWStableNet, applies a pixel-wise warping method to video frames using a multi-stage encoder-decoder structure; the pixel-wise warping maps between two unstable frames. Zhongqiang Wang [12] proposed a system for smoothing video frames that works through a video splitting algorithm used in pre-processing videos and binomial filtering, which helps smooth video frames. Hua Huang proposed a system [15] for stabilizing shaky videos. The proposed system reuses the stabilized motion frames of the input video's feature points; these motion vectors are optimized based on the residuals between the stabilization predictor and the ability to improve the speed of motion searching.

III. Methodology

Our model has three modules: the sky routing module, the flow reckoner module and the image emulsifier module.

A.
Sky Routing
The sky routing technique is the first component of our model. Its objective is to detect the background sky region in any video footage. We do this by differentiating objects from the background and outlining only the sky region in each video frame, taking advantage of a pixel-wise regression framework to predict sky regions in videos containing both fine and coarse pixel structures. The sky routing technique is divided into three segments: an encoder E, a decoder D, and a module that makes small refinements to each video frame. We use a coordinate convolution layer for detecting the sky region in the video footage. The role of E is to find feature representations in a low-resolution, down-sampled video frame. The main objective of D is to outline a coarse sky region, and the refinement module produces a refined sky region by taking as input both the down-sampled video frame and the initial high-resolution video frame. In the sky routing module we use a ResNet architecture for the encoder E and build our decoder D as an upsampling network consisting of multiple convolution layers.
Fig. 1. Flow chart of our model. Our model consists of three modules, that is, sky routing, flow reckoner and image emulsifier.
As the sky in any footage is at the top of the frame, the sky routing module uses coordinate convolution layers for the encoder and decoder, where the y coordinates are embedded in the video frame of the sky region. This allows more detailed mapping and increases the accuracy of outlining the sky region, even toward the edges of objects. The rough segmentation network f (comprising E and D) is trained to minimize the pixel-wise distance between its output and the ground truth:

L(f) = E_{I_l ∈ Y_l} [ (1/N_l) ‖f(I_l) − R̂_l‖₂² ]

where I is the high-resolution input video and I_l is the down-sampled, low-resolution version of the same input. f takes the down-sampled I_l and produces a sky matte R_l = f(I_l) similar to the high resolution of I; R̂_l is the corresponding ground truth. ‖·‖₂ denotes the pixel-wise l2 distance between two images, Y_l is the dataset of lower-resolution images on which f is trained, and N_l is the total number of pixels in the output video frame. After obtaining an outline of the sky region at the down-sampled resolution, the footage is brought back to its original high resolution in the refinement stage. This technique helps us recover detailed structures of the sky region, making the method effective and efficient. Using the high-resolution footage as a reference, we can also remove the green and red channels for improved contrast in the video frame. The refined matte is

M = g(h(R_l), I, r, ε)

where M denotes the high-resolution sky region after the outlining/refinement process, r and ε are the radius and regularization parameters, h is the upsampling of the video footage, and g is guided filtering.

B. Flow Reckoner
In this module we estimate the motion of the objects that are present in the background.
Although in real footage the sky is effectively at infinity, from the viewer's perspective we assume the sky objects lie within a video box with fixed perspective-transform parameters. We model the motion of sky patterns with an affine matrix T ∈ R^{3×3}. We compute optical flow between the objects present in each frame of the video iteratively using the Lucas-Kanade method; the feature points are tracked frame by frame, yielding an ordered collection of sparse feature points per frame. From each pair of adjacent frames' 2D feature points, a RANSAC-based robust affine estimation algorithm computes an optimal 2D transformation restricted to uniform scaling, rotation and translation, with four degrees of freedom. Since we consider only the sky region of the video, in cases where very few feature points are detected for certain frames, we perform depth estimation on those frames to proceed further. Our study found that in rare cases the RANSAC reprojection error cannot estimate the motion stably, so we assume the motion of the background between two frames is dominated by rotation and translation. We then compute the Euclidean distance between the paired points of two adjacent frames and apply Gaussian kernel density estimation [16] with bandwidth 0.5 to these distances, removing matched points that satisfy

P(d) < η, with η = 0.1

where P(d) denotes the estimated probability density of distance d. Once the per-frame motions are obtained, the affine matrix mapping the initial frame to the jth frame is the composition

T_{1→j} = T_j · T_{j−1} · ... · T_1

where T_a represents the motion estimated between frame a−1 and frame a, for a = 1, ..., j. The video background can then be warped into the final video box at any frame j using T_{1→j}.
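The outlier filtering and affine composition described above can be sketched in NumPy. The matched-point distances and the translation matrices below are illustrative stand-ins, not values from the paper:

```python
import numpy as np

def kde_prob(d, distances, bandwidth=0.5):
    """Gaussian kernel density estimate P(d) over the Euclidean
    distances between paired points of two adjacent frames."""
    z = (d - distances) / bandwidth
    return np.mean(np.exp(-0.5 * z ** 2)) / (bandwidth * np.sqrt(2 * np.pi))

def filter_matches(distances, eta=0.1):
    """Drop matched points whose estimated density falls below eta."""
    return [d for d in distances if kde_prob(d, distances) >= eta]

def compose_affine(per_frame):
    """T_{1->j} = T_j . T_{j-1} ... T_1 for 3x3 affine matrices,
    where per_frame[a-1] maps frame a-1 to frame a."""
    T = np.eye(3)
    for T_a in per_frame:
        T = T_a @ T
    return T

# Illustrative data: one clearly bad match among consistent ones.
dists = np.array([1.0, 1.1, 0.9, 1.05, 0.95, 1.2, 1.0, 9.0])
kept = filter_matches(dists)  # the 9.0 outlier falls below the threshold

T1 = np.array([[1., 0., 1.], [0., 1., 0.], [0., 0., 1.]])  # shift x by 1
T2 = np.array([[1., 0., 0.], [0., 1., 2.], [0., 0., 1.]])  # shift y by 2
T_12 = compose_affine([T1, T2])  # net translation (1, 2)
```

In a full implementation the per-frame matrices would come from a robust affine fit to the tracked sky feature points rather than being written by hand.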
IoT, Cloud and Data Science
C. Image Emulsifier
The image emulsifier module bends or twists the sky background based on the image motion and finally blends it with the foreground. This module also relights and recolors the overall video to give dramatic effects. Let V(t), Sp(t) and St(t) be the video frame, the predicted sky matte, and the ordered sky template image at a certain time t. The newly composed frame F(t) is a pixel-wise linear combination of all three:

F(t) = (1 − Sp(t)) V(t) + Sp(t) St(t)

V(t) and St(t) may have different color tones, so directly applying the combination equation may result in an imbalanced output. To transfer the colors and intensity from the background to the foreground we use relighting and recoloring. For each frame the following equations are applied:

V̂(t) ← V(t) + α (μ(t)_St(Sp=1) − μ(t)_V(Sp=0))
V(t) ← β (V̂(t) + μ(t)_V − μ̂(t)_V)

where μ(t)_V and μ̂(t)_V are the mean color values of the frame before and after relighting; μ(t)_St(Sp=1) and μ(t)_V(Sp=0) are the mean pixel color values at locations in the sky region and the non-sky region respectively; and α and β are predefined relighting and recoloring factors. The module then corrects the pixel intensity range of the foreground so that it is compatible with the background. The operations used while predicting pixels in this module are batch normalization, bilinear upsampling, and max pooling.
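A minimal NumPy sketch of the blending and relighting steps above. The matte threshold, the α and β values, and the toy frames are illustrative stand-ins, not the paper's exact implementation:

```python
import numpy as np

def emulsify(V, Sp, St, alpha=0.3, beta=1.0):
    """Blend frame V with sky template St through the predicted matte Sp,
    after a simple relighting step: F = (1 - Sp) * V_hat + Sp * St."""
    sky = Sp > 0.5  # assumed hard threshold on the soft matte
    mu_st_sky = St[sky].mean(axis=0) if sky.any() else 0.0    # mean template color over sky
    mu_v_fg = V[~sky].mean(axis=0) if (~sky).any() else 0.0   # mean frame color over non-sky
    V_hat = V + alpha * (mu_st_sky - mu_v_fg)                 # relight the foreground
    V_hat = np.clip(beta * V_hat, 0.0, 1.0)                   # recolor / keep values valid
    Sp3 = Sp[..., None]                                       # broadcast matte over channels
    return (1.0 - Sp3) * V_hat + Sp3 * St

# Toy 2x2 RGB frames in [0, 1]
V = np.full((2, 2, 3), 0.4)
St = np.full((2, 2, 3), 0.8)
frame_fg = emulsify(V, np.zeros((2, 2)), St, alpha=0.0)  # all foreground -> original frame
frame_sky = emulsify(V, np.ones((2, 2)), St, alpha=0.0)  # all sky -> template
```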
IV. Model Configuration
For the sky routing stage we constructed five coordinate convolutional layers with batch normalization, pooling and ReLU layers for encoding, and five coordinate convolutional layers with bilinear upsampling, ReLU layers and connections from the corresponding encoder layers for decoding. For pixel-wise prediction we used a coordinate convolution layer with a sigmoid activation function. For our encoder and decoder we used the UNet model configuration, which has two parts: a contracting path and a symmetric expanding path. We kept matching spatial dimensions between the encoder and decoder and added skip connections.
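A coordinate convolution layer differs from an ordinary convolution mainly in the coordinate channels it concatenates to its input. A NumPy sketch of that step (the normalization of coordinates to [-1, 1] is an assumption):

```python
import numpy as np

def add_coord_channels(feat):
    """Append normalized x- and y-coordinate channels to an (H, W, C)
    feature map, as a coordinate convolution layer does before the
    convolution itself; the y channel lets the network exploit the fact
    that sky sits at the top of the frame."""
    h, w = feat.shape[:2]
    ys = np.linspace(-1.0, 1.0, h)[:, None].repeat(w, axis=1)  # row coordinate
    xs = np.linspace(-1.0, 1.0, w)[None, :].repeat(h, axis=0)  # column coordinate
    return np.concatenate([feat, xs[..., None], ys[..., None]], axis=-1)

feat = np.zeros((4, 6, 3))
out = add_coord_channels(feat)  # shape (4, 6, 5): 3 original + 2 coord channels
```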
Advances in Science and Technology Vol. 124
Table 1. Detailed configuration of layers, where En and Dc denote encoder and decoder layers respectively.

Layer   Configuration
En_1    Coordinate Convolution - Batch Normalization - Max Pooling
En_2    3 * ResBlock
En_3    4 * ResBlock
En_4    6 * ResBlock
En_5    3 * ResBlock
Dc_1    Coordinate Convolution - Bilinear Upsampling + En_5
Dc_2    Coordinate Convolution - Bilinear Upsampling + En_4
Dc_3    Coordinate Convolution - Bilinear Upsampling + En_3
Dc_4    Coordinate Convolution - Bilinear Upsampling + En_2
Dc_5    Coordinate Convolution - Bilinear Upsampling
Dc_6    Coordinate Convolution
V. Implementation Details
A. Dataset
We have used the ADE20K dataset for semantic segmentation, which contains over 20K images with pixel-level labels of object parts and discrete objects, including the sky. The dataset includes many subsets, each of which uses a different method for creating its own sky region. It has two segments, indoor and outdoor; we used the ADE20K outdoor subset. The dataset provides a segmentation mask dividing each image pixel-by-pixel into 160 different classes, and we used the sky object from it. For model training and evaluation we use the ADE20K+DE+GF subset, which uses density-estimation inpainting and guided filters.
B. Training
We set the guided filter's regularization coefficient to 0.01 during the sky routing refinement step. In the flow reckoner step we used Gaussian density estimation with a bandwidth of 0.5. We trained our model using the Adam optimizer with a batch size of 8 and a learning rate of 0.001, running 200 epochs; every 50 epochs we reduce the learning rate by a factor of 10. Training images were sized 384 x 384, with random crop at scale (0.5, 1.0), random brightness factor (0.5, 1.5), random gamma γ (0.5, 1.5), random saturation factor (0.5, 1.5), horizontal flip, and aspect ratio (0.9, 1.1).
VI. Quantitative Study
We have tested our model under various parameters, comparing it with pre-existing models and using systems with different specifications.
A. Speed Performance
The speed of the editing technique was tested using a laptop with an Intel i7-9700K CPU and an Nvidia Titan XP graphics card, as well as a Google Colab GPU. The speed was tested for outputs of different resolutions, measuring the time taken to render the video footage after applying the effects. The results showed that our method was able to render at a processing speed of 24 fps for a video resolution of 640 x 360; for higher-resolution footage such as 854 x 480, it rendered at 15 fps. The speed test showed that the majority of the time is spent in the sky routing module, i.e. the sky outlining task.

Table 2. Speed performance at different resolutions (per-frame rendering times in seconds)

Output resolution (px)   Speed (FPS)   Sky routing   Flow reckoner   Image emulsifier
640 x 360                24            0.0234        0.007           0.01
854 x 480                15            0.0334        0.014           0.018
1280 x 720               8             0.0556        0.03            0.03
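As a sanity check, the per-stage times reported in Table 2 are consistent with the reported frame rates: the three modules run sequentially, so the FPS is roughly the reciprocal of their summed per-frame times. A small illustrative computation:

```python
# Per-frame stage times (seconds) from Table 2:
# (sky routing, flow reckoner, image emulsifier)
stage_times = {
    "640x360": (0.0234, 0.007, 0.01),
    "854x480": (0.0334, 0.014, 0.018),
    "1280x720": (0.0556, 0.03, 0.03),
}

def overall_fps(times):
    """Frame rate when the stages run back to back on each frame."""
    return 1.0 / sum(times)

fps_640 = overall_fps(stage_times["640x360"])  # close to the reported 24 fps
```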
B. Experimental Analysis
We compare our model with the CycleGAN model, which typically aims at solving image-to-image translation problems, on the BDD100K dataset, which contains videos with different weather conditions; our model outperforms CycleGAN. We tested our model on sunny2cloudy and cloudy2sunny scenarios using the Pixel Intensity (PI) and Natural Image Quality Evaluator (NIQE) measures and observed that our model gives better results than the pre-existing one.

Table 3. Model comparison with the existing CycleGAN model

Framework      Evaluation Model   NIQE    PI
cloudy2sunny   Our Model          7.444   5.512
               CycleGAN           7.852   6.974
sunny2cloudy   Our Model          6.534   6.243
               CycleGAN           7.101   7.634
We also tested our model using a conventional convolution layer in place of the coordinate convolution layer while detecting the sky regions from the video frames, and noticed a pixel-accuracy drop both before and after refinement under the following measures: Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity (SSIM), where higher values are better. This shows that positional encoding is important while detecting sky regions.

Table 4. Model accuracy with and without sky refinement

Convolution Layer Architecture    Mean Pixel Accuracy (PSNR / SSIM)
                                  Without Refinement   With Refinement
Conventional Convolution Layer    26.17 / 0.928        27.48 / 0.972
Coordinate Convolution Layer      27.20 / 0.997        27.92 / 0.924
VII. Conclusion
People usually consider video editing software particularly hard to learn and work with, so casual users rarely try video editing or sky replacement. We have addressed this problem by introducing a model which is specially trained to create dramatic effects in sky backgrounds. To do
so we have divided the pipeline into sky routing, flow reckoner and image emulsifier modules to obtain a high-resolution output. Further, using the ADE20K dataset we have trained our model such that no prerequisites or hard-core learning are required to work with it. The project requires just a click from the user to get better resolution and the required animations in the videos. As an offshoot, the product can also add different weather conditions to give a dramatic effect to the output. Hopefully, this module can be an advantage for new-age deep learning models for cinematic video editing.
VIII. Limitation and Future Scope
When there is no sky region in a video frame, or the sky region is minimal, our model stops estimating the motion of the sky background, as the feature points of the objects present in the sky background produce inevitable errors. Our model also performs poorly when the sky background in the video is too dark. Our next step is to focus on adding time intervals so as to add different kinds of scenic background across different intervals of a particular video. We would also like to deploy our model as an app that allows users to add a video directly rather than as a video path.
References
[1] Yanting Pei, Yaping Huang, Qi Zou, Xingyuan Zhang, Song Wang, "Effects of Image Degradation and Degradation Removal to CNN-based Image Classification", Volume: 43, Issue: 4, November 2019
[2] Xinguang Xiang, Hao Wei, and Jinshan Pan, "Deep Video Deblurring Using Sharpness Features From Exemplars", Volume: 29, September 2020
[3] Jun Chu, Xuji Tu, Lu Leng, Jun Miao, "Double-Channel Object Tracking With Position Deviation Suppression", Volume: 8, December 2019
[4] Wenguan Wang, Jianbing Shen, Xiankai Lu, Steven C. H. Hoi, Haibin Ling, "Paying Attention to Video Object Pattern Understanding", Volume: 43, Issue: 7, January 2020
[5] Tao Zhuo, Zhiyong Cheng, Peng Zhang, Yongkang Wong and Mohan Kankanhalli, "Unsupervised Online Video Object Segmentation with Motion Property Understanding", Volume: 29, July 2019
[6] Tingtian Li, Y.H. Chan and Daniel P.K. Lun, "Improved Multiple-Image Based Reflection Removal Algorithm Using Deep Neural Networks", Volume: 30, October 2020
[7] S Ghosh, S P Pathak, K Kabasi, T K Moharana, M Bagchi, N Dasgupta, S Chaudhari and S Kumar, "Single Tone Background Removal Applying Dual Threshold and Dynamic Clustering", April 2019
[8] He Zhang, Jianming Zhang, Federico Perazzi, Zhe Lin, Vishal M. Patel, "Deep Image Compositing", June 2021
[9] Omar Elharrouss, Noor Al-Maadeed, Somaya Al-Maadeed, "Video Summarization based on Motion Detection for Surveillance Systems", July 2019
[10] Xiao Wen, Haoran Zhang, "Video Target Detection Algorithm Based on Cuckoo Search and Gaussian Background Mixture Model", May 2020
[11] Minda Zhao, Qiang Ling, "PWStableNet: Learning Pixel-Wise Warping Maps for Video Stabilization", Volume: 29, January 2020
[12] Zhongqiang Wang, Lei Zhang, Hua Huang, "High Quality Real-time Video Stabilization Using Trajectory Smoothing and Mesh-based Warping", Volume: 6, April 2018
[13] Ming Fang, Yichen Wang, Hongna Li, Jing Xu, "Detail Maintained Low-Light Video Image Enhancement Algorithm", October 2018
[14] Amir Kolaman, Dan Malowany, Rami R. Hagege, Hugo Guterman, "Light Invariant Video Imaging for Improved Performance of Convolution Neural Networks", Volume: 29, Issue: 6, June 2019
[15] Hua Huang, Xiao-Xiang Wei, Lei Zhang, "Encoding Shaky Videos By Integrating Efficient Video Stabilization", Volume: 29, Issue: 5, May 2019
[16] https://en.wikipedia.org/wiki/Kernel_density_estimation
[17] https://groups.csail.mit.edu/vision/datasets/ADE20K
Advances in Science and Technology ISSN: 1662-0356, Vol. 124, pp 259-267 doi:10.4028/p-r8573m © 2023 Trans Tech Publications Ltd, Switzerland
Submitted: 2022-09-19 Accepted: 2022-09-23 Online: 2023-02-27
YouTube Music Recommendation System Based on Face Expression
Kanchan Y. Rathod1,a and Tanuja Pattanshetti2,b
1,2 Computer Engineering, College of Engineering Pune, Pune, India
a [email protected], b [email protected]
Keywords: face-recognition, convolutional neural network, streamlit-webrtc, recommendation system
Abstract. Nowadays face recognition systems are widely used in every field of computer vision applications, such as face unlock in smartphones, surveillance, smart attendance systems, and driverless car technology. Because of this, the demand for face recognition systems is increasing day by day in the research field. The aim of this project is to develop a system that will recommend music based on facial expressions. The face recognition system consists of object detection and identifying facial features from input images, and it can be made more accurate with the use of convolutional neural networks. Layers of a convolutional neural network are used for expression detection and are optimized with Adam to reduce overall loss and improve accuracy. A YouTube song playlist recommendation is an application of a face recognition system based on a neural network. We use streamlit-webrtc to design the web frame for the song recommendation system. For face detection, we used the Kaggle FER2013 dataset, whose images are classified into seven natural emotions of a person. The system captures the emotional state of a person in real time and generates a playlist of YouTube songs based on that emotion.
I. Introduction
Automated face detection systems can mimic human coding skills. Thus face detection is an important factor for the extraction and analysis of both image and video data, providing natural and impartial emotional feedback. The facial emotion recognition system is computerized: it uses an algorithm to identify faces, recognize facial expressions, and deliver the emotion accordingly. In face analysis, the first step is to locate the face within an image, the second is to detect landmarks on the detected face, and the third is to classify the facial expression and emotion.
Fig. 1. Overview of the emotion-based recommendation system

Emotion detection is our primary objective; based on the detected emotion, the user gets a song playlist that depends on their current mood. The recommender system uses an unsupervised learning algorithm. We apply different layers of convolutional neural networks to detect emotions.
This paper is divided into three parts:
• Face detection using MediaPipe Holistic
• A CNN architecture for detecting facial emotions
• Recommendation of a YouTube song playlist
II. Related Work
Zhang 2020 (1) proposed a model for movie and song suggestion using ResNet38 for emotion detection, with an accuracy of 64.02% when testing on five emotions of the Kaggle FER2013 dataset. A support vector machine and a four-layer CNN yielded accuracies on the training and testing data of 40.7% and 62.5% respectively. Maliha Khan 2019 (2) performed real-time face detection using OpenCV and proposed a PCA-based method for a face detection system on a collection of fingerprint images, using a Haar cascade classifier, eigenfaces, and Python libraries. Shlok Gilda 2017 (3) introduced an intelligent music recommendation system that extracts the user's mood. The emotion song player uses the real-time emotion of a person: the module takes emotion as input and uses a DL algorithm to identify the mood of the user, with an accuracy of 90.23% on four classes of emotion. H. Immanuel James 2019 (4) proposed a recommendation system for a smart music player depending on facial emotion. Face expression recognition follows the steps of face detection, feature extraction, and expression recognition using an SVM classifier on sad, happy, and angry emotions. Pushkar Mahajan 2020 (5) surveyed different mood detection techniques and recommendation of music based on user mood; the paper mainly focuses on moods like fear, happiness, joy, and aggressiveness using the OpenCV computer vision library. Shavak Chauhan 2021 (6) proposed a movie recommendation system implemented with a CNN, using decision tree and boosting algorithms to gain maximum accuracy; the recommendation system uses both collaborative and content-based filtering for IMDB rating calculation.
Hiten Goyal 2022 (7) performed real-time detection of faces with masks using a CNN architecture; the images in the dataset are divided into two categories, with mask and without mask, and the model is computed using DenseNet-121, MobileNet-v2, and VGG-19. Face-with-mask detection found application in various fields during the COVID-19 pandemic. Ninad Mehendale 2020 (8) designed a model for emotion detection using FERC, divided into two stages: stage I to clear the background from the image and stage II for feature extraction. The emotion vector was tested on five moods of a person using the Cohn-Kanade expression and NIST datasets. Monika Singh 2020 (9) classified images using a facial recognition system; for automatic face emotion detection, Gabor features are used on the training dataset images. Principal component analysis is used on seven moods of a user with an accuracy of 78.05%. Zeynab Rzayeva 2019 (10) built a model for face detection using a CNN trained on the Cohn-Kanade and RAVDESS datasets; the model detects small and micro-expressions of a face with the use of a neural network.
III. System Architecture for Proposed Model
Fig. 2. System architecture

IV. Process for Face Detection and Emotion Recognition
A. Face detection using MediaPipe Holistic
A colour image consists of three components, i.e. the RGB channels. In data pre-processing the colour image is converted to grayscale for better computation. After converting the image from RGB to grayscale, we locate the exact features of the face; the detected face is described by an x coordinate, a y coordinate, and a width. The Python library MediaPipe's holistic pipeline is used for the identification of the pose, face, and hand components. MediaPipe Holistic is a fast and accurate real-time face detection technique, and MediaPipe provides a solution for face detection through its face mesh. The output face mesh contains 468 annotated landmarks, and each landmark has its own key value. A mesh of emotion faces is defined with vertices taken from the key landmarks; an emotion face mesh is drawn to form a closed mesh structure that connects the vertices to one another, as shown in Fig. 4 for different emotions. We use the facial expression recognition (Kaggle FER) dataset. The dataset is imbalanced and contains 28,709 training images and 3,589 testing images. There are seven classes of images present in the facial expression recognizer dataset, labelled happy, sad, angry, surprise, disgust, fear, and neutral.
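The RGB-to-grayscale pre-processing step can be sketched as a weighted channel sum. The ITU-R BT.601 weights below are the common choice (an assumption; the paper does not state which weights it uses):

```python
import numpy as np

def rgb_to_gray(img):
    """Collapse the three RGB channels of an (H, W, 3) image into one
    grayscale channel using the standard BT.601 luma weights."""
    weights = np.array([0.299, 0.587, 0.114])  # R, G, B; weights sum to 1
    return img[..., :3] @ weights

# A white 2x2 RGB image stays at full intensity after conversion
gray = rgb_to_gray(np.full((2, 2, 3), 255.0))
```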
Fig. 3. Face mesh with 468 landmarks (left) and key value of a landmark
Table I. Selected key landmarks (vertices) and the corresponding MediaPipe landmarks
B. Equations for Mesh Angular Encoding
The angle θ between the edge connecting P2 and P3 and the edge connecting P2 and P1 is unknown. The angle β between the line P2-P3 and the X-axis can be computed as

β = atan2(y3 − y2, x3 − x2)

Similarly, the angle α between the line P2-P1 and the X-axis can be computed as

α = atan2(y1 − y2, x1 − x2)

Hence, the angle θ will be

θ = β − α
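The angular encoding can be sketched directly with atan2. This is a hedged reconstruction of the computation described above, with points given as (x, y) pairs:

```python
import math

def mesh_angle(p1, p2, p3):
    """Angle theta at vertex P2 between edges P2-P3 and P2-P1,
    computed via the two X-axis angles beta and alpha."""
    beta = math.atan2(p3[1] - p2[1], p3[0] - p2[0])   # line P2-P3 vs X-axis
    alpha = math.atan2(p1[1] - p2[1], p1[0] - p2[0])  # line P2-P1 vs X-axis
    return beta - alpha                               # theta = beta - alpha

# Perpendicular edges: P2-P1 points up, P2-P3 points right
theta = mesh_angle((0, 1), (0, 0), (1, 0))
```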
Fig. 4. Face mesh for different emotions

C. Expression detection
A convolutional neural network is used for expression detection. A convolutional neural network mainly has three kinds of layers: an input layer, hidden layers, and an output layer. The hidden layers of a CNN contain convolution layers, ReLU layers, and pooling layers; Fig. 5 is a schematic representation of the layers of a convolutional neural network. A convolutional neural network can contain any number of hidden layers. First, the image from the input layer is sent directly to the first hidden layer, where it is multiplied by the weights of the first hidden layer. The hidden part is a combination of convolution layers, pooling layers and fully connected layers. Convolution layers (Conv2D) are used for extracting features from the image. A convolution layer uses filters, and as each filter passes over the image it extracts a specific set of characteristics. We have N filters in the convolution layer with a filter size of 3x3; each filter extracts a specific feature of the image, e.g. one filter for the eyes, one for the nose, and one for the whole face.
Fig. 5. Schematic representation of the convolutional neural network

This process continues throughout the image processing. The rectified linear unit (ReLU) layer is an activation function layer. Once the feature maps are extracted, the next step is to pass them through a ReLU layer; ReLU performs an element-wise operation and removes negative values. The CNN model summary of the facial emotion recognizer, built with the Keras deep learning library, is shown in Fig. 6.
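The ReLU and softmax operations described here can be sketched in NumPy for the 7-class emotion output:

```python
import numpy as np

def relu(x):
    """Element-wise ReLU: negative activations are set to zero."""
    return np.maximum(x, 0.0)

def softmax(logits):
    """Probabilistic output over the 7 emotion classes; values sum to 1."""
    z = logits - np.max(logits)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Equal logits give a uniform distribution over the 7 classes
probs = softmax(np.zeros(7))
```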
Fig. 6. Summary of the deep learning model using the Keras library

Each convolution layer uses dropout to minimize overfitting on the training data. The input layer is resized to 48 x 48 pixel images. Pooling layers are used to reduce the dimensions of the image. The dense layers represent the neural network input, and the flatten layer converts the two-dimensional arrays into a single-dimensional vector. In this case we have two dense layers, one as input and the other as output. Since there are 7 classes to be classified, the output layer has 7 units. The model is constructed using four CNN layers and two fully connected layers; the output layer uses softmax for the classification of images, which gives results in the form of probabilistic outputs. The parameters used while training the model are given below.

Table II. Parameters used in training the model

Parameter       Details
Learning rate   0.001
Epoch           25
Batch size      128
Optimizer       Adam
Loss function   Categorical_crossentropy
D. Recommendation system for YouTube videos
For real-time detection we used streamlit-webrtc. streamlit-webrtc needs a key argument as a primary identifier and runs on the Streamlit local host. We create buttons as input so the user can shape the music recommendation; the buttons carry the singer's name and the singer's language. In Streamlit, pressing a button automatically refreshes the frame, and the user's emotion is captured again. After the input is given, the webcam captures the user's emotion and detects the present mood. If the user is not in the frame, a warning is shown: "Let me capture first". Another button, "Recommend me song", appears once the webcam successfully detects a face; pressing it takes the user directly to a YouTube URL with access to all the recommended songs based on the mood. The execution is shown in Fig. 7 and Fig. 8.
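A toy sketch of the final recommendation step: mapping the detected emotion plus the user's singer and language buttons to a YouTube search URL. The query format and the example singer are illustrative stand-ins, not the paper's exact implementation:

```python
def recommend_url(emotion, singer, language):
    """Build a YouTube search URL from the detected emotion and the
    singer/language the user picked via the Streamlit buttons."""
    query = f"{language} {singer} {emotion} songs"
    return "https://www.youtube.com/results?search_query=" + query.replace(" ", "+")

url = recommend_url("happy", "Arijit Singh", "Hindi")
```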
Fig. 7. Front end interface
Fig. 8. Corresponding output

V. Result
A. Training Process
In the accuracy and loss graphs, it can be seen that the total accuracy of training and testing rises with the epoch: as the epochs progress from 0 to 25, the accuracy on both the training and test images rises. Likewise, the total loss of training and testing drops as the epochs progress. According to the plots, the model is not overfitted.
Fig. 9. (a) Accuracy of training and testing data and (b) model loss during the training process of the CNN
B. Confusion matrix
In the confusion matrix below, the seven classes are listed with their accuracy. The confusion matrix is a summary of accuracy and prediction; from it we conclude that recognition of the happy, surprise, and neutral emotions is relatively good compared to the others.
Fig. 10. Confusion matrix

VI. Conclusion
In this project, we design an emotion-based recommendation system driven by the user's emotion. The paper is based on an image recognition method using a four-layer convolutional network. Our model is capable of recognizing seven classes of facial emotion. The CNN detects the facial emotion in real time, and the recommendation system plays a song using the Streamlit web framework. In this model, Adam optimization is used to reduce the loss function, and the model has an accuracy of 67.04%. OpenCV is used to extract facial expressions from a webcam and recommend songs. After the user grants permission, the camera accesses their facial image and the system produces a song playlist depending on their mood. This results in a good recommendation of songs based on the user's current mood.
References
[1] J. Zhang, "Movies and pop songs recommendation system by emotion", Journal of Physics: Conference Series, 2020.
[2] M. Khan, R. Astya, S. Khepra, and S. Chakraborty, "Face detection and recognition using OpenCV", International Conference on Computing, Communication, and Intelligent Systems (ICCCIS), pp. 116-119, 2019.
[3] S. Gilda, H. Zafar, C. Soni, and K. Waghurdekar, "Smart music player integrating facial emotion recognition and music mood recommendation", International Conference on Wireless Communications, Signal Processing and Networking (WiSPNET), 2017.
[4] H. I. James, J. J. A. Arnold, J. M. M. Ruban, M. Tamilarasan, and R. Saranya, "Emotion based music recommendation system", Emotion, vol. 6, no. 03, 2019.
[5] P. Mahajan, P. Khurad, P. Chaudhari, et al., "Face player: Facial emotion based music player", International Journal of Research and Analytical Reviews (IJRAR), vol. 7, 2020.
[6] S. Chauhan, R. Mangrola, and D. Viji, "Analysis of intelligent movie recommender system from facial expression", 5th International Conference on Computing Methodologies and Communication (ICCMC), pp. 1454-1461, IEEE, 2021.
[7] G. Singh and A. K. Goel, "Face detection and recognition system using digital image processing", 2nd International Conference on Innovative Mechanisms for Industry Applications (ICIMIA), pp. 348-352, 2020.
[8] N. Mehendale, "Facial emotion recognition using convolutional neural networks (FERC)", SN Applied Sciences, vol. 2, no. 3, pp. 1-8, 2020.
[9] H. I. James, J. J. A. Arnold, J. M. M. Ruban, M. Tamilarasan, and R. Saranya, "Emotion based music recommendation system", Emotion, vol. 6, 2019.
[10] Z. Rzayeva and E. Alasgarov, "Facial emotion recognition using convolutional neural networks", IEEE 13th International Conference on Application of Information and Communication Technologies (AICT), pp. 1-5, IEEE, 2019.
Advances in Science and Technology ISSN: 1662-0356, Vol. 124, pp 268-276 doi:10.4028/p-43g8x2 © 2023 Trans Tech Publications Ltd, Switzerland
Submitted: 2022-09-23 Accepted: 2022-09-23 Online: 2023-02-27
Driver Yawn Prediction Using Convolutional Neural Network
Melvin Jharge S.R.1,a, Rokesh B.2,b, Dheepajyothieswar S.3,c, Dr Akila K.4,d
1 Student, Department of Computer Science and Engineering, SRM University, Chennai, India
2 Student, Department of Computer Science and Engineering, SRM University, Chennai, India
3 Assistant Professor, Department of Computer Science and Engineering, SRM University, Chennai, India
4 Assistant Professor, SRM Institute of Science and Technology, Chennai, India
[email protected], [email protected], [email protected]
Keywords: driver fatigue, yawning detection, keyframe selection, subtle facial action, 3D deep learning network
Abstract. Driver weariness is the leading cause of road accidents, according to numerous studies. Computer vision algorithms have shown promise in detecting indicators of exhaustion from facial motions like yawning. Precise and consistent yawning recognition is difficult in the real-world driving environment due to the varied facial gestures and expressions of drivers: yawning causes a mouth deformation that also occurs in a variety of other facial activities and expressions. This paper provides a novel approach based on subtle facial motion identification to address the aforementioned concerns. We offer a network for nuanced face activity recognition in which bidirectional and 3D convolutional networks are used. A keyframe selection technique is used to discover the most representative frames from delicate face gestures; it employs photo histograms to swiftly eliminate redundant frames and the median absolute deviation to locate outliers. A variety of tests are also run on the method.
1. Introduction
INTERACTIVE DRIVING, which involves sending out early warning signals, monitoring, and assisting with vehicle control [1], has become a popular research topic in recent years as a means of improving road safety. Driver fatigue is a serious threat to road safety: drowsy driving incidents were reported by 10% of drivers who admitted to them in the previous month or year, and according to experts, driver fatigue was found to be the cause of 22% of traffic accidents. Drowsy driving is six times more likely than normal driving to cause an unnoticed collision or near-crash. As a result, more study into methods for detecting driver fatigue is required; the importance of improving road safety cannot be overstated. Many methods for detecting driver fatigue have been proposed over the years to promote traffic safety by making it easier for drivers to drive safely. Some of the characteristics of weary drivers are listed below: blinking, nodding, closing the eyes, and yawning are all examples of nonverbal communication.
Among these behaviours, yawning is one of the most common manifestations of fatigue [7], and a lot of research has been done on yawning detection. In comparison to physiological measurements, facial actions like yawning may be considered lightweight to capture [8]. Detecting yawning correctly and reliably is difficult due to the wide range of facial motions and expressions used by drivers in real-world driving situations. Researchers have developed a range of approaches to detect yawning. Existing yawning detection systems rely on a single static image and are focused on quantifying the mouth opening: they fix their gaze on the mouth, identifying yawning by comparing the characteristics of open and closed jaws. Deep-network-based models have also been proposed. When using a yawning indicator, the period between yawning and significant fatigue driving is short, and accidents, including fatalities, can occur at any time. As a result, the yawning detection time is critical, and time-saving methods are applied in all steps of the strategy that has been proposed.
The study's main contributions are listed below. The proposed approach has the potential to dramatically eliminate unnecessary frames and improve subtle facial activity recognition.
1.1 Motivation
Despite advances in road and vehicle design that improve driver safety, the total number of serious crashes continues to rise. Driver distraction is becoming more prevalent as vehicle technologies such as navigation systems, cell phones, and the internet advance. Drowsiness and fatigue impair drivers' abilities long before they realize they are tired.
2. Literature Study
A literature review is a piece of writing that summarises the most important components of existing knowledge or methodological approaches to a certain problem. It is a secondary source that discusses published content in a given subject area, as well as information from a specific historical period. It may just be a list of references that appears before a research proposal. It usually follows a pattern and includes both synthesis and summarization.
2.1 Existing System
Various studies have revealed that driver tiredness is the leading cause of traffic accidents. The use of computer vision algorithms to discern signs of weariness from facial actions such as yawning has shown considerable promise. Even so, precise and robust yawning detection is difficult due to the varied facial actions and expressions of drivers in real-world driving conditions; yawning produces a mouth deformation that is similar to several other facial motions and expressions. From inconspicuous facial activities, a keyframe selection algorithm is used to choose the most representative frames. This algorithm swiftly removes redundant frames using low-cost photo histograms and recognises anomalies based on the median absolute deviation.
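The median-absolute-deviation outlier test used by the keyframe selection can be sketched as follows. The histogram-difference scores and the factor k are illustrative assumptions, not values from the paper:

```python
import numpy as np

def mad_outliers(hist_dists, k=3.0):
    """Flag frames whose histogram-difference score deviates from the
    median by more than k median-absolute-deviations; flagged frames are
    treated as candidate keyframes rather than redundant frames."""
    x = np.asarray(hist_dists, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    if mad == 0:
        return x != med  # degenerate case: everything off the median is an outlier
    return np.abs(x - med) > k * mad

# Four near-identical frames and one sharp scene change
flags = mad_outliers([1.0, 1.1, 0.9, 1.0, 5.0])
```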
The majority of these methods are inefficient and rely on temporal factors. Furthermore, these tactics are ineffective in a few situations, such as facial actions and expressions that cause a mouth deformation comparable to yawning. The keyframe selection algorithm has a cheap computation cost and effectively removes extraneous frames from frame sequences with distinct frames. Second, using a 3D convolutional network, a subtle activity recognition network is developed to extract spatiotemporal features and distinguish yawning. However, the proposed strategy's feasibility is harmed by poor image resolution and high camera shaking. These barriers will be addressed in future work by applying more advanced image pre-processing procedures, such as resolution recovery for low-resolution photos using image super-resolution, and deblurring strategies to deal with blurred pictures.
2.2 Challenges and limitations in the existing system
Various studies have revealed that driver fatigue is the leading cause of traffic accidents. Because of the complex facial actions and expressions of drivers, accurate and robust detection of yawning is difficult: several facial actions and expressions resemble yawning, making it difficult to distinguish and locate the correct one.
2.3 Methodology
Pre-processing and training the model (CNN): Image reshaping, resizing, and array conversion are all part of the dataset's pre-processing. The same processing is applied to the test image. Each of the yawn pictures in this dataset can also be used as a software test image. The training dataset is used to teach the machine (CNN) how to recognize the test image and the yawn it contains.
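A minimal sketch of the pre-processing described above (nearest-neighbour resize and scaling to [0, 1]). The 48x48 target size is an assumed value for illustration, not stated by the paper:

```python
import numpy as np

def preprocess(img, size=(48, 48)):
    """Resize a grayscale frame with nearest-neighbour sampling and
    scale pixel values to [0, 1] before feeding it to the CNN."""
    h, w = img.shape[:2]
    rows = np.arange(size[0]) * h // size[0]  # source row for each target row
    cols = np.arange(size[1]) * w // size[1]  # source column for each target column
    resized = img[rows][:, cols]
    return resized.astype(np.float32) / 255.0

out = preprocess(np.full((96, 96), 255, dtype=np.uint8))
```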
270
IoT, Cloud and Data Science
Fig 2.3.1 CNN Model CNN includes layers such as Dense, Dropout, Flatten, Convolution2D, and MaxPooling2D. After the model has been properly trained, the software can determine whether an image in the dataset is that of a yawn. After successful training and pre-processing, the test image is compared against the trained model. 3. Proposed System The recent success of AlexNet-style convolutional neural networks (CNNs) in tasks such as face classification extends to the problem of yawning: the goal is to organise photos of human faces into specific yawn classes. Approaches that rely on standard machine learning and hand-crafted features do not perform well when applied to previously unseen data. We implemented three distinct classifiers from scratch: a baseline classifier with one convolutional layer; a CNN with appropriately sized convolutional layers; and a deeper CNN with a predetermined number of convolutional layers, filter sizes, and filter counts. For each of these models we tuned the learning rate, regularisation, and dropout parameters. During training, the framework learned a set of weights for the network from grayscale images of faces paired with their expression labels. A photograph containing a face was used as input during training. The output of the second hidden layer is connected to an output layer with seven distinct classes, and the result is computed from the probabilities for each of the seven classes; the predicted class is the one with the highest probability. 4. Deep Learning-Based Methods Deep learning methodologies have progressed, helping to create frameworks that solve such problems, and several researchers have applied deep learning algorithms to the challenge of yawning detection. One line of work proposed a progressive yawning detection framework.
The framework combines network-learned and traditional features to improve detection accuracy and reduce the time spent extracting features. To classify facial expression features, a face image is fed to the network directly. Although this method improves classification accuracy, it has the same flaw as other single-static-image methods: critical temporal information is lost when only single static photographs are used for classification.
Fig 2.3.2 Deep learning Methods
Advances in Science and Technology Vol. 124
271
4.1 Keyframe Selection Used in Video Processing A keyframe is a representative frame in a video. Face-action keyframes eliminate unnecessary frames and help the recognition network learn the features of delicate facial actions rapidly. [24] provided a method for obtaining each frame's pyramid motion feature and using the AdaBoost algorithm to select keyframes based on the derived pyramid feature. Other work employs a clustering approach to choose keyframes, benchmarking over a variety of low-level features. Clustering for keyframe selection, however, has the disadvantage of being computationally expensive. For safe driving, detection systems should operate in real time using a simple and effective technique. To attain these traits, a new keyframe selection technique is created in this work: outlier detection is combined with threshold-based histogram distance filtering. Global features such as histograms are more suitable for this categorization than local features, and histograms can successfully reduce false positives. Algorithm 1 summarises the proposed selection algorithm. The distance between color histograms is computed with the Euclidean distance (ED), where n is the dimension of the color histogram; after the calculation, a set D of distances between consecutive frames Fj and Fj+1 is produced. Commonly used threshold choices include the average distance, half of the maximum distance, half of the minimum distance, and half of the sum of the maximum and minimum distances; here the average distance and half of the maximum and average distances are considered. A second selection pass then removes frames that are similar to their immediate neighbours. The suggested technique employs the median absolute deviation (MAD) to find outliers; the MAD of the unary sequence X is determined by the standard formula.
Ki,i+1 denotes two keyframes in succession. Experiments were run using the selected keyframes and alternative thresholds, where d stands for the distance set and YT for the threshold; the reported percentage indicates how accurate the detection is, including the difficult case of yawning while speaking. A high level of accuracy indicates that the keyframe selection method employed is efficient.
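The histogram-distance-plus-MAD selection described above can be sketched in a few lines of Python. The bin count, the outlier multiplier k, and the function names are illustrative assumptions, not a transcription of the paper's Algorithm 1:

```python
import numpy as np

def color_hist(frame, bins=16):
    # cheap global feature: normalized intensity histogram of one frame
    h, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return h / h.sum()

def select_keyframes(frames, k=3.0):
    hists = [color_hist(f) for f in frames]
    # Euclidean distance (ED) between consecutive histograms -> distance set D
    d = np.array([np.linalg.norm(hists[i + 1] - hists[i])
                  for i in range(len(hists) - 1)])
    med = np.median(d)
    mad = np.median(np.abs(d - med))   # median absolute deviation of D
    thresh = med + k * mad             # flag unusually large appearance changes
    # keep frame i+1 whenever the change from frame i is an outlier
    return [i + 1 for i, di in enumerate(d) if di > thresh]
```

Redundant, near-duplicate frames produce tiny histogram distances and are dropped; only frames whose distance is an outlier under the MAD criterion survive as keyframes.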
4.2 3D Convolution in 3D-LTS The proposed method combines yawning detection with action recognition. The field of action recognition has progressed significantly: both accuracy and speed have improved. Researchers have proposed a number of networks to recognize actions, including three-dimensional convolutional networks [35, 36], which have received a lot of attention in action recognition [37] as well as action similarity analysis [38, 39]. For spatiotemporal feature extraction, 3D convolutional networks require less computation time than two-stream network-based approaches. Earlier approaches convolved sequential 2D feature maps to categorize video actions; however, because 2D convolutional networks can only work with 2D inputs, temporal information is lost in the process. Each layer extracts information from local neighborhoods on the feature maps of the previous layer, a bias is added, and the result is passed through the activation function. At position (a, b), the unit value is given by the 2D convolution formula.
They proposed a three-dimensional convolutional kernel that is homogeneous in all three dimensions; a size of 3 × 3 × 3 produces the best results. The following formula determines the value at position (a, b, c) in each feature map of a given layer.
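The two convolution formulas referenced above appeared as images in the original and did not survive extraction. The standard forms from the 3D-CNN literature are reproduced here as an assumed reconstruction: $(a,b)$ and $(a,b,c)$ are the (spatio-temporal) position, $i$ the layer, $j$ the feature map, $m$ indexes feature maps of the previous layer, and $w$, $b_{ij}$, $f$ are the kernel weights, bias, and activation function:

```latex
% 2D convolution: unit value at position (a, b) in the j-th feature map of layer i
v_{ij}^{ab} = f\Big( b_{ij} + \sum_{m} \sum_{p=0}^{P_i - 1} \sum_{q=0}^{Q_i - 1}
              w_{ijm}^{pq}\, v_{(i-1)m}^{(a+p)(b+q)} \Big)

% 3D convolution: the kernel additionally spans R_i consecutive frames,
% giving the value at position (a, b, c)
v_{ij}^{abc} = f\Big( b_{ij} + \sum_{m} \sum_{p=0}^{P_i - 1} \sum_{q=0}^{Q_i - 1} \sum_{r=0}^{R_i - 1}
               w_{ijm}^{pqr}\, v_{(i-1)m}^{(a+p)(b+q)(c+r)} \Big)
```

The extra sum over $r$ is what lets the 3D kernel capture motion across frames, which is exactly the temporal information a 2D network loses.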
4.3 The Architecture of the 3D-LTS Model The 3D-LTS model represents and analyzes long-term temporal feature sequences in video. It has two sections: the first is a short-term network for extracting spatiotemporal features, improved with the proposed changes; the second, the LTS component, detects small movements. The LTS characteristics are discussed in the following section.
Fig 4.3.1 3D-LTS Model 5. Keyframe Algorithm
6. Convolutional Neural Network (CNN) A convolutional neural network (CNN or ConvNet) is a deep learning network architecture that learns directly from data. Simply put, two matrices are multiplied elementwise and summed to generate an output that is used to extract image features. A CNN architecture is split into two sections:
Fig. 6.1 Convolution operation
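The convolution operation of Fig. 6.1 can be sketched numerically: slide the kernel over the image, multiply elementwise, and sum. As in most deep-learning libraries, no kernel flip is applied (strictly, cross-correlation); the edge-detecting kernel below is an illustrative choice:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution: slide the kernel, multiply elementwise, sum."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# a vertical-edge kernel applied to an image with a 0 -> 1 intensity step
img = np.array([[0, 0, 1, 1]] * 4, dtype=float)
edge = np.array([[-1, 0, 1]] * 3, dtype=float)
fmap = conv2d(img, edge)   # the feature map responds along the edge
```

The resulting feature map is what the next layer consumes, exactly as described for the stacked CNN layers in the text.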
Fig. 6.2 Convolution with RGB images A deep learning model maps input data through multiple layers to outputs. 7. Module Description Import the given image from the dataset: we use the Keras ImageDataGenerator pre-processing function to import our data set, specifying the target size, rescaling, zoom range, and horizontal flip. From there, we train our own network built by stacking CNN layers. 8. Experiment Analysis 8.1 Eye Open
Fig 8.1.1 Eye open 8.2 Eye Close
Fig 8.2.1 Eye close
8.3 YAWN
Fig 8.3.1 Yawn faces 8.4 NoYawn
Fig 8.4.1 No yawn faces 9. To Train the Module by Given Image Dataset:
Fig 9.1 Training Image Dataset 10. Convo Layer Convolution is an orderly procedure in which two sources of information are intertwined; it is an operation that transforms one function into another. CNNs impose a pattern of local connectivity between neurons in neighboring layers, so that learned filters respond to local patterns (e.g., enhancing edges or embossing).
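The data-import step of Section 7 (Keras's `ImageDataGenerator(rescale=1./255, zoom_range=..., horizontal_flip=True)`) feeds the training run of Section 9. A minimal NumPy stand-in for the rescale-and-flip part is sketched below; the function name, seed, and flip probability are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment_batch(images):
    """Stand-in for ImageDataGenerator: rescale to [0, 1] and randomly
    flip each image horizontally, as named in the module description."""
    out = images.astype("float32") / 255.0     # rescale=1./255
    for i in range(len(out)):
        if rng.random() < 0.5:                 # horizontal_flip=True
            out[i] = out[i, :, ::-1]           # reverse the width axis
    return out
```

Each training epoch would draw batches through such a generator, so the CNN never sees raw 0–255 pixel values and sees mirrored variants of the yawn faces for free.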
11. Classification Identification: The input image is supplied via the Keras preprocessing package. Using Pillow and the img_to_array function, the input image is turned into an array of values. The yawn photos in our dataset are already categorized, and the model distinguishes the different classes. We then use the predict function to forecast the yawn.
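The final "predict" step maps the model's softmax output back to a class name. The class list below is an assumption drawn from the experiment sections above (eye open/close, yawn/no yawn); the helper name is hypothetical:

```python
import numpy as np

# assumed label order, matching the dataset categories shown in Section 8
CLASSES = ["yawn", "no_yawn", "eye_open", "eye_close"]

def predict_label(probabilities):
    """Map a softmax probability vector to its class name (argmax decision)."""
    return CLASSES[int(np.argmax(probabilities))]
```

For example, a softmax output with most mass on the first component is reported as a yawn; this is the value shown to the user in the Django front end.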
Fig 11.1 Feature Extraction The approach for recognizing yawns is based on a two-channel architecture that can distinguish yawn images. The yawn regions are cropped and extracted before being fed into the CNN's inception layer. The training phase consists of feature extraction and classification using a convolutional neural network. 12. Deploying the Model in the Django Framework and Predicting Output: The model is served in our Django framework to improve the user experience and predict the class of an input image. 13. Output Screenshot:
Fig 13.1 Input
Fig 13.2 Result
14. Conclusion This work focused on how a CNN model, trained on images from a provided dataset, can predict the pattern of a driver's yawn. The following observations on driver yawn prediction result from this work. The CNN classification system's ability to automatically classify images is a significant advantage. The most prevalent causes of drowsiness are lack of sleep and tiredness. In this study we examined how to obtain a yawn image data set, preprocessing techniques, feature extraction tactics, and classification schemes, along with an overview of the methodology for recognizing driver yawn photos. References
[1] M. A. Su-Gang et al., "Yawning detection algorithm based on convolutional neural network," Comput. Sci., 2018.
[2] H. A. Kholerdi, N. TaheriNejad, R. Ghaderi, and Y. Baleghi, "Driver's drowsiness detection using an enhanced image processing technique inspired by the human visual system," Connection Sci., vol. 28, no. 1, pp. 27–46, Jan. 2016.
[3] J. Gwak, A. Hirao, and M. Shino, "An investigation of early detection of driver drowsiness using ensemble machine learning based on hybrid sensing," 2020.
[4] F. Faraji, F. Lotfi, J. Khorramdel, A. Najafi, and A. Ghaffari, "Drowsiness detection based on driver temporal behavior using a new developed dataset," 2021.
[5] Q. Abbas, "FatigueAlert: A real-time fatigue detection system using hybrid features and pre-trained CNN model," 2020.
[6] C. M. Martinez, M. Heucke, F. Wang, B. Gao, and D. Cao, "Driving style recognition for intelligent vehicle control and advanced driver assistance: A survey," IEEE Trans. Intell. Transp. Syst., vol. 19, no. 3, pp. 666–676, Mar. 2018.
[7] P. S. Rau, "Drowsy driver detection and warning system for commercial vehicle drivers: Field operational test design, data analyses, and progress," in Proc. Nat. Highway Traffic Saf. Admin., 2005, pp. 05-0192.
[8] A. D. McDonald, J. D. Lee, C. Schwarz, and T. L. Brown, "Steering in a random forest: Ensemble learning for detecting drowsiness-related lane departures," Human Factors, vol. 56, no. 5, pp. 986–998, Aug. 2014.
[9] W. Sun, X. Zhang, S. Peeta, X. He, and Y. Li, "A real-time fatigue driving recognition method incorporating contextual features and two fusion levels," IEEE Trans. Intell. Transp. Syst., vol. 18, no. 12, pp. 3408–3420, Dec. 2017.
[10] Y. Zhang and C. J. Hua, "Driver fatigue recognition based on facial expression analysis using local binary patterns," Optik, vol. 126, no. 23, pp. 4501–4505, 2015.
[11] C. Anitha, M. K. Venkatesha, and B. S. Adiga, "A two fold expert system for yawning detection," in Proc. IEEE 2nd Int. Conf. Intell. Comput., Commun. Convergence, vol. 92, pp. 63–71, 2016.
Advances in Science and Technology ISSN: 1662-0356, Vol. 124, pp 277-284 doi:10.4028/p-z04kn1 © 2023 Trans Tech Publications Ltd, Switzerland
Submitted: 2022-09-01 Accepted: 2022-09-16 Online: 2023-02-27
Diagnosis of Alzheimer’s Disease Using CNN on MRI Data
Pranay Agarwal1,a*, Vikhyat Jagawat2,b, Jathiswar B.3,c, Poonkodi M.4,d
1,2,3,4Department of Computer Science and Engineering, SRM Institute of Science and Technology, Vadapalani Campus, No.1 Jawaharlal Nehru Road, Vadapalani, Chennai, Tamil Nadu, India
[email protected], [email protected], [email protected], [email protected]
Keywords: Alzheimer’s, CNN, MRI Images, Dementia
Abstract. Alzheimer’s disease is a degenerative brain syndrome that hinders a person's functional ability. It is progressively marked by shrinking of the brain and continuous loss of brain cells; consequently it leads to death, and it therefore becomes important to build a system that can catch this disease early on. MRI (Magnetic Resonance Imaging) has evolved over time into a valuable diagnostic tool for the brain and other medical imaging. In the past, a lot of data has been collected by different researchers, and a variety of machine learning algorithms have been used to diagnose this disorder and label it into different classes. Through this project we present a CNN-based model trained on MRI images to diagnose this disease effectively. The use of CNN is natural because, apart from being an excellent classifier, it is a very good feature extractor, which reduces the overall cost of feature engineering. The proposed model takes an MRI image as input and classifies it into very mild, mild, moderate or no-disease categories. The trained model has a 95 percent accuracy rate. Introduction The term command center aptly describes the brain, which is part of the central nervous system. The overall structure and working of the brain depend on the body size and sex of the organism; studies show that males have brains around 10% larger than females. The partially symmetrical left and right halves, connected by nerve fibers called the corpus callosum, form the hemispheres of the brain. Complications in the brain are found by assessing the sulci (valleys) and gyri (wiggly hills), and, like any part of the body that is prone to disease during an organism's lifetime, the human brain too suffers from various problems, Alzheimer's being a prime one among them. Alzheimer's causes brain cells to shrink, and the person is subjected to memory loss and other mental issues.
This is usually marked as the second stage of EMCI [1,2]. The situation of families and medical clinics worsens when the impact of this disease gets out of hand [3]. Older people are at high risk of getting this disease, and no cure has been discovered yet. Since this is a very important and common disease in the field of neurology, scientists and researchers are working on prediction and diagnosis using different scans such as Magnetic Resonance Imaging (MRI), structural Magnetic Resonance Imaging (sMRI) and so on. With emerging technologies like Artificial Intelligence and Machine Learning, Computer-Aided Diagnosis systems are being developed that help many people in medicine and science get better insights and fast results. MRI is by far the least expensive technique that captures the anatomical regions of the brain and is used worldwide. Alzheimer's disease occurs in stages, starting with people becoming confused and forgetful about what is happening in their lives and neglecting self-care, and progressing to the stage at which the disease is officially diagnosed and its significant impact on the person is found. The high-level features of these stages can be recognized earlier with rapidly developing deep learning techniques such as Convolutional Neural Networks. At every hidden layer of these models, many features are identified that are not visible to the naked eye, without any kind of supervision from
experts, which is quite impressive to many medical and technology enthusiasts. Therefore, doctors could spend more time in direct patient care and reduce burnout. To obtain such intuitive features, previous works have applied many kinds of machine learning algorithms, such as Support Vector Machines, K-nearest neighbors and Naïve Bayes, which have provided only average accuracy in classification. In this paper we experiment with a Convolutional Neural Network (CNN) model, aiming to achieve better accuracy than other models. A strength of our technique is that CNN does not need much preprocessing and can handle slight inconsistencies in the given data, so much of the effort of hand-engineering features is avoided. CNN was developed in response to the heavy drawbacks of many previous models and has gained many improvements over time. Not only does it perform well, it has also shown better-optimized use of memory and computational requirements, which is very important for feasibility. Weight sharing, in which the same weights are reused across positions within a layer, is a quality of Convolutional Neural Networks that makes them special among other networks and thus motivates our experimentation. Over the past few decades there has been tremendous improvement in the field of medical imaging, and many advanced machines have been invented to give very accurate image data to patients worldwide. We thus found a need not only to utilize this opportunity in machine learning and artificial intelligence but also to look forward to giving the most desirable predictions in quick real time. Data analytics has caused a huge boom in all kinds of businesses and has a promising future.
We were thus motivated to be part of this advancement, hoping to contribute our research and findings to the growth of this community. Related Works In [4], the authors used EEG signals with robust PCA to investigate the presence of dementia in 13 subjects. Four machine learning algorithms, KNN, SVM, decision tree and Naïve Bayes, were implemented to test their ability to detect AD from EEG signals; Gaussian SVM and Naïve Bayes showed very good accuracy of more than 90%, whereas the other algorithms showed poor accuracy of less than 60%. Afreen Khan and Swaleha Zubair evaluated the performance of an ensemble-based random forest classifier to detect Alzheimer's disease on MRI data of 150 subjects [5]. They used a dropping method and an imputation method to handle missing data and concluded that the imputation method gave better results. The main drawback of this system is that it had few MRI features and a small number of subjects for the model to be trained on. In [6], different deep learning methods for Alzheimer's disease diagnosis were applied to scans such as MRI, PET, MRI+PET and MRI+PET+CSF, and their performance measures were compared. Feature extraction, feature selection, dimensionality reduction and a feature-based classification method were all part of the classification process. Limitations included transparency: it is difficult to determine how the selected features lead to a conclusion, or the relative importance of specific features or subclasses of features. Sivakani R. and Gufran Ahmad Ansari used feature extraction and feature selection and then applied machine learning algorithms to classify the OASIS longitudinal dataset [7]. In that paper, clustering via the EM algorithm was used for feature extraction; this algorithm converges slowly and only to local optima.
This is accomplished by applying the best-first search method and evaluating the subset using the CfsSubsetEval evaluator. The Gaussian process is then used to classify the data, and a linear regression model is used to develop the model; linear regression classifies data based on dependent and independent variables. Afreen Khan and Swaleha Zubair proposed using multiple machine learning algorithms and comparing their results to classify data into demented and non-demented categories [8]. The data was taken from OASIS (longitudinal MRI data) and consisted of 150 subjects, 78 demented and 72 non-demented. Factors such as age, MMSE, handedness and gender were used for classification. The highest accuracies were shown by random forest and extra trees, at 85% and 84%
respectively. The major limitations were that the data consisted of a low number of samples and that the time for feature extraction was very high. The authors of [9] used SVM to seek patterns in EEG in order to distinguish Alzheimer's samples from others. They attempted to extract characteristics from EEG data that could be used as SVM [10] input: frontal and rear coherences, long-distance coherences, and homologous coherences, to name a few. Farhana Sultana, Abu Sufian and Paramartha Dutta analyzed the inner structure of convolutional neural networks and how they classify images [11]. They compared different CNN architectures such as LeNet, AlexNet, ZFNet, VGGNet, GoogLeNet, ResNet and DenseNet; the performance indicators of the several CNNs were examined and their error rates calculated. Although AlexNet, ZFNet and VGGNet follow the same underlying concept as LeNet-5, their structures are larger and more complex. SENet's performance on the ImageNet dataset gives optimism that it will be effective for other tasks requiring strong discriminative features. Methodology This section covers our proposed work to develop and implement a convolutional neural network and train the model on our dataset, followed by the implementation of the Flask web framework and the building of its user interface. Figure 1 illustrates the overall workflow of the project, and the subsections explain our methodology.
Fig. 1 Architecture diagram of the project workflow A. Dataset Collection The dataset is obtained from Kaggle (an open platform for machine learning). We collected MRI (Magnetic Resonance Imaging) data consisting of over 6,400 MRI images, divided into 4 classes: Mild Demented, Moderate Demented, Very Mild Demented and Non Demented. This structural magnetic resonance imaging data is downloaded in JPEG format; each image is 176 x 208 pixels. The images are further divided into train and test sets using a certain ratio. The figure below displays a small chunk of the data belonging to different classes. B. Preprocessing The MRI images are preprocessed using Python libraries (numpy, opencv and random). We load the data into memory at 120 x 120 pixels in grayscale color mode. The data is then loaded as numpy arrays of shape (120, 120, 1), where 120 indicates the number of pixels in a row or column and 1 the number of color channels. We then normalize the data to bring it to a standard scale without distorting differences in the range of values, via the simple calculation in Eq. 1: because pixel values range from 0 to 255, dividing by 255 maps them into the range 0 to 1.
data = img / 255.0 .  (1)
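The preprocessing described above (grayscale load at 120 × 120, Eq. 1 normalization, channel axis for the CNN) can be sketched as follows; the function name is an illustrative assumption:

```python
import numpy as np

def preprocess_mri(img):
    """Normalize a 120x120 grayscale MRI slice to [0, 1] (Eq. 1) and add
    the channel axis expected by the CNN input shape (120, 120, 1)."""
    data = img.astype("float32") / 255.0     # Eq. 1: data = img / 255.0
    return data.reshape(120, 120, 1)
```

Stacking such arrays over the whole dataset yields the (N, 120, 120, 1) tensor fed to the network.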
Then we use the seed method from the random library so that we get the same results in every run. Figure 2 visualizes samples of the data using the matplotlib module in Python.
Fig. 2 Visualization of 10 samples from the class NonDemented C. Feature Extraction The preprocessed MRI data is passed through a sequential convolutional neural network model where feature extraction takes place. Feature extraction involves alternating convolutional and max pooling layers, with different features extracted at each layer of the model. Its primary purpose is to reduce the dimensions of the data and bring it into simpler forms. The model is then flattened, and results are obtained through a softmax layer preceded by dropout for better prediction. D. Convolutional Neural Network A Convolutional Neural Network (ConvNet) [12] is a deep learning architecture that has shown strong results in problems involving computer vision. A CNN recognizes patterns in the data. It takes images as inputs, which are converted into numpy arrays or tensors with specific dimensions and color channels. A CNN architecture consists of many different layers: convolutional layers, activation layers, pooling layers, fully connected layers, etc. The architecture of a CNN model depends on the image size and complexity, which determine how many layers and filters are required to train the model with the highest accuracy in minimal time. All the layers perform different functions on the input data to extract features that lead to a particular classification. Figure 3 illustrates a typical architecture of a convolutional neural network model.
Fig. 3 Example of a CNN architecture (i) Input Layer: The input layer feeds data into the neural network in the form of a multidimensional array; in our case the input shape is (120, 120, 1). (ii) Convolutional Layers: The first few convolutional layers are very important as they extract crucial features from the data. The most common parameters of this layer are the kernel size, the number of filters and the activation function. The first convolutional layer performs an element-wise dot product between the kernel and the data to produce output for the next layer, and each following layer performs the dot product of these outputs with its own kernel [13]. The output of this layer is termed a feature map, which provides information regarding various features of
the image, including edges and vertices. This output is used as input to forthcoming layers to extract more attributes from the image. (iii) Pooling Layers: As the size of a feature map grows, the computing expense grows with it. The pooling layer's primary job is therefore to lower the size of the feature map in order to reduce computational cost; it serves as a bridge between a convolutional layer and a fully connected layer. Max pooling extracts the largest element from the given pool window, while average pooling takes the average of all elements in a certain portion of the picture. (iv) Fully Connected Layer: This layer receives flattened feature maps from earlier layers as input. Because this layer is positioned before the output layer, categorization begins here. In any neural network, the following calculation takes place:
f(WX + b) ,  (2)
where f is the activation function (mostly ReLU), W is the weight matrix of dimensions p × n, with p the number of neurons in the previous layer and n the number in the current layer, X is the input, and b is the bias. This calculation is performed repeatedly across the layers. Once the output has passed through the FC layer, a softmax function determines the probability of the input belonging to a particular class, i.e., the classification. Experimentation & Results The suggested CNN architecture consists of the following components: an input layer whose input_shape argument is (120, 120, 1), four convolutional layers, three max pooling layers, a flatten layer, a dropout layer and two dense layers, including the output layer. The convolutional layers have 16, 32, 64 and 64 filters respectively and share the same argument specifications: a 3 × 3 kernel with a stride of 1 × 1 and ReLU activation [14]. The max pooling layers use a pool_size of 2 × 2 with a stride of 1. One dense layer has 512 units with ReLU activation, followed by a dropout layer with a rate of 0.5; the last dense layer uses a softmax function to produce the probability prediction for a given sample. Fig. 4 illustrates the summary of the proposed architecture.
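The architecture specified above can be sketched in Keras. Two details are assumptions: the pooling layers here use the library default stride equal to the pool size (the stated stride of 1 would leave an extremely large flattened feature map), and the optimizer/loss are not given in the text:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Sketch of the stated architecture: 4 conv layers (16/32/64/64 filters,
# 3x3 kernels, ReLU), 3 max-pooling layers, flatten, dense 512 + dropout 0.5,
# softmax over the 4 dementia classes.
model = keras.Sequential([
    keras.Input(shape=(120, 120, 1)),
    layers.Conv2D(16, (3, 3), strides=(1, 1), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.Flatten(),
    layers.Dense(512, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(4, activation="softmax"),   # 4 dementia classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

Calling `model.summary()` reproduces the layer-by-layer listing that Fig. 4 shows.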
Fig. 4 Summary of CNN architecture We used accuracy, precision, recall and F1 score as our evaluation metrics and hence built a confusion matrix.
Fig. 5 Confusion Matrix Fig. 5 shows the metrics obtained for each class, where class 0 represents Non Demented, class 1 Mild Demented, class 2 Moderate Demented and class 3 Very Mild Demented.
Fig. 6 Performance Metrics
Model Deployment The saved model is served through a web page built with the Flask framework, with a front end that lets the user upload scans. After a scan is uploaded, the deployed model produces a prediction that is passed back to the user interface. In this way the user can check the severity level of the Alzheimer's scan: non demented, very mild demented, mild demented or moderate demented. Such an application therefore provides an instant and user-friendly result. Conclusion & Future Scope Alzheimer's has been one of the leading causes of death in the world for years, so this project offers a valuable way to detect it in its initial phases. It is high time that awareness be spread among the general public, especially older people, to get checked if they feel any symptoms of this disease. On a broader note, this project classifies the brain scan into 4 classes rather than 2: Non Demented, Mild Demented, Very Mild Demented and Moderate Demented. We observe that our trained model achieves an accuracy of more than 95% and that the interactive interface works properly as well. In the future we would like researchers to collect more data for the Moderate Demented class so as to bring equal samples to all classes. It is also recommended that such software be installed directly in MRI scanners to reduce the time between taking the scan and detecting Alzheimer's. References
[1] Pinto, T. C., Machado, L., Bulgacov, T. M., Rodrigues-Júnior, A. L., Costa, M. L., Ximenes, R. C., & Sougey, E. B. "Is the Montreal Cognitive Assessment (MoCA) screening superior to the Mini-Mental State Examination (MMSE) in the detection of mild cognitive impairment (MCI) and Alzheimer's Disease (AD) in the elderly?". International Psychogeriatrics, 31(4), 491-504, (2019).
[2] Ottoy, J., Niemantsverdriet, E., Verhaeghe, J., De Roeck, E., Struyfs, H., Somers, C., ... & Staelens, S. "Association of short-term cognitive decline and MCI-to-AD dementia conversion with CSF, MRI, amyloid- and 18F-FDG-PET imaging". NeuroImage: Clinical, 22, 101771, (2019).
[3] Kam, T. E., Zhang, H., Jiao, Z., & Shen, D. "Deep learning of static and dynamic brain functional networks for early MCI detection." IEEE Transactions on Medical Imaging, 39(2), 478-487, (2019).
[4] Biagetti, G., Crippa, P., Falaschetti, L., Luzzi, S., & Turchetti, C. "Classification of Alzheimer's Disease from EEG Signal Using Robust-PCA Feature Extraction". Procedia Computer Science, 192, 3114-3122, (2021).
[5] Khan, A., & Zubair, S. "Usage of random forest ensemble classifier based imputation and its potential in the diagnosis of Alzheimer's disease". Int. J. Sci. Technol. Res., 8(12), 271-275, (2019).
[6] Jo, T., Nho, K., & Saykin, A. J. "Deep Learning in Alzheimer's Disease Diagnostics: Diagnostic Classification and Prognostic Prediction in Neuroimaging Data". Frontiers in Aging Neuroscience, Volume 11, doi:10.3389/fnagi.2019.00220, (2019).
[7] Sivakani, R., & Ansari, G. A. "Machine Learning Framework for Implementing Alzheimer's Disease". In 2020 International Conference on Communication and Signal Processing (ICCSP) (pp. 0588-0592). IEEE, (2020).
[8] Khan, A., & Zubair, S. "An improved multi-modal based machine learning approach for the prognosis of Alzheimer's disease". Journal of King Saud University - Computer and Information Sciences, (2020).
IoT, Cloud and Data Science
[9] Trambaiolli, L. R., Lorena, A. C., Fraga, F. J., Kanda, P. A., Anghinah, R., & Nitrini, R. "Improving Alzheimer's disease diagnosis with machine learning techniques". Clinical EEG and Neuroscience, 42(3), 160-165, (2011)
[10] Fuse, H., Oishi, K., Maikusa, N., Fukami, T., & Japanese Alzheimer's Disease Neuroimaging Initiative. "Detection of Alzheimer's disease with shape analysis of MRI images". In 2018 Joint 10th International Conference on Soft Computing and Intelligent Systems (SCIS) and 19th International Symposium on Advanced Intelligent Systems (ISIS) (pp. 1031-1034). IEEE, (2018)
[11] Sultana, F., Sufian, A., & Dutta, P. "Advancements in image classification using convolutional neural network". In 2018 Fourth International Conference on Research in Computational Intelligence and Communication Networks (ICRCICN) (pp. 122-129). IEEE, (2018)
[12] Albawi, S., Mohammed, T. A., & Al-Zawi, S. "Understanding of a convolutional neural network". In 2017 International Conference on Engineering and Technology (ICET) (pp. 1-6). IEEE, (2017)
[13] Gu, J., Wang, Z., Kuen, J., Ma, L., Shahroudy, A., Shuai, B., ... & Chen, T. "Recent advances in convolutional neural networks". Pattern Recognition, 77, 354-377, (2018)
[14] Agarap, A. F. "Deep learning using rectified linear units (ReLU)". arXiv preprint arXiv:1803.08375, (2018)
Advances in Science and Technology ISSN: 1662-0356, Vol. 124, pp 285-294 doi:10.4028/p-5z45nx © 2023 Trans Tech Publications Ltd, Switzerland
Submitted: 2022-09-01 Accepted: 2022-09-16 Online: 2023-02-27
Predicting and Classifying Diabetic Retinopathy (DR) Using 5-Class Label Based on Pre-Trained Deep Learning Models
S. Karthika1,a*, M. Durgadevi2,b
1,2Department of Computer Science and Engineering, College of Engineering and Technology, SRM Institute of Science and Technology, Vadapalani Campus, No.1, Jawaharlal Nehru Road, Vadapalani, TN, India
[email protected], [email protected]
Keywords: Convolutional Neural Network, Diabetic Retinopathy, Proliferative_DR, Pre-trained models, MobileNetV2.
Abstract. Diabetic Retinopathy (DR) is a condition in which damage to the eyes occurs as a result of diabetes mellitus. It is the most frequent diabetes-related eye condition and can lead to vision loss and complete blindness. With effective eye treatment, the majority of new occurrences of diabetic retinopathy can be prevented, and early detection helps avoid total vision loss. However, early detection can be difficult because the disease may not present symptoms in its early stages. The wide variability of fundus images makes classification challenging, mainly for Proliferative_DR, which involves the formation of new vessels in the retina and bleeding. In this paper, pre-trained deep learning models are applied to the publicly accessible retinal fundus image dataset on Kaggle (APTOS 2019 Blindness Detection). Pre-processing and augmentation procedures are used to increase the accuracy of the pre-trained models. The training accuracy obtained is 83.07% for an 8-layer Convolutional Neural Network (CNN) and 85.21% for MobileNetV2; the testing accuracy achieved is 71.93% using the CNN and 86.42% using MobileNetV2. The most often employed measures (F1 score, precision, and recall) are used to account for class-level label disagreement, which aids in diagnosing all phases of diabetic retinopathy. The results are analyzed using a confusion matrix, which is useful for categorising the stages of diabetic retinopathy by severity and also captures the degree of mismatch between the actual and predicted labels.
Introduction
Diabetes Mellitus (DM) is a metabolic disorder in which the body produces inadequate insulin, resulting in excessive blood sugar levels. Diabetes affects around 62 million people in India alone. A person who has had diabetes for more than 15 years has a high probability of developing DR [1]. The final stage of DR causes blindness by damaging the retinal blood vessels as a consequence of DM [2].
With appropriate medicine and frequent eye monitoring, at least 90% of new cases might be avoided [3]. Fundus image classification is difficult due to high variability, particularly in the case of Proliferative_DR, which involves retinal proliferation of new blood vessels and retinal detachment. As a result, a number of deep neural network-based techniques for classifying diabetic retinopathy by severity have been developed [4]. This high variability is a critical difficulty for fundus picture categorization with a deep neural network, particularly in the case of retinal proliferation of new blood vessels and retinal detachment, and it decreases the network's accuracy [5].
Diabetic Retinopathy Stages [6]
Mild_DR. Microaneurysms appear in the early stages of the disease: balloon-like swellings that form in tiny areas of the retinal blood vessels.
Moderate_DR. When the blood vessels start swelling and distorting, the disease reaches the moderate stage. This affects blood transport to the retina, and these conditions change the retinal appearance.
Severe_DR. In this condition, many blood vessels become blocked. As a result, blood flow to portions of the retina is disrupted, and these deprived areas release growth factors that signal new blood vessels to grow during the severe stage.
Proliferative_DR. Proliferative_DR is the most advanced stage. At this point, the growth factors stimulate new blood vessels to form in the retina. Because these new blood vessels are fragile in nature, they can leak and bleed. Scar tissue contracting as a result of this leads to retinal detachment, in which the retina is pulled away from the underlying tissue. A retinal detachment can result in permanent vision loss.
The novelty of the current work is the use of 2 different pre-trained models to detect the different classes of DR.
Fig. 1 Different Stages of DR

The paper is structured as follows. Section II presents the background and related work. Section III describes the CNN and MobileNetV2 methods. Section IV describes the dataset and the outcomes obtained during the experiment. The final section comprises the discussion and summary.
Related Works
In the late twentieth and early twenty-first centuries, researchers made significant progress in recognising the onset of DR and providing medication that minimises vision loss. Previously, various image processing techniques and deep learning algorithms were used to detect distinct phases of DR through explicit feature extraction and classification. Image processing is crucial when it comes to obtaining important data from an image, and a great deal of study has been done on detecting DR in clinical collections of photographs. Several state-of-the-art techniques have been applied to detect DR from a given image dataset, and many researchers have contributed to this effort. Gangwar et al. [7] combined an Inception-ResNet-v2 pre-trained model with CNN layers for the detection of DR; the Messidor-1 diabetic retinopathy and APTOS 2019 blindness detection datasets were employed in that study, and the application of transfer learning yielded positive results. To extract the diabetes pictures, Reddy et al. [8] applied min-max normalisation to the supplied dataset, then used ensemble-based ML techniques on the preprocessed photos; the results suggest that ensemble learning methods outperform typical machine learning techniques. Gupta et al. [9] used multiple pre-trained DL models, including Inception v3, VGG16, and VGG19, for feature extraction from a dataset of numerous lesion images.
The retrieved features were fed into ML classifiers to diagnose the lesions. Nguyen et al. [10] used various DL models for the classification of lesions; with DL techniques, automatic detection becomes much easier than manual detection. Attractive results were obtained with these models: the accuracy was 82 percent, while the sensitivity was 80 percent. Pratt et al. [11] presented a CNN-based method for identifying DR and obtained a sensitivity of 95% and a precision of 75%. For recognising the stages of DR, Lam, C. et al. [12] employed a CNN on colour fundus pictures and applied transfer learning to pre-trained models. Gulshan, V. et al. [13] developed a deep learning-based method for detecting DR and macular edema.
Methods
Convolutional Neural Network (CNN). The deep learning architecture is made up of three kinds of layers, convolutional, pooling, and fully connected, together with activation functions. It is built on the foundation of a standard neural network design, which includes input, output, and hidden layers. CNN classification takes an image as input, processes it through the hidden layers, and classifies the output based on the input [14]. CNN employs convolution layers, which automatically extract information from an image [15] by using small squares that keep the relationship between neighbouring pixels intact. A collection of filters known as kernels is convolved with the image, generating a feature map, also known as an activation map. Pooling layers are used to reduce dimensionality; a pooling layer is applied after a convolutional layer and operates on each activation map. Max, sum, and average are the most common pooling procedures. The activation function, also known as the transfer function, is what introduces non-linearity into the network.
The most widely utilised activation functions are sigmoid, tanh, ReLU, and leaky ReLU.
MobileNetV2. MobileNetV2 is a newer deep learning network introduced for mobile and embedded vision applications [16]. It is based on an inverted residual structure with bottleneck layers connected by residual connections. The intermediate expansion layer filters features with lightweight depthwise convolutions as a source of non-linearity. The MobileNetV2 architecture starts with a fully convolutional layer with 32 filters, followed by 19 residual bottleneck layers.
Dataset and Experimental Results
This section provides an overview of the dataset as well as the results of the experiments.
Dataset Distribution. The images are retinal images that have been filtered with a Gaussian filter and are used to detect DR. The dataset contains 3664 retinal images and is based on the Kaggle APTOS 2019 dataset for Diabetic Retinopathy Detection, which is available on Kaggle [17]. The photographs have been resized to 224 × 224 pixels, the most popular size, which may be used with a variety of pre-trained models. All of the images have been saved into folders according to the severity/stage of diabetic retinopathy: five folders, L0 to L4, one for each of the five classes. Table 1 shows the dataset distribution, with a pictorial depiction in Fig. 2.

Table 1 Distribution of Dataset

Label  Class                  No. of images
L0     Class_No_DR            1805
L1     Class_Mild_DR          370
L2     Class_Moderate_DR      999
L3     Class_Severe_DR        193
L4     Class_Proliferate_DR   295
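The five-folder layout in Table 1 maps directly onto Keras' directory-based image loading. A minimal sketch, assuming the class folders L0 to L4 sit under a root directory named `dataset/` (the root path and batch size are assumptions, not stated in the paper), with images resized to 224 × 224 and augmented by a 30-degree rotation range and horizontal flips as described later in the text:

```python
# Sketch: load the five-class L0..L4 folder layout with Keras, resizing to
# 224x224 and applying rotation/flip augmentation. "dataset/" is an assumed
# root path; the 70/30 split here only illustrates carving out validation data.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

IMG_SIZE = (224, 224)

train_gen = ImageDataGenerator(
    rescale=1.0 / 255,      # scale pixel values into [0, 1]
    rotation_range=30,      # random rotations up to 30 degrees
    horizontal_flip=True,   # random horizontal flips
    validation_split=0.3,
)

def make_flows(root="dataset/"):
    """Build training/validation iterators from class subfolders L0..L4."""
    train = train_gen.flow_from_directory(
        root, target_size=IMG_SIZE, batch_size=32,
        class_mode="categorical", subset="training")
    val = train_gen.flow_from_directory(
        root, target_size=IMG_SIZE, batch_size=32,
        class_mode="categorical", subset="validation")
    return train, val
```

`flow_from_directory` infers the five class labels from the folder names, so no separate label file is needed.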
Fig. 2 Representation of Each Dataset Category

Data Augmentation. The CNN model requires sufficient training data, but labelled data is scarce in the medical domain. Data augmentation addresses this: it lowers model overfitting and hence increases localisation capability. Augmentation is applied with a rotation range of 30 degrees and horizontal flips, which helps resolve the problem of insufficient data during network training [18].
Hardware and Software. The experiments are conducted with the Keras and TensorFlow deep learning software, using a Colab notebook to run the code. Colab provides a Python programming environment that makes it simple to use NumPy, Matplotlib, and other Python packages for data visualisation.
Experimental Value. Deep learning and fine-tuning were used to train pre-trained models on the collected dataset, using the Adam optimizer and categorical cross-entropy with epochs=1, verbose=1, and a learning rate of 0.001. The dataset is divided into training, validation, and testing sets in a 70-20-10 ratio. Losses, the errors that occur during prediction while the model is being trained, were also recorded during training and validation; a good training run consistently minimizes loss while increasing accuracy, and training can be halted once accuracy and loss stabilise. The findings of the experiment, based on training and validation accuracy, are provided in Table 2.

Table 2 Results of Using 27 Pre-Trained Models

SI. No  Model              Train Accuracy  Val Accuracy  Training time (sec)
0       MobileNetV2        0.7952          0.7725        16.75
1       DenseNet169        0.7829          0.7662        25.34
2       ResNet152V2        0.7705          0.7599        25.97
3       InceptionV3        0.7057          0.7538        19.17
4       DenseNet201        0.7326          0.7538        27.69
5       NASNetMobile       0.7040          0.7508        29.25
6       MobileNet          0.7316          0.7508        14.66
7       InceptionResNetV2  0.6949          0.7477        28.43
8       DenseNet121        0.7161          0.7447        41.24
9       ResNet50V2         0.7245          0.7416        17.16
10      Xception           0.7097          0.7386        18.19
11      ResNet101V2        0.7222          0.7204        21.22
12      VGG16              0.6719          0.7112        15.75
13      VGG19              0.6521          0.7021        15.04
14      ResNet50           0.6524          0.6687        17.24
15      ResNet152          0.6372          0.6626        28.66
16      ResNet101          0.6291          0.6353        22.20
17      MobileNetV3Large   0.5509          0.6049        18.08
18      MobileNetV3Small   0.4727          0.4833        17.25
19      EfficientNetB1     0.4784          0.4802        22.41
20      EfficientNetB6     0.4798          0.4802        27.34
21      EfficientNetB0     0.4835          0.4802        20.60
22      EfficientNetB2     0.4710          0.4802        23.08
23      EfficientNetB3     0.4791          0.4802        25.13
24      EfficientNetB4     0.4804          0.4802        27.15
25      EfficientNetB7     0.4788          0.4802        28.17
26      EfficientNetB5     0.4761          0.4802        27.78
In this case, the models reach different accuracy values, with MobileNetV2 achieving the highest training accuracy, 79.52%, among them. MobileNetV2 also scored the highest validation accuracy, 77.25%, when compared to the other 26 models. The comparison of training and validation accuracy across the pre-trained models is shown in Figures 3 and 4.
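The one-epoch screening pass behind Table 2 can be sketched as follows. Each `keras.applications` backbone is frozen, given a small softmax head, compiled with Adam (learning rate 0.001) and categorical cross-entropy, and trained for a single epoch while wall-clock time is recorded. The candidate list is truncated to three backbones for brevity, and the data iterators `train_it` and `val_it` are assumed to exist; this is a sketch of the procedure, not the authors' exact code.

```python
# Sketch: screen several frozen ImageNet backbones for one epoch each,
# recording train/validation accuracy and training time, as in Table 2.
import time
import tensorflow as tf
from tensorflow.keras import applications, layers, models

CANDIDATES = {
    "MobileNetV2": applications.MobileNetV2,
    "DenseNet169": applications.DenseNet169,
    "InceptionV3": applications.InceptionV3,
}

def screen(train_it, val_it, num_classes=5):
    results = []
    for name, ctor in CANDIDATES.items():
        base = ctor(include_top=False, weights="imagenet",
                    input_shape=(224, 224, 3))
        base.trainable = False  # freeze all pre-trained layers
        model = models.Sequential([
            base,
            layers.GlobalAveragePooling2D(),
            layers.Dense(num_classes, activation="softmax"),
        ])
        model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
                      loss="categorical_crossentropy", metrics=["accuracy"])
        start = time.time()
        hist = model.fit(train_it, validation_data=val_it,
                         epochs=1, verbose=1)
        results.append((name,
                        hist.history["accuracy"][-1],
                        hist.history["val_accuracy"][-1],
                        time.time() - start))
    # rank candidates by validation accuracy, as Table 2 does
    return sorted(results, key=lambda r: r[2], reverse=True)
```

A loop like this makes the screening reproducible and lets the best backbone (here MobileNetV2) be selected automatically for fine-tuning.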
Fig. 3 Accuracy on Training Set After 1 Epoch
Fig. 4 Accuracy on Validation Set After 1 Epoch

Using Pre-Trained 8-Layer CNN and MobileNetV2. The pre-trained 8-layer CNN and MobileNetV2 models are trained on the same dataset by substituting the last fully connected layer. Two approaches were tried: the first uses the pre-trained network as a feature extractor only, replacing the final layer with a 256-node dense layer with ReLU activation; the second adds a softmax classification layer with 5 classes. All other layers were frozen during training. Training uses the Adam optimizer and categorical cross-entropy with epochs = 15, batch size = 32, and a learning rate of 0.003. In this case the accuracies of the two models are not identical: MobileNetV2 achieves the maximum training accuracy of 85.21%, higher than the CNN model. Table 3 shows the fine-tuned pre-trained model results.

Table 3 Results of Fine-Tuned Pre-Trained Models

SI. No  Model        Valid_Accuracy  Valid_Loss  Training_Accuracy  Train_Loss  Test_Accuracy  Test_Loss
1       CNN Model    0.8307          1.0605      0.617              0.469       71.93          0.8158
2       MobileNetV2  0.8421          0.582       0.769              0.3837      86.42          0.6789
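The fine-tuned MobileNetV2 configuration described above can be sketched in Keras as a frozen backbone with a 256-node ReLU dense layer and a 5-class softmax head, trained with Adam at learning rate 0.003. The exact head layout is an interpretation of the text; `weights=None` is used here only so the sketch runs without downloading ImageNet weights (use `weights="imagenet"` to reproduce the transfer-learning setup).

```python
# Sketch: frozen MobileNetV2 base + 256-node ReLU dense layer + 5-class
# softmax, compiled as described in the text (Adam, lr 0.003).
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import MobileNetV2

base = MobileNetV2(include_top=False, weights=None,  # "imagenet" in practice
                   input_shape=(224, 224, 3))
base.trainable = False  # the backbone stays frozen during training

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation="relu"),   # 256-node dense layer with ReLU
    layers.Dense(5, activation="softmax"),  # one output per DR stage (L0-L4)
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.003),
              loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_it, validation_data=val_it, epochs=15, batch_size=32)
```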
Fig. 5 Validation_Accuracy Graph
Fig. 6 Validation_Loss Model Graph
Fig. 7 Classification Report for 5-Class Label
Finally, MobileNetV2 obtained 86.42% testing accuracy, which is higher than the CNN. Figs. 5 and 6 show the accuracy and loss graphs for both training and validation obtained by MobileNetV2, while Fig. 7 shows the classification report for MobileNetV2.
Visualization
Heatmap Analysis. In clinical use, interpreting the output of diagnosis-guiding software is important for triaging referrals and focusing one's clinical examination. Toward that end, we created a heatmap visualization method to represent intuitively the learning procedure of our deep learning network. Fig. 8 below shows the prediction report for the classes.
Metrics Based on the Confusion Matrix. Accuracy is a critical parameter for classification models; it is simple to understand and use for both binary and multiclass classification problems. Fundus pictures are commonly used in DR to calculate the specificity and sensitivity of each image, and higher specificity and sensitivity improve clinical treatment. True-Negative (TN1) denotes correctly identified non-lesion pixels, while True-Positive (TP1) denotes correctly identified lesion pixels.
Fig. 8 Prediction Report for 5-Class Label

Similarly, the False-Negative (FN1) and False-Positive (FP1) values represent the lesion pixels and the non-lesion pixels, respectively, that are not correctly identified by the method. Accuracy (AC) is defined as the percentage of correct results out of the total number of records tested. Accuracy alone is only a reliable way to assess a classification model on balanced datasets; it may yield misleading findings if the classification dataset is skewed or uneven. Precision (PRE) is defined as the ratio of true positives to predicted positives. Another important statistic is recall, which matters when all possible positives must be captured: Recall (RE) is the percentage of positive samples that are correctly predicted as positive, so the recall is one if every positive sample is predicted to be positive. Precision and recall can be combined into a single measure, the F1 score, which is the harmonic mean of precision and recall and ranges from 0 to 1. All of these measures are evaluated using the formulas in Eq. 1-4. Fig. 9 depicts the normalized confusion matrix of the classified datasets.

AC = (TP1 + TN1) / (TP1 + TN1 + FP1 + FN1)    (1)

PRE = TP1 / (TP1 + FP1)    (2)

RE = TP1 / (TP1 + FN1)    (3)

F1 Score = (2 × PRE × RE) / (PRE + RE)    (4)
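Eq. 1-4 translate directly into per-class functions of the confusion-matrix counts TP1, TN1, FP1, and FN1:

```python
# The four measures in Eq. 1-4, computed for one class from confusion counts.
def accuracy(tp, tn, fp, fn):
    """Eq. 1: fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def precision(tp, fp):
    """Eq. 2: fraction of predicted positives that are truly positive."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Eq. 3: fraction of actual positives that are predicted positive."""
    return tp / (tp + fn)

def f1_score(tp, fp, fn):
    """Eq. 4: harmonic mean of precision and recall."""
    pre, re = precision(tp, fp), recall(tp, fn)
    return 2 * pre * re / (pre + re)
```

For example, with TP1 = 40, TN1 = 50, FP1 = 10, FN1 = 0, accuracy is 0.9, precision is 0.8, recall is 1.0, and the F1 score is 2 × 0.8 × 1.0 / 1.8 ≈ 0.889.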
Fig. 9 Normalized Confusion Matrix

Discussion and Summary
Automated diabetic retinopathy diagnostic systems are in high demand: devices that can diagnose the disease from a fundus image without much clinical involvement are always welcome. On a small dataset of 5 classes and 3664 retinal fundus images, transfer learning and fine-tuning were used to test the pre-trained 8-layer CNN and MobileNetV2 models. According to the findings, tuning the pre-trained models by substituting the top layers provides better accuracy than the plain CNN. Furthermore, as compared to the 8-layer CNN, MobileNetV2 offered somewhat greater accuracy. MobileNetV2 is a CNN model that is less computationally expensive and is specifically developed for use in embedded systems; using MobileNetV2 as a pre-trained model, an automated embedded device for diabetic retinopathy diagnosis could be developed in the future.
References
[1] Kertes PJ, Johnson TM, eds. (2007). Evidence Based Eye Care. Philadelphia, PA: Lippincott Williams & Wilkins. ISBN 0-7817-6964-7.
[2] "Diabetic retinopathy". Diabetes.co.uk. Retrieved 25 November 2012.
[3] Tapp RJ, Shaw JE, Harper CA, et al. (June 2003). "The prevalence of and factors associated with diabetic retinopathy in the Australian population". Diabetes Care, 26(6): 1731-7. doi:10.2337/diacare.26.6.1731.
[4] Doshi, A. Shenoy, D. Sidhpura and P. Gharpure, "Diabetic retinopathy detection using deep convolutional neural networks," 2016 International Conference on Computing, Analytics and Security Trends (CAST), Pune, 2016, pp. 261-266. doi:10.1109/CAST.2016.7914977.
[5] Patel, S. "Diabetic Retinopathy Detection and Classification using Pre-trained Convolutional Neural Networks." International Journal on Emerging Technologies, 11(3): 1082-1087, 2020.
[6] Sairaj Burewar, Anil Balaji Gonde, Santosh Kumar Vipparthi. "Diabetic Retinopathy Detection by Retinal Segmentation with Region Merging using CNN", 2018 IEEE 13th International Conference on Industrial and Information Systems (ICIIS), 2018.
[7] K. Gangwar and V. Ravi, "Diabetic retinopathy detection using transfer learning and deep learning," in Evolution in Computational Intelligence, Singapore, 2021.
[8] G. T. Reddy, S. Bhattacharya, S. S. Ramakrishnan et al., "An ensemble based machine learning model for diabetic retinopathy classification," in Proceedings of the International Conference on Emerging Trends in Information Technology and Engineering (ic-ETITE), Vellore, India, February 2020.
[9] S. Gupta, A. Panwar, S. Goel, A. Mittal, R. Nijhawan, and S. A. Kumar, "Classification of lesions in retinal fundus images for diabetic retinopathy using transfer learning," in Proceedings of the International Conference on Information Technology (ICIT), Bhubaneswar, India, December 2019.
[10] Q. H. Nguyen, R. Muthuraman, L. Singh et al., "Diabetic retinopathy detection using deep learning," in Proceedings of the 4th International Conference on Machine Learning and Soft Computing, Haiphong City, Vietnam, January 2020.
[11] Harry Pratt, Frans Coenen, Deborah M. Broadbent, Simon P. Harding, Yalin Zheng. (2016). Convolutional Neural Networks for Diabetic Retinopathy, Procedia Computer Science, 90, 200-205.
[12] Lam, C., Yi, D., Guo, M., & Lindsey, T. (2018). Automated Detection of Diabetic Retinopathy using Deep Learning. AMIA Joint Summits on Translational Science Proceedings, 147-155.
[13] Gulshan, V., Peng, L., Coram, M., Stumpe, M. C., Wu, D., Narayanaswamy, A., Venugopalan, S., Widner, K., Madams, T., Cuadros, J., Kim, R., Raman, R., Nelson, P. C., Mega, J. L., & Webster, D. R. (2016). Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs. JAMA, 316(22), 2402-2410. https://doi.org/10.1001/jama.2016.17216.
[14] Patel, S. (2020). A Comprehensive Analysis of Convolutional Neural Network Models. International Journal of Advanced Science and Technology, 29(04), 771-777. Retrieved from http://sersc.org/journals/index.php/IJAST/article/view/4654
[15] Patel R, Patel S. (2020). A Comprehensive Study of Applying Convolutional Neural Network for Computer Vision.
International Journal of Advanced Science and Technology, 29(6s), 2161-2174. Retrieved from http://sersc.org/journals/index.php/IJAST/article/view/10929.
[16] Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications, arXiv:1704.04861.
[17] https://www.kaggle.com/competitions/aptos2019-blindness-detection.
[18] Harry Pratt, Frans Coenen, Deborah M. Broadbent, Simon P. Harding, Yalin Zheng. (2016). Convolutional Neural Networks for Diabetic Retinopathy, Procedia Computer Science, 90, 200-205.
Advances in Science and Technology ISSN: 1662-0356, Vol. 124, pp 295-301 doi:10.4028/p-653bh6 © 2023 Trans Tech Publications Ltd, Switzerland
Submitted: 2022-09-06 Accepted: 2022-09-16 Online: 2023-02-27
Early Wheat Leaf Disease Detection Using CNN
Vedika R.1,a, M. Mithra Lakshmi2,b*, Sakthia R.3,c, K. Meenakshi4,d
1,2,3,4Department of Computer Science and Engineering, SRM Institute of Science and Technology, Vadapalani Campus, No.1 Jawaharlal Nehru Road, Vadapalani, Chennai, Tamil Nadu, India
[email protected], [email protected], [email protected], [email protected]
Keywords: CNN (Convolution Neural Network), LSTM (Long Short term memory), SVM (Support Vector Machine)
Abstract. Smart farming is an innovative technology that aids in improving the quality and quantity of the country's agricultural produce. Wheat is the most important crop in most parts of India, and wheat leaf diseases have a significant impact on production rates and farmer earnings. They pose a significant danger to food security because they reduce crop productivity and degrade crop quality. Accurate and precise disease detection has long been a significant challenge, but recent advances in computer vision enabled by deep learning have paved the road for camera-assisted wheat leaf disease diagnosis. Using a CNN trained on a publicly available wheat leaf disease dataset, several machine learning algorithms and neuron- and layer-wise visualization methods are applied.
Introduction
Deep learning enables computational models with several processing layers to learn multiple levels of abstraction for data representation, and it has become the main driver of new applications. It is a branch of artificial intelligence that offers a set of learning methods for modeling data with complicated architectures and performing various nonlinear data transformations. Patterns are recognized in large quantities of data, new data can then be categorized using the same patterns, and information can be extracted from existing data. Different neural network architectures can be used to solve problems such as image classification or pattern recognition; two such architectures are CNN and LSTM. Deep learning has grown increasingly prominent as a machine learning technique in natural language processing and image analysis. The driving forces behind this progress are the availability of more computing resources, more data, new techniques for training deep learning models, and easy-to-use tools and packages for neural network construction and training.
Visual data is analyzed using convolutional neural networks, a type of artificial neural network.
Objective
The goal of this initiative is to classify wheat leaf disease using a CNN: to verify whether a wheat leaf is healthy or unhealthy and, if unhealthy, what type of disease it has.
Literature Survey
III.I. Title
Aerial Visual Perception in Smart Farming: Field Study of Wheat Yellow Rust Monitoring.
III.I.I. Description
The goal of this study is to use airborne visual perception to detect yellow rust disease in wheat. By merging UAV remote sensing, multispectral photography, and the U-Net deep learning network, an automated rust disease monitoring system was proposed. The traditional random forest (RF) technique is used as a spectral-based classifier against which the deep learning algorithm is compared. Under a variety of network inputs, a field study was conducted to generate an open-access dataset that was used to
compare the developed framework to standard spectral-based categorization utilizing the RF technique. The solely spectral categorization of Random Forest surpasses the deep learning-based segmentation of U-Net, which draws on both spectral and spatial data.
III.II. Title
Android Application of Wheat Leaf Disease Detection and Prevention using Machine Learning.
III.II.I. Description
In this paper, an Android application is designed for an intelligent wheat leaf diagnosis system based on a machine learning technique. The system uses an Android mobile handset or digital camera to collect wheat leaf disease images and then trains the dataset on the chosen algorithm. The image background is simplified by automatic pre-processing, and the proposed method is adopted step by step. Clustering is employed for image segmentation, and categorization is done using a supervised machine learning method, SVM, a data-driven tool for classification and regression procedures. Image classification assigns labels by examining the numerical features of the candidate image classes. Finally, the identification results are delivered to the Android phone.
III.III. Title
Wheat leaf disease detection using SVM classifier.
III.III.I. Description
This paper examines the SVM classifier method, which uses digital image processing techniques to identify, quantify, and classify plant diseases in visible-spectrum digital images, and it enumerates the numerous methods for detecting wheat illness. The Principal Component Analysis algorithm is used for feature extraction, clustering is used for image segmentation, and the SVM algorithm is used for disease classification. A collection of roughly 25 photos is used as input to create the training set.
Existing System
A variety of classification methods aid the automated detection and categorization of wheat leaf diseases based on morphological traits.
Using CNN as a classifier, the existing work focuses on detecting wheat leaf diseases, and by employing hybrid algorithms it also aims to improve the recognition rate and classification accuracy for the severity of leaf illnesses. Machine learning is used to detect illnesses in wheat leaves by analyzing data from various angles and categorizing it into one of many predefined groups. The SVM classifier is a method for detecting, quantifying, and classifying plant diseases from visible-spectrum digital images using digital image processing techniques; the color, intensity, and dimensions of plant leaves are taken into account as morphological traits and properties.
Proposed System
The method we propose uses a CNN (Convolutional Neural Network), a deep learning algorithm with many hidden layers, each of which can be a convolution layer or a pooling layer. The input images from the dataset are passed through these hidden convolution and pooling layers, where the network is trained by extracting features from the images and learns how to classify the wheat leaf. The models are compared and can then be used for prediction; the image's key attributes are based on the wheat leaf's features and texture.
Comparison of the Existing and Proposed Systems
The present system focuses on only one wheat leaf disease, yellow rust (also known as stripe rust), and it is not deployed in real time for farmers to use. In the proposed system, the wheat leaf is classified into 3 categories, namely healthy, septoria, and stripe rust. It is deployed in real time using
which the farmer can just upload a picture and learn whether the wheat leaf is infected; if it is, the system classifies the infection as septoria or stripe rust. A manual CNN is also built and its performance is compared with that of AlexNet and LeNet. The previous algorithm uses the U-Net deep learning network, which demands a powerful GPU, whereas the manual net we use can run on an i3 processor with a minimum of 8 GB of disk space, and it shows a result with an accuracy of 92% and above, so anyone can use it easily.
System Architecture
Fig. 1: The architecture diagram

Modules
● Import The Image From Dataset
● Manual Net
● AlexNet
● LeNet
● Deploy
1. Import The Image From Dataset
In Module 1 we analyze and process the images. There are 3 types of wheat leaf images: healthy, stripe rust, and septoria. To pre-process the images, we upload the dataset to Colab and then import the packages required to run the architectures and compute accuracy and loss. To characterize the leaves, we inspect the width and height of each leaf disease image and find their minimum and maximum values. An Image_details function is created to calculate the maximum and minimum width and height, an Images_details_Print_data function prints this data, and a plot_images function displays a few images from the dataset.
2. Manual Net
In Module 2 we designed a convolutional neural network with 1 convolution layer, 1 pooling layer, 1 fully connected layer, and a softmax layer; the activation function used is ReLU. During processing, we resize the images and apply the same zoom, shear range, and horizontal flip transformations to all of the photos. The dataset is run for 10 epochs and the accuracy and loss are calculated. The final accuracy achieved is 0.9402 on training data and 0.9688 on testing data; the final loss achieved is 0.1244 on training data and 0.776 on testing data. The accuracy and loss per epoch are also plotted in a graph for better understanding.
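The manual net described above can be sketched in Keras as a single convolution layer, a pooling layer, one fully connected layer, and a 3-class softmax output with ReLU activations. The input size, filter count, and dense-layer width are assumptions for illustration, since the text does not state them.

```python
# Sketch of the "manual net": 1 conv layer, 1 pooling layer, 1 fully
# connected layer, and a softmax over the three wheat-leaf classes.
import tensorflow as tf
from tensorflow.keras import layers, models

manual_net = models.Sequential([
    layers.Input(shape=(128, 128, 3)),       # image size is an assumption
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),    # the single fully connected layer
    layers.Dense(3, activation="softmax"),   # healthy / septoria / stripe rust
])
manual_net.compile(optimizer="adam", loss="categorical_crossentropy",
                   metrics=["accuracy"])
# manual_net.fit(train_it, epochs=10) would reproduce the 10-epoch run above
```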
298
IoT, Cloud and Data Science
Fig. 2: The manual net architecture's accuracy and loss
3. AlexNet
We employed the AlexNet architecture in this module, which consists of five convolutional layers, three max-pooling layers, two normalization layers, three fully connected layers, and one softmax layer. Each convolutional layer consists of convolutional filters and the nonlinear activation function ReLU, and max pooling is performed by the pooling layers. The dataset is rescaled, the shear range is set, the images are zoomed in, and data augmentation is performed. The dataset is run for 20 epochs and the accuracy and loss are calculated. The final accuracy achieved is 0.5936 on the training data and 0.625 on the testing data; the final loss is 0.7680 on the training data and 0.8633 on the testing data. The accuracy and loss per epoch are also plotted in a graph for better understanding.
Fig. 3: AlexNet architecture's accuracy and loss
4. LeNet
We employed the LeNet architecture in this module, which consists of two convolutional layers, two max-pooling layers, two fully connected layers, and one softmax layer. Convolutional filters and the nonlinear activation function ReLU make up each convolutional layer, and the pooling layers perform max pooling. The dataset is rescaled, the shear range is set, the images are zoomed in, and data augmentation is performed. The dataset is run for 10 epochs and the accuracy and loss are calculated. The final accuracy achieved is 0.90 on the training data and 0.92 on the testing data; the final loss is 0.2555 on the training data and 0.1872 on the testing data. The accuracy and loss per epoch are also plotted in a graph for better understanding.
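As a sanity check on the layer counts above, the spatial dimensions through a classic LeNet-5-style stack can be traced with simple arithmetic (the 32x32 input is the textbook LeNet value; the paper does not state its input resolution):

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Standard output-size formula for a convolution or pooling layer."""
    return (size + 2 * pad - kernel) // stride + 1

# Classic LeNet-5-style dimensions on a 32x32 input (illustrative)
s = 32
s = conv_out(s, 5)      # conv1, 5x5 kernel  -> 28
s = conv_out(s, 2, 2)   # pool1, 2x2 stride 2 -> 14
s = conv_out(s, 5)      # conv2, 5x5 kernel  -> 10
s = conv_out(s, 2, 2)   # pool2, 2x2 stride 2 -> 5
print(s)  # 5
```

The resulting 5x5 feature maps are then flattened and fed to the two fully connected layers and the softmax.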
Advances in Science and Technology Vol. 124
299
Fig. 4: The accuracy and loss of the LeNet architecture
5. Deployment
Finally, a website is created using the Django framework of Python. The website takes a wheat leaf image as input and reports whether the leaf is infected with septoria or stripe rust, or is healthy.
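Behind the upload form, the deployment ultimately maps the network's softmax output onto the three class labels; a minimal sketch (the label order and the helper name are our own assumptions, not from the paper):

```python
import numpy as np

LABELS = ["healthy", "septoria", "stripe rust"]  # assumed class order

def classify(probabilities):
    """Map a softmax output vector to a human-readable label."""
    probabilities = np.asarray(probabilities)
    return LABELS[int(np.argmax(probabilities))]

print(classify([0.05, 0.85, 0.10]))  # septoria
```

In the deployed site, the Django view would run the saved model on the uploaded image and return this label in the response.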
Fig. 5: First page of the deployment
Fig. 5.1 An example of uploading a wheat leaf image
Fig. 5.2 Result of the uploaded wheat leaf image
Result
In this project we have built a manual network which gives an accuracy of 92% and above. We have also deployed the model in real time using the Django framework, so when a farmer uploads the image of a wheat leaf, it is classified into one of three categories: healthy, septoria or stripe rust.
Conclusion
The goal of this project is to use deep learning techniques to classify wheat leaf diseases from leaf photos. This is a difficult problem that has previously been tackled using a variety of methods. While feature engineering has yielded positive results, this study concentrated on feature learning, which is one of deep learning's promises. Image preprocessing enhances classification accuracy without the use of feature engineering, as it leaves less noise in the input data. Agriculture-based AI for wheat leaf disease is in high demand these days, yet because of key constraints a solution based only on feature learning does not appear to be close. As a result, deep learning approaches could be used to classify leaf diseases.
References
[1] J. Su et al., "Aerial Visual Perception in Smart Farming: Field Study of Wheat Yellow Rust Monitoring," IEEE Transactions on Industrial Informatics, vol. 17, no. 3, pp. 2242-2249, March 2021.
[2] Sumit Nema, Bharat Mishra and Mamta Lambert, "Android Application of Wheat Leaf Disease Detection and Prevention using Machine Learning," International Journal of Trend in Research and Development, Volume 7(2), Apr. 2020.
[3] Er. Varinderjit Kaur and Dr. Ashish Oberoi, "Wheat disease detection using SVM classifier," JETIR, Volume 5, Issue 8, August 2018.
[4] Amina Khatra, "Yellow Rust Extraction in Wheat Crop based on Color Segmentation Techniques," IOSR Journal of Engineering, Vol. 3, Issue 12, December 2013.
[5] Ashwini T Sapkal and Uday V Kulkarni, "Comparative study of Leaf Disease Diagnosis system using Texture features and Deep Learning Features," International Journal of Applied Engineering Research, Volume 13, Number 19, 2018.
[6] Simranjeet Kaur, Geetanjali Babbar, Navneet Sandhu and Dr. Gagan Jindal, "Various plant disease detection using image processing methods," IJSDR, Volume 4, Issue 6, June 2019.
[7] Mrinal Kumar, Tathagata Hazra and Dr. Sanjaya Shankar Tripathy, "Wheat Leaf Disease Detection Using Image Processing," IJLTEMAS, Volume VI, Issue IV, April 2017.
[8] Aarju Dixit and Sumit Nema, "Wheat Leaf Disease Detection Using Machine Learning Method - A Review," IJCSMC, Vol. 7, Issue 5, May 2018.
[9] M. S. Hossain, M. Al-Hammadi and G. Muhammad, "Automatic fruit classification using deep learning for industrial applications," IEEE Trans. Ind. Informat., vol. 15, no. 2, Feb. 2019.
[10] Dheeb Al Bashish, Malik Braik and Sulieman Bani-Ahmad, "Detection and Classification of Leaf Diseases using K-means-based Segmentation and Neural-networks-based Classification," Information Technology Journal, 10: 267-275, 2011.
[11] E. Hamuda, M. Glavin and E. Jones, "A survey of image processing techniques for plant extraction and segmentation in the field," Comput. Electron. Agriculture, vol. 125, pp. 184-199, 2016.
[12] P. Kumsawat, A. Menukaewjinda, K. Attakitmongcol and A. Srikaew, "Grape leaf disease detection from color imagery using hybrid intelligent system," ECTI-CON 2008.
[13] Dr. K. Thangadurai and K. Padmavathi, "Computer Vision image Enhancement for Plant Leaves Disease Detection," WCCCT 2014.
[14] Monica Jhuria, Rushikesh Borse and Ashwani Kumar, "Image Processing For Smart Farming: Detection Of Disease And Fruit Grading," IEEE 2013.
[15] Zhang Jian and Zhang Wei, "Support vector machine for recognition of cucumber leaf diseases," IEEE 2010.
[16] A. Srobárová and L. Kakalíková, "Fungal disease of grapevines," Eur. J. Sci. Biotechnol, vol. 1, pp. 84-90, 2007.
[17] S. Zhang, X. Wu, Z. You and L. Zhang, "Leaf image based cucumber disease recognition using sparse representation classification," Comput. Electron. Agric., vol. 134, pp. 135-141, 2017.
[18] S. Fiel and R. Sablatnig, "Automated identification of tree species from images of the bark, leaves and needles," in Proc. of 16th Computer Vision Winter Workshop, Mitterberg, Austria, pp. 1-6, 2011.
[19] Ashwini Awate, Gayatri Amrutkar, Damini Deshmankar and Utkarsh Bagul, "Fruit disease detection using color, texture analysis and ANN," IGCIoT 2015.
[20] Zulkifli Bin Husin, Ali Yeon Bin Md Shakaff, Abdul Hallis Bin Abdul Aziz and Mohamed Farook, "Feasibility Study on Plant Chili Disease Detection Using Image Processing Techniques," ISMS 2012.
CHAPTER 2: Computational Linguistics
Advances in Science and Technology ISSN: 1662-0356, Vol. 124, pp 305-311 doi:10.4028/p-ddx7i2 © 2023 Trans Tech Publications Ltd, Switzerland
Submitted: 2022-09-01 Accepted: 2022-09-16 Online: 2023-02-27
Sentiment Analysis on Food-Reviews Dataset
Mrs. R. Deepaa,*, Gundu Jitendra Satya Sai Guptab, Fardeen Khanc, Harsh Singhd
*Assistant Professor (O.G), Department of Computer Science and Engineering, SRM Institute of Science and Technology, Chennai, India
[email protected], [email protected], [email protected], [email protected]
Keywords: Sentiment analysis, Logistic regression, MultinomialNB, Stopword, Wordcloud
Abstract: The vast development of the internet has led people to express their opinions online, and it is very important for a successful business to understand its customers. Customers express their thoughts in the form of reviews, which can be positive or negative, and understanding customers and their behavior helps businesses grow more successfully. In this paper, we propose sentiment analysis of a restaurant review dataset using multinomial naive Bayes and logistic regression. This program will help owners quickly determine their customers' sentiments.
I. Introduction
Many customers express their opinions in the form of reviews. Customer satisfaction is important because it gives insight into a particular company's product. Reviews can be of two types, positive or negative, and they are a major factor in deciding the growth of a product or a service. Reviews are based on various criteria such as ambience, food, hygiene and management. A review dataset can be given to the trained model, which will classify each review as positive or negative [11][12][13]. The techniques we have used are Multinomial Naive Bayes and logistic regression, which classify the type of review and so help the owner bring about changes in his or her restaurant. The main objective of our research is to classify reviews by the extent of their sentiment.
II. Literature Survey
Md. Toufique Hasan, Jinat Ara, Abdullah Al Omar and Hanif Bhuiyan [3] carried out an experiment to understand customers' sentiments using NLP-based opinion mining. First, they gathered opinions from a restaurant web portal (opinion gathering); the next step is data collection, and during preprocessing they removed linking words and punctuation. They then extracted the words that carry information about sentiment and used the SentiStrength lexicon classifier to calculate the strength of words, which gives an idea of the extent of sentiment. The research, carried out on 700 opinions, suggested taking 0.75 as the standard deviation. With the proposed method, of 35 customers, 20 had a positive opinion, five a neutral opinion, and 10 a negative opinion. Manual observation gave an accuracy of around 75%, while the proposed method reached around 85%, proving that manual observation was less accurate than the proposed method. The paper by Rachmawan Adi Laksono, Kelly Rossa Sungkono and Riyanarto Sarno [1] compared the accuracy of two approaches, Naive Bayes classifiers and TextBlob sentiment classifiers. They collected 337 records, of which 269 were used for training and 68 for testing. During preprocessing, the data underwent lowercase
removal, stopword removal, and punctuation removal. After this comes classification, where the two algorithms are implemented. With the Naive Bayes algorithm they obtained 63 true negative and 174 true positive reviews. When the 68 test records were run through Naive Bayes and TextBlob, the results showed that TextBlob produced more false positives than Naive Bayes, so the Naive Bayes method outperforms the TextBlob method in accuracy, with a slight difference of 2.9%. They presented the confusion matrices of both processes for the 68 test records and concluded that Naive Bayes outperforms TextBlob analysis. Nivet Chirawichitchai [9] carried out research on a hybrid method of greedy search. The experiment consists of three steps: data collection from the IMDB database; preprocessing, in which the data goes through feature selection, stopword removal & stemming, and weighting; and learning, in which greedy search and Multinomial Naive Bayes are applied. She applied feature selection techniques to reduce the number of words. The experimental results show that the hybrid method of greedy search and Multinomial Naive Bayes was the most accurate, with an accuracy of 85.0%. Kanwal Zahoor and Narmeen Zakaria Bawany [2] carried out sentiment analysis on data from the social eatery community. In their research, they collected over 4000 records and used NLP in conjunction with various machine learning algorithms such as Logistic Regression, Naive Bayes, Random Forest, and SVM, implementing NLP with the Natural Language Toolkit. After data collection, the second step is data preprocessing, which involves lexical analysis, syntactic analysis, and semantic analysis. Once this is completed, the next step is to split the dataset into training and testing sets. They tried various train/test distributions, such as 80:20, 75:25, and 70:30, and then applied the machine learning algorithms. With the 75:25 split, random forest had the highest accuracy of 95% and also the highest precision of 96%. In conclusion, the maximum accuracy, above 90%, was achieved by random forest.
III. System Design
The design is divided into six modules or stages: Data Preprocessing, EDA, Feature Selection and Feature Engineering, Splitting the Dataset, Training and Testing the Model, and UI Implementation. The model will be trained on two classification algorithms, Logistic Regression and MultinomialNB, and prediction will be done on our test data using these algorithms. The model will later be integrated with our UI and will provide predictions from the back end.
Phase I: Collection and Preprocessing of Data
Collection is the process of gathering the required data which will be used later. We have collected a dataset of more than fifty thousand reviews, which will serve as the basis for our machine learning model. The dataset has adequate information on which the model can be trained. Many different datasets are available on the internet; the one we are using is well structured and labeled and has many distinct columns, which will make the model more robust, accurate and reliable. A machine learning model is only as good as the data it is trained on, so choosing a good dataset is essential during the data collection phase. Data preprocessing is the process of cleaning the data and making the dataset consistent and error-free. Datasets usually have many missing and unrealistic values, and these irregularities need to be dealt with before we can train our model, otherwise model accuracy will be affected.
There are several ways to deal with missing values; for example, we can fill them using the mean or the mode of the column for numerical and categorical cases respectively. We also need to normalize the data and make sure all values conform to the same standard, with no differing units. Hence, the dataset needs to be normalized before EDA can be performed on it.
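The mean/mode filling strategy described above can be sketched in pandas (the column names and values are invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "score": [5, 3, None, 4],                       # numerical column with a gap
    "category": ["food", None, "food", "service"],  # categorical column with a gap
})

# Numerical gaps -> column mean; categorical gaps -> column mode
df["score"] = df["score"].fillna(df["score"].mean())
df["category"] = df["category"].fillna(df["category"].mode()[0])

print(df.isnull().sum().sum())  # 0: no missing values remain
```

Alternatively, rows with missing values can simply be dropped, which is what the cleaning step later in this paper does.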
Phase II: Exploratory Data Analysis
Exploratory data analysis is the process of visualizing the data and making decisions based on it. EDA lets us see the data visually, and we can use many kinds of graphs and charts to gain more insight into the data. One of EDA's most useful applications is detecting outliers in our dataset. Outliers are datapoints that do not follow the trends or patterns shown by the rest of the datapoints, so they need to be removed to make the model more accurate.
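One common rule for the outlier detection mentioned above is the 1.5x interquartile-range fence; a small sketch with made-up values (the paper does not specify which rule it uses):

```python
import numpy as np

def iqr_outliers(values):
    """Flag points outside the 1.5*IQR whiskers, a common EDA rule of thumb."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < lo or v > hi]

print(iqr_outliers([1, 2, 2, 3, 3, 3, 4, 4, 120]))  # [120]
```

This is the same fence a box plot draws, which is why box plots are a convenient EDA tool for spotting outliers visually.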
Fig 1. Score of reviews vs count
Phase III: Feature Selection and Feature Engineering
Feature selection is the process of selecting the best attributes in the dataset, and feature engineering is the process of creating new attributes required for model training. The main reasons for using feature selection and feature engineering are:
• To make the model simpler
• To create new highly correlated attributes from one or more existing attributes
• To reduce the dimensionality of the dataset
Here, in our system, we also create a new attribute (sentiment) in which we categorize whether a given review is positive or negative based on the value, ranging from 1 to 5, in the score column.
Phase IV: Splitting the Dataset
In this phase, we split the dataset into two parts, the training and test datasets. The training dataset is the one on which the model is trained, and the test dataset is the one on which the model is tested and its accuracy validated. The dataset is divided randomly, 80% for training and 20% for testing.
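Deriving the sentiment attribute from the 1-5 score column might look like this (the threshold is our assumption; the paper does not say how a score of 3 is handled, so this sketch drops it as neutral):

```python
import pandas as pd

df = pd.DataFrame({"score": [5, 1, 4, 2, 3]})

# Assumed rule: score > 3 -> positive (1), score < 3 -> negative (-1),
# score == 3 -> dropped as neutral
df = df[df["score"] != 3].copy()
df["sentiment"] = df["score"].apply(lambda s: 1 if s > 3 else -1)
print(df["sentiment"].tolist())  # [1, -1, 1, -1]
```

The resulting sentiment column is the target variable the classifiers predict later in the paper.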
Fig 2. Train and Test datasets.
Phase V: Training and Testing the Model
In this phase we train our ML model with two classification algorithms, LogisticRegression() and MultinomialNB(). Before that, we need to tokenize the review sentences into a bag of words; for this we use CountVectorizer(), which converts the sentences into a matrix of token counts. Prediction is made on the target variable present in the training set; the target variable is the attribute our model predicts. After training the model, the next step is to determine its accuracy, which we first check against the target variable. For a more realistic real-world scenario, we then check the accuracy of our model against the test dataset.
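The Phase V pipeline — CountVectorizer() tokenization followed by the two classifiers — can be sketched with scikit-learn on a tiny invented dataset (the real work uses 50,000+ reviews, so these toy scores mean nothing):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Tiny invented reviews with 1/-1 sentiment labels
texts = ["great food and service", "awful food", "loved it", "terrible place",
         "amazing taste", "bad experience", "really good", "not good at all"]
labels = [1, -1, 1, -1, 1, -1, 1, -1]

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42)

vec = CountVectorizer()            # bag-of-words token counts
Xtr = vec.fit_transform(X_train)
Xte = vec.transform(X_test)        # reuse the training vocabulary

scores = {}
for model in (LogisticRegression(), MultinomialNB()):
    model.fit(Xtr, y_train)
    scores[type(model).__name__] = model.score(Xte, y_test)
print(scores)
```

Note that the vectorizer is fitted only on the training split, so the test reviews are encoded with the training vocabulary, as they would be in production.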
Phase VI: UI Implementation
For UI implementation we use the Python Tkinter GUI toolkit to develop the UI part of the system, and we integrate the pickle file of our model in the back end. A pickle file serializes and deserializes a Python object structure and allows us to save the model quickly. Our UI consists of a textbox in which a review is written and a check button; on clicking the button, a dialog box displays the customer sentiment. The UI can be used without closing it, and reviews can be checked any number of times.
IV. Methodology
As mentioned earlier, the first phase is data preprocessing, followed by EDA. After EDA, a new attribute is created and low-correlated attributes are removed to make the model simpler and reduce dimensionality. The dataset is split in two, train and test. The model is trained with the LogisticRegression() and MultinomialNB() classification algorithms, its accuracy is validated using the test dataset, and then the model is integrated with the UI. Here we create a new attribute called sentiment, which takes the value -1 or 1 and is our target variable; LogisticRegression() and MultinomialNB() predict this column's value, where 1 means a positive review and -1 a negative review.
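The pickle round-trip described in Phase VI can be sketched as follows (a toy model and an invented file name stand in for the real trained one; the Tkinter UI would run the last two steps at startup and on each button click):

```python
import pickle

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Train a toy model on four invented reviews
vec = CountVectorizer()
X = vec.fit_transform(["good food", "bad food", "good service", "bad mood"])
clf = MultinomialNB().fit(X, [1, -1, 1, -1])

# Serialize both the vectorizer and the classifier together,
# so the UI can reload everything it needs from one file
with open("sentiment_model.pkl", "wb") as f:
    pickle.dump((vec, clf), f)

with open("sentiment_model.pkl", "rb") as f:
    vec2, clf2 = pickle.load(f)

print(clf2.predict(vec2.transform(["good food"]))[0])  # 1
```

Saving the vectorizer alongside the model matters: predictions must use the exact vocabulary the classifier was trained with.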
Fig 3. Workflow of the system
A. Cleaning the data
In the dataset we can check for null values using the isnull() function. Since we have some missing or null values, we drop them from the dataset. Punctuation is also removed from columns such as text and summary.
B. Using WordCloud
We import the WordCloud package and use it to get a visual representation of the most frequently used words. It is further used to see the most common positive and negative words in the dataset once we split the data into positive and negative subsets based on the value of the sentiment attribute.
Fig 4. WordCloud to depict the most common words in the dataset
C. Using stopwords to process the review and text columns
A stopword list is used to filter out any word present in the stoplist before or after natural text processing. We use stopwords to handle negative reviews containing positive words such as "good" and "great"; if these words are not filtered out during text processing, a model trained on such negative reviews would treat "good" and "great" as negative words and make wrong predictions.
D. Using CountVectorizer() to tokenize the text
CountVectorizer() is a package from the scikit-learn library in Python. It converts text into a bag-of-words model and also enables preprocessing of text data prior to generating the vector representation. We use it on the summary column to tokenize the words in that column.
E. Algorithms used
a) LogisticRegression(): a classification algorithm used to predict a dependent variable based on one or more independent variables. It is one of the most popular machine learning algorithms and is a supervised algorithm. Here the target column is the sentiment column, and logistic regression predicts the value 1 or -1, i.e., whether the review is positive or negative.
Fig.5 Classification report of LogisticReg()
Fig.6 Accuracy Score of LogisticRegression()
b) MultinomialNB(): a classification algorithm used for classifying discrete features, mainly used with word counts for text classification. It works on the principle of the multinomial distribution. As with logistic regression, the target column for MultinomialNB is the sentiment column, and the predicted value is either 1 or -1, i.e., whether the review is positive or negative.
Fig 7. Classification report of MultinomialNB()
Fig.8 Accuracy Score of MultinomialNB()
V. Conclusion
In the present world, the internet is becoming more dominant in our lives every day. Online purchases are ever increasing, and reviews have become essential to decision making in any business. Sentiment analysis helps to determine the views of consumers and helps businesses make the right decisions. This framework makes ideal use of LogisticRegression() and MultinomialNB() and uses the given dataset in the most effective way. The stopword package helped us make the model more accurate by filtering out positive keywords from negative reviews, thus making the model more robust. The accuracies of our LogisticRegression() and MultinomialNB() models are 93% and 92% respectively. This can be further improved using advanced NLP techniques, and other classification algorithms could also be used to compare their accuracies. In terms of future prospects, we hope the target area is not limited to this specific topic; it can be broadened to make the model more useful. Our model mostly focuses on food products and their reviews, but it should be able to produce the same results if the topic area changes.
References
[1] Rachmawan Adi Laksono, Kelly Rossa Sungkono and Riyanarto Sarno, "Sentiment Analysis of Restaurant Customer Reviews on TripAdvisor using Naive Bayes," 12th International Conference on Information & Communication Technology and System (ICTS), 2019.
[2] Kanwal Zahoor, Narmeen Zakaria Bawany and Soomaiya Hamid, "Sentiment Analysis and Classification of Restaurant Reviews using Machine Learning," 21st International Arab Conference on Information Technology (ACIT), 2020.
[3] Jinat Ara, Md. Toufique Hasan, Abdullah Al Omar and Hanif Bhuiyan, "Understanding Customer Sentiment: Lexical Analysis of Restaurant Reviews," IEEE Region 10 Symposium (TENSYMP), 5-7 June 2020, Dhaka, Bangladesh, 2020.
[4] M. Nakayama and Y. Wan, "The cultural impact on social commerce: A sentiment analysis on Yelp ethnic restaurant reviews," Information & Management, 2019, 56(2): pp. 271-279.
[5] G. Somprasertsri and P. Lalitrojwong, "Mining Feature-Opinion in Online Customer Reviews for Opinion Summarization," J. UCS, 2010, 16(6): pp. 938-955.
[6] D. Grabner, M. Zanker, G. Fliedl, M. Fuchs et al., "Classification of customer reviews based on sentiment analysis," Citcsccr, 2012.
[7] D. Grabner, M. Zanker, G. Fliedl, M. Fuchs et al., "Classification of customer reviews based on sentiment analysis," Citcsccr, 2012.
[8] U. W. Wijayanto and R. Sarno, "An Experimental Study of Supervised Sentiment Analysis Using Gaussian Naive Bayes," in 2018 International Seminar on Application for Technology of Information and Communication, 2018, pp. 476-481.
[9] Nivet Chirawichitchai, "Sentiment Classification by a Hybrid Method of Greedy Search and Multinomial Naive Bayes Algorithm," 2013 Eleventh International Conference on ICT and Knowledge Engineering.
[10] Ayodeji Olalekan Salau and Shruti Jain, "A Review on Feature Selection and Feature Extraction for Text Classification," IEEE WiSPNET 2016 conference.
[11] B. Pang, L. Lee and S. Vaithyanathan, "Thumbs up? Sentiment Classification using Machine Learning Techniques," in Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2002, pp. 79-86.
[12] B. Pang and L. Lee, "A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts," in Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics, 2004, pp. 271-278.
[13] B. Pang and L. Lee, "Opinion mining and sentiment analysis," Foundations and Trends in Information Retrieval, Vol 2(1), 2008, pp. 1-135.
Advances in Science and Technology ISSN: 1662-0356, Vol. 124, pp 312-319 doi:10.4028/p-tsrdcl © 2023 Trans Tech Publications Ltd, Switzerland
Submitted: 2022-09-01 Accepted: 2022-09-16 Online: 2023-02-27
An Extractive Question Answering System for the Tamil Language
Aravind Krishnan1,a, Srinivasa Ramanujan Sriram2,b*, Balaji Vishnu Raj Ganesan3,c, S. Sridhar4,d
1,2,3,4 Department of Computer Science and Engineering, SRM Institute of Science and Technology, Vadapalani, Chennai, Tamil Nadu, India
[email protected], [email protected], [email protected], [email protected]
Keywords: question answering, question answering dataset, tamil question answering, xlm roberta, natural language processing, deep learning
Abstract. In the field of Natural Language Processing, Question Answering is a cardinal task that has garnered a lot of attention. With the development of multiple language models, question answering systems have been developed and deployed to facilitate enhanced information retrieval. These systems, however, have been implemented to a large extent only in English. Our objective was to create such a question answering system for the Tamil language. We decided to use XLM-RoBERTa as our language model, which has been trained on a variety of datasets. We have also employed a hand-annotated dataset for the purpose of validation. We trained the model on two types of datasets, the first being only in Tamil, the other a mixture of Indian languages along with Tamil; the results were satisfactory in both cases. Given the huge amount of computational power the model required for training, we utilized the Colab Pro Plus cloud GPU from Google to satisfy our demands. We will also be publishing our dataset on Hugging Face so that fellow researchers can use it for further analysis.
Introduction
Natural Language Processing is a very popular field of study in today's world. Many impactful and useful applications, such as Machine Translation, Chatbots, Text Prediction and Sentiment Analysis, are subset tasks of NLP. Under this umbrella falls Question Answering, which is just as important. Question Answering involves generating or extracting appropriate answers for a query posed to the system. Question Answering systems have come a long way since their inception; with the introduction of the Stanford Question Answering Dataset (SQuAD) [1], tremendous strides have taken place. Google has implemented Question Answering in its search engine to facilitate enhanced information retrieval: answers are displayed in a crisp and concise manner, without the user having to skim through multiple documents. However, Question Answering is extremely dependent on a good corpus; be it for extracting answers or for training models, a quality data corpus is non-negotiable. Much of this growth has been limited to the English language and a few other languages such as Chinese, German, etc. Niche regional languages have not been the focus of Question Answering systems, and despite having a significant number of native speakers, Indian languages have a diminished presence on the web. This was the driving factor in our pursuing this project, which aims to implement such a Question Answering system for Tamil. There are broadly two types of Question Answering systems, open and closed domain, and the method of answer retrieval varies between them. We have taken up extractive question answering as our point of focus due to practical constraints; extractive question answering involves extracting the right answer span from a given context for a corresponding question. Our question answering system has been constructed using XLM-RoBERTa as the language model. To train the model, we used a couple of datasets, including the SQuAD dataset translated into Tamil. For validating our model, we lacked any relevant benchmarks; hence, we resorted to creating our own validation dataset with a little over 1000 question-context pairs. This dataset has been created from various Wikipedia articles spanning multiple genres and domains; hence, it could also potentially act as a benchmark for evaluating other Tamil-based language models.
Related Works
The Stanford Question Answering Dataset (SQuAD) [1] comprises over 100,000 question-context pairs extracted from top Wikipedia articles. The dataset was split such that 80% was used for training and the rest for evaluation. The question-context pairs were manually annotated, and the dataset also contains questions that are unanswerable. SQuAD is considered one of the most popular benchmarks for evaluating question answering systems, which can be ascribed to its huge amount of high-quality data. In natural language processing, the Transformer [2] is a unique design that seeks to solve sequence-to-sequence tasks while also resolving long-range dependencies. It does not use sequence-aligned RNNs or convolution to compute representations of its input and output, relying instead solely on self-attention. Self-attention [2] is the technique of linking distinct points of the same sequence or sentence in order to achieve a clearer contextual representation. Each word embedding has three separate vectors related to it, namely Key, Query and Value, created via matrix multiplication. Whenever we need to calculate the attention of a target word in relation to the input embeddings, the target's query and the input's key are used to create a matching score. The matching scores then act as the weights of the value vectors during summation. This is the core working principle of Transformers.
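The query/key/value mechanics described above reduce to a few matrix operations; a numpy sketch of scaled dot-product attention (shapes and random values are illustrative):

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)  # query-key matching scores
    weights = softmax(scores)      # each row sums to 1
    return weights @ V             # weighted sum of value vectors

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))        # 3 queries of dimension 4
K = rng.normal(size=(5, 4))        # 5 keys
V = rng.normal(size=(5, 4))        # 5 values
out = attention(Q, K, V)
print(out.shape)  # (3, 4): one context vector per query
```

Multi-head attention, as in Fig. 1, simply runs several such attention computations in parallel over learned projections of Q, K and V and concatenates the results.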
Fig. 1. Multi-head Attention.
XLM-RoBERTa (XLM-R) [3] is an optimal choice for multilingual NLP tasks, as it has been pre-trained on 100 languages. XLM-R was released by the Facebook AI team and has been trained on a huge amount of cleaned Common Crawl data; for low-resource languages, new unlabeled corpora were generated. XLM-R is a variant of RoBERTa, which in turn is a variant of BERT [6]. Like RoBERTa, XLM-R is trained with Masked Language Modelling as its objective.
Fig. 2. Part of XLM-R Architecture.
In the Masked Language Modelling (MLM) technique we hide a certain percentage of the words in each sequence with a special mask token ('<mask>'). The model then aims to predict the masked tokens by making use of the other available context. Masked Language Modelling is one of the most common training objectives for language models, as it allows for a good contextual understanding of the language.
Methodology
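The masking step can be sketched in plain Python (whitespace tokens, a 15% rate as in BERT-style training, and a fixed seed for illustration; real training masks subword tokens inside the model's tokenizer):

```python
import random

MASK = "<mask>"  # RoBERTa-style mask token

def mask_tokens(tokens, rate=0.15, seed=42):
    """Replace ~rate of the tokens with the mask token.

    Returns the masked sequence and a {position: original token} map,
    which is what the model must learn to predict.
    """
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < rate:
            masked.append(MASK)
            targets[i] = tok
        else:
            masked.append(tok)
    return masked, targets

tokens = "the quick brown fox jumps over the lazy dog".split()
masked, targets = mask_tokens(tokens)
print(masked, targets)
```

The loss is then computed only at the masked positions, forcing the model to infer each hidden token from its surrounding context.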
Fig. 3. A Sample record of the question-context pair from train data. Datasets. We are employing two strategies. Initially, we trained the model strictly on a Tamil dataset. Subsequently, we trained the model on a mixture of Hindi and Tamil. As for the test set, we have used Tamil Wikipedia Articles and created our own manually annotated dataset. The dataset compilation is listed as follows:
Table 1. Dataset Compilation.

Sources                        | Number of Records
Only Tamil
  ChAII Tamil a                | 368
  SQuAD Translated b           | 3567
Mixed Data
  ChAII Tamil & Hindi a        | 1114
  SQuAD Translated b           | 3567
  MLQA Hindi [4]               | 5425
  XQuAD Hindi c [5]            | 1190
Test Data
  Wikipedia Articles d         | 1015

a https://www.kaggle.com/competitions/chaii-hindi-and-tamil-question-answering/data b Self Machine Translated c https://github.com/deepmind/xquad d https://huggingface.co/datasets/Srini99/TamilQA K-Folds. For a question answering system to perform effectively, it needs to be trained on at least 25,000 question-answer pairs. As the table above shows, there is a clear shortage of data. To overcome this shortage and to ensure that the model is trained efficiently, the K-Fold algorithm is used, specifically Stratified K-Folds. This is the process of splitting the dataset into train and test sets while keeping the multi-class ratio approximately equal across the folds. This way, the model's predictions are not biased, and the model is exposed to all the classes in the dataset. On each fold, K-1 folds are used for training and the K-th fold is held out for evaluation. We performed a 5-fold split, which holds out 20% of the dataset in each fold during training.
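The stratified split described above can be reproduced with scikit-learn's StratifiedKFold; the labels below are a toy stand-in for a class column of the QA dataset, not the actual data:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy imbalanced labels (80/20) standing in for a class column; dummy features.
y = np.array([0] * 40 + [1] * 10)
X = np.zeros((50, 3))

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(skf.split(X, y)):
    # Each held-out fold is 20% of the data and preserves the class ratio.
    ratio = (y[test_idx] == 1).mean()
    print(f"fold {fold}: test size={len(test_idx)}, class-1 ratio={ratio:.2f}")
```

Every fold holds out exactly one fifth of each class, so the train/test class ratios match across all five folds.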
Fig. 4. Question Answering Process. Question Answering Process. In extractive question answering, we are required to find the answer span within a passage. However, there are some caveats to this process: the major one is the large set of potential predictions, and the other is the need for an in-depth contextual understanding of potentially long spans of text. To perform this task, we annotate the data with special tokens such as [cls] and [sep]. [cls] stands for classifier token and is placed at the beginning of each pair to store the details of the class. [sep] stands for separator token and indicates the separation between question and context.
Fig. 5. Pre-Processing.
This annotated data is then passed through the tokenizer, which converts it into features that comprise the following: • Input ID – represents the vectorized data. • Attention Mask – represents the parts of the context to focus on. • Offset Mapping – maps the features back to the original context. • Sequence ID – a binary mapping denoting whether a token is part of the question (0) or the context (1). • Start & End index – denote the start and end indices of the answer text.
Fig. 6. Hyperparameters. When the context is too long for the tokenizer to handle, we truncate parts of the context and append them along with the question to generate a new feature; hence multiple features can map to one example. Stride is the allowed overlap in the context between features. For each token in a feature, the feed-forward layer outputs a start logit and an end logit. The logits are then normalized using the SoftMax function to give the probabilities of the start and end tokens. The span with the highest probability is then chosen as the answer, provided it is a valid span. Using the offset mapping, we trace back from the feature containing the answer to the context and extract the highest-scoring span. Evaluation Metrics. When it came to testing our model, there was a dearth of validation datasets for Tamil. To address this issue, we hand-annotated hundreds of high-quality records: we went through a large number of articles, framed questions that were logically and contextually sound, and provided the answers along with their start and end positions. In doing so, we created a validation dataset that could also be of use to other models, as well as to the community. (i). The F1 score is an important evaluation metric, which combines both the recall and the precision of a classifier into one single metric by means of their harmonic mean. It is a very common metric for evaluating Machine Learning and NLP tasks. The F1 score is calculated as follows: F1 = 2 (P * R) / (P + R).
where, P = Precision; R = Recall
(1)
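The span-selection step described earlier (SoftMax over the start/end logits, then picking the best valid span) can be sketched as follows; the logits here are toy values, and the max answer length is an assumed cutoff:

```python
import numpy as np

def best_span(start_logits, end_logits, max_answer_len=30):
    """Pick the highest-probability valid (start, end) answer span."""
    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()
    p_start, p_end = softmax(start_logits), softmax(end_logits)
    best, best_score = (0, 0), -1.0
    for s in range(len(p_start)):
        # Only spans with end >= start and a bounded length are valid.
        for e in range(s, min(s + max_answer_len, len(p_end))):
            score = p_start[s] * p_end[e]   # joint span probability
            if score > best_score:
                best, best_score = (s, e), score
    return best, best_score

# Toy logits: token 2 is the likely start, token 4 the likely end.
start = np.array([0.1, 0.2, 5.0, 0.3, 0.1, 0.0])
end   = np.array([0.0, 0.1, 0.2, 0.3, 5.0, 0.1])
span, score = best_span(start, end)
print(span)  # (2, 4)
```

In the full pipeline, the chosen token span would then be mapped back to the original context characters via the offset mapping.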
(ii). Exact match is another metric that we have employed to gauge the performance of our model. This is a very simple metric to understand: it compares the output of the model to the corresponding answer; if they match exactly, the metric returns a value of 1, else a value of 0 is returned. The formula for Exact Match is as follows: EM = (Σ_{i=1}^{N} f(x_i)) / N.
where f(x_i) takes the value 1 if prediction i exactly matches the answer and 0 otherwise, and N is the number of examples
(2)
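Both metrics can be computed in a few lines of Python; this sketch uses plain whitespace tokenization, a simplification of the normalization done by the official SQuAD evaluation script:

```python
def exact_match(pred, gold):
    """1 if the prediction string equals the gold answer, else 0."""
    return int(pred.strip() == gold.strip())

def f1_score(pred, gold):
    """Token-overlap F1: harmonic mean of precision and recall."""
    pred_toks, gold_toks = pred.split(), gold.split()
    common = sum(min(pred_toks.count(t), gold_toks.count(t))
                 for t in set(pred_toks))
    if common == 0:
        return 0.0
    precision = common / len(pred_toks)
    recall = common / len(gold_toks)
    return 2 * precision * recall / (precision + recall)

print(exact_match("chennai", "chennai"))                 # 1
print(round(f1_score("in chennai city", "chennai"), 3))  # 0.5
```

The second example shows why F1 is the less constrained metric: a prediction with extra tokens gets partial credit from F1 but scores 0 on exact match.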
Result. We evaluated the model on our own dataset, which has 1015 records of Tamil question-context pairs. For the purely Tamil-based dataset, we obtained a value of 75.763 for the exact match metric and 83.948 for the F1 score. For the model trained on both Hindi and Tamil, we obtained an exact match score of 77.536 and an F1 score of 85.665. We note that the mixed dataset performs better than the purely Tamil-based dataset by a small margin.

Table 2. Results.

Evaluation Metrics | Purely Tamil Dataset | Mixed Language Dataset
F1 Score           | 83.948               | 85.665
Exact Match        | 75.763               | 77.536
As one can infer from the table above, the F1 score is relatively higher than the exact match value. The F1 score is a less constrained metric than exact match: it simply estimates the overlap between the actual answer and the predicted answer given by the model. Analysis. During the first run of the project, we used datasets containing solely Tamil records, with both questions and answers in Tamil only. This was due to the preconception that the model would perform better on Tamil question answering if it had seen only Tamil during training. During subsequent runs, we decided to additionally train the model on another predominantly spoken Indian language, Hindi. Our model performed noticeably better when trained on multiple languages rather than solely on Tamil. We speculate that this could be ascribed to similarities shared by Indian languages, which gave the model more clarity when training this way. Hardware Specifications. The training and testing of the model were carried out using Google Colab Pro Plus, with a Tesla P100-PCIE 16 GB GPU. We also utilized the CUDA library to parallelize and speed up the entire process. Conclusion Question Answering is an extremely important field in Natural Language Processing. Whilst there is an abundance of resources for English question answering, there is an extreme dearth for regional languages, especially Tamil. By publishing our model to the Huggingface library as a community model, we hope that our model will be the seed that sprouts into a viridescent tree for Tamil question answering. With a larger, high-quality dataset, the project can be broadened to cater to more multilingual natural language processing tasks and expanded to train a model to perform critical and logical reasoning tasks rather than just simple comprehension.
Acknowledgement We thank the Huggingface library and the community behind it for their magnanimous contribution towards NLP and its development. We would also like to extend our gratitude to the ChAII (Challenges for AI in India) Kaggle contest for providing us with the rudimentary dataset for training. References [1] Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, Percy Liang, "SQuAD: 100,000+ Questions for Machine Comprehension of Text", arXiv:1606.05250 [cs], June 2016. [2] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, "Attention Is All You Need", arXiv:1706.03762 [cs], June 2017. [3] Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, Veselin Stoyanov, "Unsupervised Cross-lingual Representation Learning at Scale," arXiv:1911.02116 [cs], Nov. 2019. [4] Patrick Lewis, Barlas Oğuz, Ruty Rinott, Sebastian Riedel, Holger Schwenk, "MLQA: Evaluating Cross-lingual Extractive Question Answering," arXiv:1910.07475 [cs], Oct. 2019. [5] Mikel Artetxe, Sebastian Ruder, Dani Yogatama, "On the Cross-lingual Transferability of Monolingual Representations," arXiv:1910.11856 [cs], Oct. 2019. [6] Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova, "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding," arXiv:1810.04805 [cs], Oct. 2018. [7] L.-Q. Cai, M. Wei, S.-T. Zhou and X. Yan, "Intelligent Question Answering in Restricted Domains Using Deep Learning and Question Pair Matching," in IEEE Access, vol. 8, pp. 32922-32934, 2020, doi: 10.1109/ACCESS.2020.2973728. [8] Y. Lan, S. Wang and J. Jiang, "Knowledge Base Question Answering With a Matching-Aggregation Model and Question-Specific Contextual Relations," in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 27, no. 10, pp. 1629-1638, Oct. 2019, doi: 10.1109/TASLP.2019.2926125. [9] L. Su, T. He, Z. Fan, Y. Zhang and M. Guizani, "Answer Acquisition for Knowledge Base Question Answering Systems Based on Dynamic Memory Network," in IEEE Access, vol. 7, pp. 161329-161339, 2019, doi: 10.1109/ACCESS.2019.2949993. [10] W. Wu, Y. Deng, Y. Liang and K. Lei, "Answer Category-Aware Answer Selection for Question Answering," in IEEE Access, vol. 9, pp. 126357-126365, 2021, doi: 10.1109/ACCESS.2020.3034920. [11] D. R. CH and S. K. Saha, "Automatic Multiple Choice Question Generation from Text: A Survey," in IEEE Transactions on Learning Technologies, vol. 13, no. 1, pp. 14-25, Jan.-March 2020, doi: 10.1109/TLT.2018.2889100. [12] R.-Z. Wang, Z.-H. Ling and Y. Hu, "Knowledge Base Question Answering with Attentive Pooling for Question Representation," in IEEE Access, vol. 7, pp. 46773-46784, 2019, doi: 10.1109/ACCESS.2019.2909826. [13] M. Wei and Y. Zhang, "Natural Answer Generation with Attention Over Instances," in IEEE Access, vol. 7, pp. 61008-61017, 2019, doi: 10.1109/ACCESS.2019.2904337.
[14] Y. Sun et al., "Joint Learning of Question Answering and Question Generation," in IEEE Transactions on Knowledge and Data Engineering, vol. 32, no. 5, pp. 971-982, 1 May 2020, doi: 10.1109/TKDE.2019.2897773. [15] T. Shao, Y. Guo, H. Chen and Z. Hao, "Transformer-Based Neural Network for Answer Selection in Question Answering," in IEEE Access, vol. 7, pp. 26146-26156, 2019, doi: 10.1109/ACCESS.2019.2900753.
Advances in Science and Technology ISSN: 1662-0356, Vol. 124, pp 320-329 doi:10.4028/p-0892re © 2023 Trans Tech Publications Ltd, Switzerland
Submitted: 2022-08-29 Accepted: 2022-09-16 Online: 2023-02-27
Deep Learning-Based Speech Emotion Recognition Arryan Sinha a*, Dr. G. Suseela b
a Student, Department of Networking and Communications, School of Computing, SRM Institute of Science and Technology, Tamil Nadu, India
b Assistant Professor, Department of Networking and Communications, School of Computing, SRM Institute of Science and Technology, Tamil Nadu, India
a [email protected], b [email protected]
Keywords: Machine Learning, Deep Learning, Neural Network, MLP Classifier, RAVDESS Dataset.
Abstract: This study describes Speech Emotion Recognition (SER), which uses neural networks to classify the emotion expressed in a speech sample. It is centered on the idea that voice tone and pitch frequently reflect the underlying emotion, and it aids in the classification of elicited emotions. An MLP Classifier is used to classify the emotions in the speech wave signal, allowing for flexible learning-rate selection. The RAVDESS (Ryerson Audio-Visual Database of Emotional Speech and Song) dataset is used. To extract the characteristics from a particular audio input, features such as contrast, MFCC, mel spectrogram frequency, and chroma may be employed. To facilitate feature extraction from the audio, the dataset is labelled using decimal encoding. Using the input audio samples, the accuracy was found to be 80.28%; additional testing confirmed this result.

I. Introduction
Human emotions are easy for ordinary people to understand, but they are challenging for machines to comprehend. To teach a computer to identify emotions, we employ machine learning techniques: making decisions or predictions without explicit programming, using a model built from training data. Speech emotion recognition is a prominent topic in computer science [1]. Emotion is a means of expressing how a person feels and their mental state. Emotions are vital in delicate jobs, such as those of a surgeon or a military commander, where an individual must keep their emotions under control [1]. Predicting emotions is difficult because everyone speaks with a different tone and intonation. Happy, furious, indifferent, sad, and shocked are some of the feelings that are elicited. The purpose of this research is to classify such sentiments from a provided speech sample using the best applicable approach. II Literature Review Previous research on speech emotion recognition has used a variety of approaches and datasets. For example, one approach to voice emotion recognition uses CNN and Decision Tree classifiers, with features extracted by MFCC from a pre-processed audio file [1]. The two classifiers, CNN and Decision Tree, are then applied to the retrieved features.
Emotion Recognition using CNN
A 2D representation is a common way to evaluate speech and audio signals, and time-frequency analysis is extensively utilized in audio processing. After pre-processing, we use the Short-Time Fourier Transform (STFT) to convert the voice signal into a 2D representation. The 2D representation is then examined using CNN and Long Short-Term Memory (LSTM) architectures [2]. Deep learning entails the use of hierarchical representations with ever higher levels of abstraction. The
findings corresponding to each selected audio frame are categorized using a sum of probabilities after traversing the sequentially created networks. For the 2D representation of the voice signal, an STFT with a frame size of 256 and 50% overlap was employed, and the number of time steps was 128; as a result, the input image was 128 × 128 pixels in size. Then, two convolution and max-pooling layers were stacked for representation learning. Finally, all values were flattened and connected together through 1024 nodes [2]. •
Emotion Recognition using DNN-Decision Tree and SVM
A DNN-decision tree SVM model is used to offer a speech emotion recognition approach. Emotional information may be extracted from the speech signal in a variety of ways, but the proposed technique can also identify more distinctive emotional aspects among often-confused emotions. Emotion confusion was assessed first, and then distinct DNN networks were trained for different groups of emotions in order to retrieve the bottleneck features used to train each SVM in the decision tree [3]. Using this approach, the emotions expressed in speech can finally be classified. According to the experimental findings, emotion recognition is 6.25% and 2.91% better with the proposed technique than with the classic SVM and DNN-SVM classifying techniques, respectively [3]. This strategy has been shown to effectively reduce the confusion between emotions, resulting in a higher rate of speech emotion identification. III Proposed Methodology The tone and pitch of our voice reflect the underlying emotion in our words. In this article we aim to classify elicited emotions such as sad, joyful, neutral, furious, disgusted, astonished, afraid, and calm. The emotions in speech are predicted using neural networks [1]. The MLP Classifier (Multi-Layer Perceptron Classifier) is used to classify emotions. The data utilized in our paper is RAVDESS (the Ryerson Audio-Visual Database of Emotional Speech and Song) [4]. •
Database Description
The RAVDESS collection includes recordings of 24 actors, 12 female and 12 male, numbered from 01 to 24; male actors are given odd numbers and female actors even numbers. The dataset includes sad, glad, neutral, angry, disgusted, horrified, afraid, and calm expressions [1]. Audio-only, audio-video, and video-only expressions are available in the dataset. This classifier is built entirely on the audio data in order to discern between different types of emotion while speaking. Each of the 24 performers speaks two prerecorded utterances, repeated twice for each of the 8 emotions. All emotional expressions have two intensity levels, normal and strong, except for the "neutral" expression, which is only produced at normal intensity. Since there are 60 trials for each of the 24 actors, the RAVDESS subset we use comprises 1440 files. The data is named using decimal encoding: each file is given a unique filename, a seven-part numerical identifier whose third component indicates the emotion. The emotions and their corresponding labels are: 01 'neutral', 02 'calm', 03 'happy', 04 'sad', 05 'angry', 06 'fearful', 07 'disgust', 08 'surprised'.
322
IoT, Cloud and Data Science
Fig. 1, RAVDESS dataset of 24 male and female actors •
Neural Network and Multi-Layered Perceptron classifier
An MLP is a multi-layer perceptron network. At its core, hidden layers reside between the input and output layers, which receive and process the input signal, making it possible to forecast or make judgements on the basis of that signal. In theory, the number of hidden layers may be as high as desired, and this number can be changed at will. The proposed Speech Emotion Recognition methodology includes a Multi-Layer Perceptron network with one input layer and hidden-layer configurations of (300,) and (40, 80, 40). The audio file's five features are delivered to the input layer.
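Such a classifier can be sketched with scikit-learn's MLPClassifier using the (300,) hidden-layer configuration mentioned above; the feature matrix here is a random stand-in for the extracted audio features, and the 180-dimensional layout is an assumption for illustration:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Dummy stand-in for extracted audio features: 160 samples, 180 features
# each (e.g. 40 MFCC + 128 mel + 12 chroma), with 8 emotion labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(160, 180))
y = rng.integers(0, 8, size=160)

clf = MLPClassifier(
    hidden_layer_sizes=(300,),   # one hidden layer of 300 units
    learning_rate="adaptive",    # flexible learning-rate schedule
    max_iter=300,
    random_state=0,
)
clf.fit(X, y)                    # train on the feature vectors
pred = clf.predict(X[:5])        # predict emotion labels for samples
print(pred.shape)  # (5,)
```

Swapping `hidden_layer_sizes=(300,)` for `(40, 80, 40)` gives the other configuration the paper mentions; everything else stays the same.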
Fig. 2, Graph representation of a multilayer perceptron connecting the nodes

The processes for creating the MLP Classifier are as follows:
o Set up the MLP Classifier by defining and initializing the necessary parameters.
o Feed the training data into the neural network.
o The trained network predicts the result.
o Verify that the predictions are on the right track.

• Librosa

Librosa is an audio and music analysis tool written in Python. Music composition and voice recognition are among its most common uses. Music information retrieval systems may be built on top of it, since it provides the essential building blocks. Audio signals may be visualized using Librosa, and different signal processing methods can be used to extract features [5].
Advances in Science and Technology Vol. 124
323
Feature Extraction The tone and pitch of a person's voice frequently indicate hidden emotions. The purpose of feature extraction is to extract relevant features, such as feelings, from the speech signals: the speech signals are presented as input, and the features are retrieved from them. The features are MFCC, mel spectrogram frequency, and chroma. o MFCC There will be a lot of noise in the audio stream, so we cannot use it directly as input to our model. It has been shown that extracting features from the audio signal and using them as input to the model yields much better results than feeding it the raw audio signal. MFCC is the technique of choice for extracting features from an audio source, as shown in Figure 3.
Fig.3, Workflow diagram of MFCC
A/D Conversion:
With an 8 kHz or 16 kHz sampling frequency, we convert the analogue audio signal to digital format.
Preemphasis:
Preemphasis increases the magnitude of energy at the higher frequencies. When we analyze the frequency domain of an audio signal containing voiced segments such as vowels, these segments have less power at high frequencies than at lower frequencies.
Windowing:
The goal of the MFCC technique is to extract features from an audio stream that may be used to detect phones in speech. Because there will be multiple phones in a given audio signal, we divide it into overlapping segments, each 25 ms wide and spaced 10 ms apart. A person can pronounce roughly three words per second, with four phones and three states each, resulting in 36 states/sec, or about 28 milliseconds per state, which is close to our 25-millisecond window. From each segment, we extract 39 characteristics.
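The framing step above can be sketched in NumPy. The 25 ms frame width and 10 ms hop are the values from the text; the Hamming window is an assumed choice, as the paper does not name its window function:

```python
import numpy as np

def frame_signal(signal, sr, frame_ms=25, hop_ms=10):
    """Split a waveform into overlapping windows of frame_ms width,
    spaced hop_ms apart, as done before the DFT step of MFCC."""
    frame_len = int(sr * frame_ms / 1000)
    hop_len = int(sr * hop_ms / 1000)
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    window = np.hamming(frame_len)            # taper the frame edges
    return np.stack([signal[i * hop_len : i * hop_len + frame_len] * window
                     for i in range(n_frames)])

sr = 16000                                    # 16 kHz sampling
signal = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)  # 1 s, 440 Hz tone
frames = frame_signal(signal, sr)
print(frames.shape)  # (98, 400): 400 samples = 25 ms at 16 kHz
```

Each row of `frames` is one windowed segment, ready for the DFT described next.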
DFT ( Discrete Fourier Transform):
When the signal has to be transformed from the time domain to the frequency domain, the DFT is employed. For audio signals, frequency-domain analysis is more straightforward than time-domain analysis.
Mel-Filter Bank:
We then utilize the mel scale to convert the actual frequency to a frequency scale that matches human perception. The mapping formula is given in Eqn (1) below.
M(f) = 1127 ln(1 + f/700)    (1)

Eqn (1) is used for the transformation from the hertz scale to the mel scale. Here ln is the natural logarithm; if the log is base 10, the equation coefficient (1127) changes slightly (to 2595).
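Eqn (1) can be checked numerically with a small NumPy sketch, which also illustrates the compression at high frequencies discussed in the MEL section below:

```python
import numpy as np

def hz_to_mel(f_hz):
    """Eqn (1): natural-log form of the Hz-to-mel mapping."""
    return 1127.0 * np.log(1.0 + f_hz / 700.0)

# Equal 500 Hz gaps shrink on the mel scale at high frequencies:
print(round(hz_to_mel(1000) - hz_to_mel(500), 1))     # ≈ 392.5 mels
print(round(hz_to_mel(10500) - hz_to_mel(10000), 1))  # ≈ 51.5 mels
```

The same 500 Hz difference spans far fewer mels at 10 kHz than at 1 kHz, mirroring how humans distinguish low-frequency pairs much more easily than high-frequency ones.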
Applying Log:
At higher energy levels, humans are less sensitive to changes in audio signal energy than at lower energy levels. The log function has the matching property that its gradient is higher at low input values and lower at high input values. To simulate the human hearing system, we therefore apply the log to the mel-filter output.
IDFT (Inverse Discrete Fourier Transform):
The Fourier transform maps a signal from the time domain (where every sample has a time associated with it) to the frequency domain; the frequency-domain depiction is simply a different format for the same signal. The inverse Fourier transform remaps the signal from the frequency domain back to the time domain. The Fourier transform's complete invertibility is the key property used to create filters that remove noise or specific components of a signal's spectrum. The MFCC spectrogram is shown in Figure 4.
Fig. 4, Time Spectrogram of MFCC of the audio sample o
MEL
Studies show that humans do not hear frequency on a linear scale: the lower the frequency, the easier it is for humans to detect differences. Although the gap between the two pairs is equal, we can tell 500 and 1000 Hz apart easily, while 10,000 and 10,500 Hz are almost indistinguishable. In 1937, three researchers developed a pitch unit for which equal pitch distances sound equally distant to the listener; this is the mel scale. To create the mel spectrogram, frequencies are translated to the mel scale and plotted. The mel spectrogram of the audio is shown in Figure 5.
Fig. 5, Spectrogram of Mel of the audio sample o
Chromagram
We may grasp the meaning of a chromagram in the audio-file context by analogy with chromatography, which refers to separating the individual components of a mixture. In audio file analysis, an audio file can have up to 12 separate pitch classes. For analyzing audio files, these pitch class profiles are
quite valuable [6]. The term chromagram refers to putting all of the pitches in an audio recording in one place so that we can understand how to classify them. Sound and signal features such as pitch enable frequency-related scales to be used to sort files; pitch is a measurement of sound quality that helps assess a sound as higher, lower, or medium. Chromagrams exist in three variations:
Waveform chromagram
This is a chromagram built from the power spectrogram of the audio file; the sound waveform is broken down into the several pitch classes.
Constant-Q chromagram
This is a chromagram of a constant-Q transformed signal. The Fourier transform and the Morlet wavelet transform are used to transform the signal into the frequency domain.
Chroma energy normalized statistics (CENS) chromagram
This chromagram is based on short-term statistics of the energy distribution within the chroma bands, hence the name Chroma Energy Normalized Statistics (CENS): it is composed of the signals in their energy form [6]. Figure 6 shows the chromagram.
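The pitch-class binning that underlies all three chromagram variants can be sketched as follows; the A4 = 440 Hz reference and the rounding rule are standard conventions, not taken from the paper:

```python
import numpy as np

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F",
                 "F#", "G", "G#", "A", "A#", "B"]

def pitch_class(freq_hz, ref_a4=440.0):
    """Map a frequency to one of the 12 pitch classes that a chromagram
    accumulates energy into (A4 = 440 Hz as the reference pitch)."""
    semitones = int(round(12 * np.log2(freq_hz / ref_a4)))
    return PITCH_CLASSES[(semitones + 9) % 12]   # A sits at index 9

print(pitch_class(440.0))   # A
print(pitch_class(880.0))   # A (one octave up, same pitch class)
print(pitch_class(261.63))  # C (middle C)
```

Octave-separated frequencies land in the same class, which is why a chromagram collapses the whole spectrum into just 12 bins.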
Fig. 6, Spectrogram of Chroma of the audio sample IV Implementation Based on the input, the MLP Classifier is used to make predictions about emotions, and a conclusion is drawn from the information gathered. The model is equipped with a wide range of options. Since even a single feature parameter may be used to make an effective prediction, we obtain a range of predicted emotions whether we employ the characteristics individually or pass them all together. We split the RAVDESS dataset 70:30 to train the MLP Classifier [1]. The collection includes North American-accented audio samples from 24 professional performers, characterizing eight distinct emotional states in all. The classifier is well suited to time-series data, such as the audio we use to forecast emotion [1]. The model's workflow is depicted in Figure 7.
Fig. 7, Step-by-step block diagram of the speech emotion recognition classification system
Data Preprocessing
In this process the data is imported, cleansed, and visualized. Library packages are imported along with the dataset. Variable identification by shape, type, and missing/duplicate values is examined in this study [7].
Fig. 8, Importing the header files
Fig. 9, Dataset representation before pre-processing •
Data Visualization
Data visualization is a vital skill that helps us develop a deeper grasp of the data. It is useful for understanding the dataset and finding patterns of faulty data, outliers, and anomalies. Key relationships may be expressed and shown using bar graphs, pie charts, and other data visualizations, as shown in Figure 10.
Fig. 10, Bar graph representation of the 8 emotions
Advances in Science and Technology Vol. 124
•
327
Augment the data sample
The term "data augmentation" refers to the process of generating new synthetic training instances by making small alterations to the existing training set. These perturbations improve the model's ability to generalize [8]. For this to work, the added perturbations must keep the same label as the original training sample. In our scenario we use noise, stretch, and pitch, as shown in Figures 11, 12 and 13.
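The noise and stretch perturbations can be sketched with NumPy. This is a simplified stand-in: the paper likely applies these transformations with an audio library such as Librosa, and the pitch perturbation is omitted here:

```python
import numpy as np

def add_noise(signal, noise_factor=0.005, seed=0):
    """Inject low-amplitude white noise; the emotion label is unchanged."""
    rng = np.random.default_rng(seed)
    return signal + noise_factor * rng.normal(size=signal.shape)

def stretch(signal, rate=0.8):
    """Naive time-stretch by linear resampling (rate < 1 slows down)."""
    n_out = int(len(signal) / rate)
    old_idx = np.linspace(0, len(signal) - 1, num=n_out)
    return np.interp(old_idx, np.arange(len(signal)), signal)

sr = 16000
y = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)   # 1 s test tone
noisy = add_noise(y)
slower = stretch(y, rate=0.8)
print(len(y), len(noisy), len(slower))  # 16000 16000 20000
```

Each augmented copy is paired with the original sample's emotion label, effectively multiplying the size of the training set.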
Fig. 11, Pitch of the audio sample
Fig.12, Stretch of the audio sample
Fig.13, Noise of the Audio sample •
Extract features
In the realm of audio signal processing, the extraction of audio features is a critical step. Sound processing refers to the modification or processing of audio signals [9]: analog signals are converted into digital ones, unwanted noise is reduced, and the temporal frequency ranges are balanced. We concentrate on MFCC, mel, and chroma as our main characteristics, as shown in Figure 14.
Fig. 14, Extracting the main features of the audio, mainly MFCC, chroma and mel

• Modeling the data

We load the dataset with a train-test split in the ratio of 70:30 and run the MLP Classifier algorithm on it.
Fig. 15, Loading the RAVDESS dataset into the MLP algorithm V. Result Machine learning was applied to successfully recognize the emotion expressed in a human voice. An accuracy of 76.77% was achieved through dataset training and testing; when more data is used, accuracy improves as well. After measuring the accuracy, we ran a test to check the model by creating a function and passing values into it, and the predicted output was seen to be correct. Based on the preceding, voice emotion recognition using an MLP is both effective and simple to implement. Figure 16 shows the classification report of the model and Figure 17 shows the test conducted on the model.
Fig. 16, Classification Report of MLP Classifier
Fig. 17, Testing the MLP Classifier model by passing an audio file into it VI. Conclusion Through the model, the overall system can obtain the ultimate emotion from audio data, giving some intuition about the emotion a human expresses by voice. The MLP classifier algorithm has been found to be well suited for speech emotion recognition [10]. These systems can be used in a variety of settings, including call centers for customer service or marketing, voice-based virtual assistants or chatbots, linguistic research, and so on. VII. Future Scope We plan to test various designs on this pooled dataset in the future to increase the accuracy of our model. Our suggested model can currently only identify emotions based on speech input; we want to test it with video images as well as voice input, to see whether it can be used in systems where emotion detection is necessary for acting or reacting based on a person's feelings. Eventually, we would like to experiment with detecting several emotions from a single input file, since people may display a wide range of emotions in real time; right now, our model can detect only one emotion from the input voice stream.
VIII. Acknowledgement We are grateful to Dr. G. Suseela, Assistant Professor, School of Computing, SRM Institute of Science and Technology, Chennai, India for the constant support and guidance in the making of this project and in overcoming the doubts and problems we faced while creating it. References [1] Nagaraja N Poojary, Dr. Shivakumar G S, Akshath Kumar B.H, "Speech Emotion Recognition Using MLP Classifier", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 7, Issue 4, pp. 218-222, July-August 2021. [2] W. Lim, D. Jang and T. Lee, "Speech emotion recognition using convolutional and Recurrent Neural Networks," 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2016, pp. 1-4. [3] Linhui Sun, Bo Zou, Sheng Fu, Jia Chen, Fu Wang, "Speech Emotion Recognition Based on DNN-Decision Tree SVM Model," Speech Communication (2019). [4] H. Dolka, A. X. V. M and S. Juliet, "Speech Emotion Recognition Using ANN on MFCC Features," 2021 3rd International Conference on Signal Processing and Communication (ICPSC), 2021, pp. 431-435. [5] B. McFee, J. W. Kim, M. Cartwright, J. Salamon, R. M. Bittner and J. P. Bello, "Open-Source Practices for Music Signal Processing Research: Recommendations for Transparent, Sustainable, and Reproducible Audio Research," in IEEE Signal Processing Magazine, vol. 36, no. 1, pp. 128-137, Jan. 2019. [6] Oriol Nieto and Juan Pablo Bello, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014, p. 664. [7] Kalam, Akhtar, Swagatam Das, and Kalpana Sharma. Advances in Electronics, Communication and Computing. Springer Singapore, 2018. [8] A. U A and K. V K, "Speech Emotion Recognition: A Deep Learning Approach," 2021 Fifth International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), 2021, pp. 867-871. [9] D. C. Shubhangi and A. K. Pratibha, "Asthma, Alzheimer's and Dementia Disease Detection based on Voice Recognition using Multi-Layer Perceptron Algorithm," 2021 International Conference on Innovative Computing, Intelligent Communication and Smart Electrical Systems (ICSES), 2021, pp. 1-7. [10] Chaudhary, Ankush, et al. "Speech emotion recognition." J Emerg Technol Innov Res 2.4 (2015): 1169-1171. [11] Guihua Wen, Huihui Li, Jubing Huang, Danyang Li, and Eryang Xun, "Random Deep Belief Networks for Recognizing Emotions from Speech Signals", Computational Intelligence and Neuroscience, Volume 2017, Article ID 1945630, 9 pages, March 2017. [12] M. S. Hossain and G. Muhammad, "Emotion Recognition Using Deep Learning Approach from Audio-Visual Emotional Big Data", Information Fusion, vol. 49, pp. 69-78, September 2019. [13] Pawan Kumar Mishra and Arti Rawat, "Emotion Recognition through Speech Using Neural Network", International Journal of Advanced Research in Computer Science and Software Engineering (IJARCSSE), Volume 5, Issue 5, pp. 422-428, May 2015. [14] Awni Hannun, Ann Lee, Qiantong Xu and Ronan Collobert, "Sequence-to-Sequence Speech Recognition with Time-Depth Separable Convolutions," Interspeech 2019, Sep 2019.
Advances in Science and Technology ISSN: 1662-0356, Vol. 124, pp 330-334 doi:10.4028/p-3gv27w © 2023 Trans Tech Publications Ltd, Switzerland
Submitted: 2022-08-27 Accepted: 2022-09-16 Online: 2023-02-27
Social Media User Oppression Detection Technique Using Supervised and Unsupervised Machine Learning Algorithms

Shreya Pradeep T.1,a*, P. Durgadevi2,b

1Student, Computer Science and Engineering, SRM Institute of Science and Technology, India
2Assistant Professor (Sr.G), Computer Science and Engineering, Faculty of Engineering and Technology, SRM Institute of Science and Technology, India

a*[email protected], [email protected]
Keywords: cyberbullying, detection techniques, social media, digital communication.
Abstract. Social media has become a highly popular medium of interaction. Because social media is used so often in day-to-day life, cyberbullying has also increased, especially among young people who spend more time on these platforms. A cyberbullying detection technique can therefore help make the online environment safer by categorizing tweets or comments as bullying or non-bullying. The previous existing system uses algorithms such as Naïve Bayes, which is slow, less accurate in identifying cyberbullying, and produces many false positives. Here, CNN and LSTM models are proposed instead, to identify cyberbullying comments with good accuracy.

Introduction
Social media allows us to create and exchange content and express our messages to the outside world. Through social media, people can access a great deal of information, communicate more widely, and so on. But like everything with a drawback, social media also has one: cyberbullying, which can affect people deeply. Many forms of bullying exist besides cyberbullying, but what makes cyberbullying different is that it can affect a person anywhere and at any time. It may not be a face-to-face problem, but its effect can be greater. Those who face it suffer many problems, such as depression and mental stress. Like other forms of bullying, cyberbullying has a major impact on people, especially teenagers. Due to misunderstandings regarding freedom of speech, the advent of social media, particularly Twitter, raises a number of issues. Cyberbullying is one of these issues: a serious global problem that affects both individual victims and societies. The literature presents a number of interventions, preventions, and mitigations of cyberbullying; however, they are not practical because they depend on the victim's interactions.
The detection of cyberbullying without the involvement of the victims is therefore necessary. As a result of social media, human activity has been incorporated into education, business, entertainment, and e-government. Based on predefined keywords, the proposed approach detects cyberbullying by extracting tweets, classifying them using text analysis techniques, and then labeling them as offensive or non-offensive. Cyberbullying is a critical cybersecurity threat that continuously targets Internet users, social media users in particular. Cyberbullying is a purposeful and repetitive act of aggression carried out using social media platforms such as Facebook, Twitter, Instagram, and others. Research on computational studies of bullying has shown that natural language processing and machine learning are powerful tools for understanding bullying. Detecting cyberbullying can be described as a supervised learning problem: to recognize a bullying message, a classifier is trained on a cyberbullying corpus labeled by humans.
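The supervised setup described above (a classifier trained on a human-labeled corpus) can be sketched with a minimal multinomial naive Bayes, the baseline this paper compares against. The messages and labels below are invented for illustration:

```python
import math
from collections import Counter, defaultdict

# Tiny human-labeled corpus (invented examples): 1 = bullying, 0 = not.
corpus = [
    ("you are so stupid and ugly", 1),
    ("nobody likes you loser", 1),
    ("great game last night", 0),
    ("see you at lunch tomorrow", 0),
]

word_counts = defaultdict(Counter)   # per-class word counts
class_counts = Counter()
for text, label in corpus:
    class_counts[label] += 1
    word_counts[label].update(text.split())

vocab = {w for counts in word_counts.values() for w in counts}

def predict(text):
    """Multinomial naive Bayes with Laplace smoothing, in log space."""
    best, best_lp = None, -math.inf
    for label in class_counts:
        lp = math.log(class_counts[label] / len(corpus))   # class prior
        total = sum(word_counts[label].values())
        for w in text.split():
            lp += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = label, lp
    return best

print(predict("you are a loser"))   # → 1
```

The CNN and LSTM models proposed in this paper replace exactly this classification step while keeping the same labeled-corpus training regime.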
Existing System
The existing system uses the Naive Bayes algorithm to spot cyberbullying tweets. It is mainly concentrated on classifying test data with high accuracy; in the existing system, the test data is the set of tweet documents. There are two stages: training on documents with known categories, and document classification.
Limitations
● Naive Bayes takes more time for training.
● It is less accurate in identifying cyberbullying.
● High false positives are observed in the existing system.
● It is more time consuming.
Proposed System. The proposed work is a social media user oppression detection technique using the supervised machine learning algorithms CNN and LSTM. The proposed system consists of four modules: data collection, data preprocessing, algorithm implementation, and accuracy validation.
1. Data Collection. The basic requirement for applying machine learning algorithms and testing data mining is collecting a suitable dataset. We need appropriately labeled data in which each tweet is categorized as bullying or not. Datasets are created to apply and test the different algorithms.
2. Data Preprocessing Module. Preprocessing reduces the dataset to only the important information that is actually needed.
Tokenization. The process of splitting a large set of unstructured messages into smaller units is called tokenization. Text is split according to different criteria, e.g., punctuation marks or whitespace.
Stop Word Removal. This step removes words that are not really needed in a sentence. Words like 'a' or 'are' occur very often but add little meaning, so removing them yields a better form of the text for the following steps.
Replacement of Special Characters. As the name suggests, special characters are replaced in this step. For example, '@' is replaced with its original word, 'at'. This step has major importance.
Stemming and Lemmatization. Reducing a word to its root by removing part of it is called stemming. Lemmatization considers the context in which the word is used, whereas stemming concentrates only on the root.
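The four preprocessing steps above can be sketched in plain Python. This is a minimal illustration with a toy stop-word list and crude suffix stripping; a real pipeline would use a library such as NLTK for its stop-word corpus and Porter stemmer:

```python
import re

# Toy stop-word list and replacement table for illustration only.
STOP_WORDS = {"a", "an", "are", "is", "the", "to", "and", "of"}
SPECIAL = {"@": "at", "&": "and"}

def preprocess(message):
    # Replacement of special characters
    for symbol, word in SPECIAL.items():
        message = message.replace(symbol, f" {word} ")
    # Tokenization: lowercase, then keep runs of letters/digits
    tokens = re.findall(r"[a-z0-9']+", message.lower())
    # Stop-word removal
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # Crude stemming: strip common suffixes (stand-in for real stemming)
    stemmed = []
    for t in tokens:
        for suffix in ("ing", "ed", "s"):
            if t.endswith(suffix) and len(t) > len(suffix) + 2:
                t = t[:-len(suffix)]
                break
        stemmed.append(t)
    return stemmed

print(preprocess("Meet me @ the park, they are bullying him"))
# → ['meet', 'me', 'at', 'park', 'they', 'bully', 'him']
```

The output of this stage is the cleaned token stream that is fed to the models in the next module.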
Fig. 1
3. Algorithm Implementation. The pre-processed data needs to be fed into appropriate models for analysis. Existing methods used the Naïve Bayes algorithm for tweet classification. Here, tweets are categorized as bullying or not using the algorithms below. Before the data is trained with the respective models, it has to be split into training and testing sets at a fixed ratio.
4. Accuracy Validation Module. The preprocessed tweets are split into training and testing datasets, which are loaded into the CNN and LSTM models. Once training is completed on the training dataset, the test dataset is validated against the respective models, accuracy is calculated, and classification is performed based on the respective algorithms.
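The split step can be sketched as follows. The paper does not state the ratio, so an 80/20 split is assumed here; libraries such as scikit-learn provide an equivalent `train_test_split`:

```python
import random

def train_test_split(samples, labels, test_ratio=0.2, seed=42):
    """Shuffle (sample, label) pairs and split them into train/test sets."""
    pairs = list(zip(samples, labels))
    random.Random(seed).shuffle(pairs)          # reproducible shuffle
    cut = int(len(pairs) * (1 - test_ratio))    # e.g. 80/20 split
    return pairs[:cut], pairs[cut:]

tweets = [f"tweet {i}" for i in range(10)]
labels = [i % 2 for i in range(10)]             # 1 = bullying, 0 = not
train, test = train_test_split(tweets, labels)
print(len(train), len(test))                    # → 8 2
```

The training partition is used to fit the CNN and LSTM models; the held-out test partition is reserved for the accuracy validation module.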
Fig. 2 Confusion matrix of LSTM
Fig. 3 Confusion matrix of CNN
Literature References
Samaneh Nadali, Mesrah Azrifah Azmi Murad, Nurfadhina Mohamed Sharef, Aida Mustapa, Somayeh Shojae [8]: When online communication was emerging with the expansion of Web 2.0, it helped users exchange information and data with one another very easily. However, while the internet helps people make new connections, it can also lead children and young people to misuse this facility through cyberbullying; young people may develop a tendency to hurt others. Because of this misuse of social media platforms, different techniques were created to address the crisis. Using a dataset of real-world conversations, the proposed method annotates each predator question with a numeric severity label. [9] Using singular value decomposition, predator questions are formulated as a sequential data modelling approach.
Thabo Mahlangu, Chunling Tu, Pius Owlawi [3]: Misuse of social media platforms is the major reason for the rise of cyberbullying. Detecting bullying comments and messages can make the world safer for everyone who struggles because of cyberbullying. It is easy for those doing the bullying, because they are trying to hurt others' feelings; in other words, it is a type of harassment, and it is very painful for those who face it. The paper discusses the different types of cyberbullying and the various machine learning technologies used for cyberbullying detection. [4] This paper presents a taxonomy of multiple methods for detecting cyberbullying, together with a comprehensive analysis and classification of work done in recent years.
Mohamed Ali Al-Garadi, Mohamad Rashid Hussain, Nawsher Khan, Ghulam Murtaza, Henry Friday Nweke, Ihsan Ali, Ghulam Mujtaba, Haruna Chiroma, Hasan Ali Khatak, Abdullah Gani [1]: Aggression and violence still exist in different forms, but social media platforms provide one more way to show violence and aggression to the outside world, online. This paper examines how violence and aggression are expressed on social media websites, and provides a process and methodology for cyberbullying detection. [2] The suggested paper proposes a smart model named time Md that incorporates implicit feedback and temporal information to address the above issues in social e-commerce recommendation.

Summary
The increased use of social media platforms in our day-to-day life has changed the world greatly, but cyberbullying has also increased, and it is a major problem that many of us face. Researchers have tried their best to find the causes of cyberbullying, and machine learning techniques have helped greatly in preventing bullying messages on online platforms themselves, making the world safer. Cyberbullying is not a small issue; it is a serious one happening around us, and social media platforms can make it much worse. The proposed system achieved higher accuracy in detecting bullying comments and messages, and the outcomes are promising. The proposed system can be used for cyberbullying detection in other applications.
References
[1] Mohammed Ali Al-Garadi, Mohammad Rashid Hussain, Nawsher Khan, Ghulam Murtaza, Henry Friday Nweke, Ihsan Ali, Ghulam Mujtaba, Haruna Chiroma, Hasan Ali Khattak, Abdullah Gani, "Predicting Cyberbullying on Social Media in the Big Data Era Using Machine Learning Algorithms: Review of Literature and Open
[2] Mingyang Li, Hongchen Wu, Huaxiang Zhang, "Matrix Factorization for Personalized Recommendation With Implicit Feedback and Temporal Information in Social Ecommerce Networks", IEEE Access, vol. 7, pp.
[3] Thabo Mahlangu, Chunling Tu, "Deep Learning Cyberbullying Detection Using Stacked Embeddings Approach", 2019 6th International Conference on Soft Computing & Machine Intelligence (ISCMI), pp.
[4] Madhura Vyawahare, Madhumita Chatterjee, Data Communication and Networks, vol. 1049, p. 21, 2020.
[5] Semiu Salawu, Yulan He, Joanna Lumsden, "Approaches to Automated Detection of Cyberbullying: A Survey", IEEE Transactions on Affective Computing, vol. 11, no. 1, pp. 3-24, 2020.
[6] Tolba Marwa, Ouadfel Salima, Meshoul Souham, "Deep Learning for Online Harassment Detection in Tweets", 2018 3rd International Conference on Pattern Analysis and Intelligent Systems (PAIS), pp. 1-5, 2018.
Advances in Science and Technology ISSN: 1662-0356, Vol. 124, pp 335-343 doi:10.4028/p-r4i40i © 2023 Trans Tech Publications Ltd, Switzerland
Submitted: 2022-09-12 Accepted: 2022-09-16 Online: 2023-02-27
Medical Diagnosis through Chatbots

Purohit Iyera, Arnab Sarkarb, Karthik Prakashc, P Mohamed Fathimald

SRM Institute of Science and Technology, Vadapalani Campus, India

[email protected], [email protected], [email protected], [email protected]
Keywords: Chatbot, Natural Language Processing, Database, Disease Prediction
Abstract. Medical attention is critical to living a healthy life. When you have a health concern, however, it can be quite difficult to seek medical help. The idea is to create a medical chatbot that can assess symptoms and provide a list of illnesses the user might have, using AI and other biometric parameters. In medical diagnosis, artificial intelligence aids in medical decision making, management, automation, administration, and workflows. It can be used to diagnose cancer, triage critical findings in medical imaging, flag acute abnormalities, assist radiologists in prioritizing life-threatening cases, diagnose cardiac arrhythmias, predict stroke outcomes, and aid in chronic disease management. Medical chatbots were created with the goal of lowering medical costs and increasing access to medical information. Some chatbots act as medical guides, assisting patients in becoming more conscious of their ailments and improving their overall health. Users will undoubtedly profit from chatbots if they can identify a variety of illnesses and provide information that helps the user understand the predicament he/she might be facing. The main idea is to create a preliminary diagnosis chatbot that allows patients to participate in medical research and provides a customized analysis report based on their symptoms.
1. Introduction
[1][2] Artificial intelligence has currently progressed to the point that algorithms can learn and accurately replicate human conversations. With the rapid advancement of technology, internet-enabled computers may now be found in every institution, enterprise, and home, and, eventually, in the user's pocket. Chatbots have grown in popularity as effective tools for businesses and institutions in this setting. Siri, the AI assistant that is part of Apple's standard software for its products, is one of the most well-known instances of chatbots in recent history.
A medical chatbot facilitates the job of a healthcare provider and helps improve their performance by interacting with users in a human-like way. There are countless cases where intelligent medical chatbots could help physicians, nurses, therapists, patients, or their families. They can step in and minimize the amount of time spent on tasks like providing health-related information to users, guidance for patients, medication management and dosage, connecting people and organizations with first responders, and FAQ-type queries (contact details, directions, opening hours and service/treatment details) [3]. An advanced Natural Language Understanding (NLU) based standard text-to-text chatbot called a "Diabot" engages patients in conversation while providing individualised predictions based on the patient's reported symptoms and the general health dataset. The concept is further developed as a "DIAbetes chatBOT" for specialised diabetes prediction utilising the Pima Indian diabetes dataset and proactive preventative action suggestions [4]. [3,11] It is important to note that although chatbots can offer valuable facts and symptoms, they are not qualified to give an official diagnosis. The main premise behind these talking or texting smart algorithms is to become the first point of contact before any human involvement is needed. The advent of chatbots as self-help agents has resulted in companies adopting the platform to create automated customer service agents that handle simple navigation queries and FAQs related to company operation. However, the usage of chatbots in the medical field is relatively low, since hospitals are not confident in chatbots replacing humans in terms of accurate diagnosis. Some companies have started using chatbots in the medical industry for specific reasons such as education/awareness of recent medical issues, analysis of patient mentality with regard to hospital
treatment, doctor-patient conversations regarding consultation, etc. Examples of these types of chatbots include the Online Medical Chatbot System based on Knowledge Graph and Hierarchical Bi-Directional Attention [5]; Sensely, a medical diagnosis chatbot to which patients can report their symptoms and receive either a referral or self-help advice; and SHIHbot, a Facebook chatbot created to increase awareness of sexual health and STDs such as HIV/AIDS through online surveys and questionnaires. The main objective of creating a medical diagnosis chatbot is to ease the interaction between a patient and their doctor through a user-friendly application that also helps increase awareness about general medical illnesses. The main goal of creating "Violet", our chatbot, is to have a user-friendly web application that allows the user to specify the symptoms he/she is feeling (up to 5 symptoms) and makes an accurate prediction of the disease he/she might have. One of the unique features of Violet is that the application also refers users to different doctors depending on the disease the chatbot predicts. Note that the purpose of this application is to make the process of pre-consultation easier, i.e., it helps the user get an idea about what he/she is feeling, which helps them give an accurate description to the doctor during consultation and initial diagnosis.
2. Literature Study
[6][7][8] For NLP algorithms used in disease diagnosis, the efficiency and precision of the classification model are of prime importance. Intense and time-consuming research was conducted by multiple organizations to come up with the most viable classification algorithm for disease diagnosis through text.
There are several steps to text classification: input gathering and data pre-processing, medical terminology detection, mapping relevant documents, and generating answers and solutions. Based on this, Andrew Reyner Wibowo Tjiptomongsoguno, Audrey Chen, Hubert Michael Sanyoto, Edy Irwansyah, and Bayu Kanigoro provided a comparison between various text classification algorithms, with Multinomial Naïve Bayes used for text classification [6]. [10] Since then, brands in every industry have started to use chatbots, eventually sparking a new trend: conversational user experience. This refers to a user experience in which the user's interaction with a company or service is automated based on his/her prior behavior, which is explicitly seen in the field of medicine.
3. Methodology
The proposed system has two phases. The first phase uses three classifier algorithms. Each classifier treats every word independently; the words are organized into two dictionaries: corpus words and class words. Each word is tokenized, stemmed, converted to lowercase and transformed into training data. Each class generates a total score for the number of words that match. The three algorithms are:
• Random Forest
• Naive Bayes
• Support Vector Machine (SVM)
3.1 Phase 1: Comparison of Algorithms
3.1.1. Naive Bayes
Naive Bayes classifiers are among the most basic and successful classification systems. The technique is based on the concept of Bayesian networks, which are probabilistic graphical approaches to understanding a list of random variables and their conditional relationships. There are various efficient methods in Bayesian networks that perform inference and learning. The sole prerequisite for it to be applicable is that the dataset's characteristics be independent. The properties in the dataset are reliant on each other due to species evolution, although the dependence does not appear to be substantial [7].
As a result, the classification is based on the Naive Bayes approach, which assumes that the characteristics of the given dataset are independent. The Naive Bayes classifier uses the following formula to achieve the above objective:
Fig. 1. Naive Bayes Formula
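The formula itself is not reproduced in this text. The standard naive Bayes decision rule the figure presumably shows is, under the independence assumption,

$$\hat{y} \;=\; \arg\max_{c}\; P(c)\,\prod_{i=1}^{n} P(x_i \mid c),$$

i.e., the predicted class is the one maximizing the class prior $P(c)$ times the product of the per-feature likelihoods $P(x_i \mid c)$, where each $x_i$ is one of the $n$ (here, binary symptom) attributes.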
Fig. 2. Result of classifier trained using dataset
A single tuple corresponding to each disease was used to test the model and observe the results. The above results show that the model can accurately classify the list of diseases it was trained on.
3.1.2. Support Vector Machine
[8][9] The training dataset consists of all the symptom attributes, which have values "0" or "1" indicating whether the patient is exhibiting that particular symptom, plus a nominal attribute corresponding to the disease that the attribute combinations indicate. "0" means that the patient is not exhibiting that particular symptom, and "1" otherwise. The test dataset consists of 41 tuples of different symptom combinations. To test the robustness of the model, 10-fold cross validation was performed on the training dataset [8].
3.1.3. Random Forest
The objective of this model is to predict the illness the user is expected to have by passing a concise list of symptoms [5] into the random forest algorithm. The dataset contains samples, and its attributes are taken as features. The random forest algorithm establishes the outcome based on the predictions of its decision trees, taking the average or mean of the outputs from the various trees; increasing the number of trees increases the precision of the outcome. Once the list of symptoms is fed into the algorithm, two processes occur: the creation of multiple decision trees and the classification of decision trees based on results. Bagging is used to obtain the results from the decision trees, and an ensemble method is used for classification of the decision trees in a majority
vote format, i.e., if the same result is obtained on different leaf nodes of multiple decision trees, then that result is treated as the output of the random forest. The SVM training algorithm builds a model that predicts whether a test sample falls into one class or another. SVM requires vast training data to decide a decision boundary, and the computing cost is very high. For data which cannot be separated, the input is mapped to a high-dimensional attribute space where it can be separated by a hyperplane. The Naive Bayes algorithm had the quickest execution time compared to Random Forest and Support Vector Machine. In order to decide which algorithm should be used for the classification model, a detailed, cohesive, and comprehensive analysis was made by comparing the results obtained from these classification algorithms. Phase 2 implements a chatbot using the algorithm with the highest accuracy. The performance evaluation was based on the following attributes:
Fig. 3. Performance Metrics
Fig. 4. Results from Classification Model
3.1.4. Comparative Analysis
[12][13] The obtained precision, recall and accuracy values were analyzed and the following conclusions were made:
Table 1. Performance Analysis of Algorithms
After training and testing multiple algorithms using the parent dataset, it was discovered that Naive Bayes had the maximum accuracy and efficiency and the optimal execution time. This conclusion is further supported by the results given below.
Fig. 6. SVM Confusion Matrix and Accuracy Result
3.2. Phase 2
The application framework consists of two databases (user details and symptoms), a prediction algorithm used to identify diseases from the symptoms specified, and an API which acts as a user interface through which the user can access medical history details and interact with the chatbot.
3.2.1. Database (MongoDB)
The database stores information about the users who register with the application and initiate a conversation with the chatbot, and about the doctors whose details are provided to the user according to the disease he/she is predicted to have from the list of symptoms provided. The database also acts as a repository for the binary symptom-disease data which is fed to the prediction algorithm in order to predict the disease. The database is built using MongoDB and is divided into two components: the user details database and the disease-symptom database. The training dataset used for the Naive Bayes classifier model consisted of 41 tuples of various combinations of disease symptoms. The dataset contained 113 distinct and independent attributes and one class attribute, which the model uses to classify the set of symptoms. The dataset contains diseases and their related symptoms as rows and columns respectively; for each disease, its symptoms are recorded in binary format. If the symptom is a reliable cause for the
disease, its value is represented as “1”, else its value is “0”. Both the training and the testing datasets adhere to the same format given.
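The binary disease-symptom format can be illustrated with a toy sketch. The symptom and disease names here are invented; the real dataset has 113 symptom attributes and 41 disease tuples:

```python
# Hypothetical rows in the binary disease-symptom format described above:
# each disease maps to a 0/1 vector over a fixed symptom column order.
symptoms = ["fever", "cough", "rash", "headache"]
dataset = {
    "flu":     [1, 1, 0, 1],
    "measles": [1, 0, 1, 0],
}

def encode(user_symptoms):
    """Turn a user's symptom list into the same 0/1 feature vector."""
    return [1 if s in user_symptoms else 0 for s in symptoms]

print(encode(["fever", "headache"]))   # → [1, 0, 0, 1]
```

Encoding the user's chat input this way lets the same vector be scored directly against the training rows by any of the three classifiers.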
Fig. 7. Random Forest Confusion Matrix and Accuracy Result
Fig. 8. Decision tree Confusion Matrix and Accuracy Result
Fig. 9. Naive Bayes Confusion Matrix and Accuracy Result
Fig. 10. Column Distribution for each disease based on symptoms from training dataset
3.2.2. Prediction Algorithm
The medical diagnosis application uses various prediction algorithms (Support Vector Machine, Random Forest and Naive Bayes) and selects the best one as determined by the accuracy and efficiency of the algorithm. The disease-symptom database is fed into the algorithms, and a disease is predicted by running the selected algorithm on the database against the user's symptom list. The result is the disease the user is presumed to have, given the symptom list he/she sent to the chatbot as a query.
3.2.3. Backend API
The API is built using NodeJS and ExpressJS for server configuration, with MongoDB for persistent storage. The API consists of the following modules:
1) Registration: The registration module enables users to enter their information into the system for the classification model to use in its text analysis. The user fills in the details on the registration portal, and the entered data is added to the MongoDB 'users' collection on successful registration.
2) Login: The login module gives users private access to the classification model for their diagnosis. The login process uses a 64-bit hash to verify the integrity of the user.
3.2.4. GUI
The GUI is built in ReactJS and uses a react-router to switch between different pages.
Working [11]:
• The user is greeted by a landing page where he/she has to either register for the application using their personal details if they are new to the web application, or log in to use the features of the application using their username and password.
• Once logged in, the user chooses from a drop-down menu to either start a conversation with "Violet", the self-diagnostic chatbot, visit the FAQ page, or visit the user's dashboard, which shows their personal info, history of visits, etc.
• The chatbot window opens once the user chooses to initiate the conversation with Violet. The chatbot greets the user with phrases such as "Hello", "Hi" or "Good Morning", after which the user responds with his/her requirements.
• If the user wants to find out which illness he/she might have, the query given to the chatbot is a set of symptoms (up to 5), which is then compared with the tuples in the database to produce an accurate prediction of the disease.
• Other operations that the chatbot can execute at the discretion of the user are visiting the dashboard, getting doctor references, and quitting the application.
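The 64-bit hash mentioned in the login module is not specified further in the paper. One plausible sketch uses Python's `hashlib.blake2b` with an 8-byte digest; the function names and salt handling below are illustrative only, and a production system would prefer a slow key-derivation function such as bcrypt:

```python
import hashlib
import hmac

def hash_credential(password, salt):
    # blake2b with an 8-byte (64-bit) digest, standing in for the paper's
    # unspecified 64-bit hash; salt is used as the keyed-hash key.
    return hashlib.blake2b(password.encode(), digest_size=8, key=salt).hexdigest()

def verify(password, salt, stored):
    # constant-time comparison to avoid timing leaks
    return hmac.compare_digest(hash_credential(password, salt), stored)

salt = b"demo-salt"                      # per-user random salt in practice
stored = hash_credential("s3cret", salt)
print(verify("s3cret", salt, stored), verify("wrong", salt, stored))
# → True False
```

The stored digest (16 hex characters for 64 bits) would live alongside the user document in the MongoDB 'users' collection.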
Fig. 11. Flow Diagram
4. Conclusion
In this paper, a study was presented on the different algorithms for text classification in the context of disease diagnosis. A CSV dataset was used to train the classification model, and different classification algorithms were applied. In the end, a comparative study was made and the Naive Bayes classifier was chosen for the model. The final testing dataset was gathered from the input the user provides in the chat box. The input is preprocessed, formatted, and provided to the model for classification. The classification result is then returned to the user in the chat box and subsequently stored in the user's records.
References
[1] Agrawal, M., Cheng, J., Tran, C.: What's up, doc? A medical diagnosis bot.
[2] Amato, F., Marrone, S., Moscato, V., Piantadosi, G., Picariello, A., Sansone, C.: Chatbots meet ehealth: Automatizing healthcare. In: Workshop on Artificial Intelligence with Application in Health, Bari, Italy, pp. 40–49 (2017)
[3] Aswini, D.: Clinical medical knowledge extraction using crowdsourcing techniques. Int. Res. J. Eng. Technol. 6 (2019)
[4] Bali, M., Mohanty, S., Chatterjee, S., Sarma, M., Puravankara, R.: Diabot: A predictive medical chatbot using ensemble learning.
[5] Bao, Q., Ni, L., Liu, J.: HHH: An online medical chatbot system based on knowledge graph and hierarchical bi-directional attention. In: Proceedings of the Australasian Computer Science Week Multiconference, pp. 1–10 (2020)
[6] Andrew Reyner Wibowo Tjiptomongsoguno, Audrey Chen, Hubert Michael Sanyoto, Edy Irwansyah, and Bayu Kanigoro: Medical chatbot techniques: a review, pp. 1–11, 2020.
[7] S. Vijayarani, S. Deepa: Naïve Bayes Classification for Predicting Diseases in Haemoglobin Protein Sequences, International Journal of Computational Intelligence and Informatics, Vol. 3, No. 4, January–March 2014.
[8] Wei Yu, Tieben Liu, Rodolfo Valdez, Marta Gwinn, Muin J. Khoury: Application of support vector machine modeling for prediction of common diseases: the case of diabetes and pre-diabetes, 22 March 2010.
[9] Jennifer Hill, W. Randolph Ford, Ingrid G. Farreras: Real conversations with artificial intelligence: A comparison between human–human online conversations and human–chatbot conversations.
[10] Walss M., Anzengruber F., Arafa A., Djamei V., Navarini A.A.: Implementing Medical Chatbots: An Application on Hidradenitis Suppurativa.
[11] Zeineb Safi, Alaa Abd-Alrazaq, Mohamed Khalifa, Mowafa Househ: Technical Aspects of Developing Chatbots for Medical Applications.
[12] Uddin, S., Khan, A., Hossain, M. et al.: Comparing different supervised machine learning algorithms for disease prediction. BMC Med Inform Decis Mak 19, 281 (2019).
[13] Dhiraj Dahiwade, Gajanan Patle, Ektaa Meshram: Designing Disease Prediction Model Using Machine Learning Approach, 27 March 2019, IEEE Conference, Erode, India.
Advances in Science and Technology ISSN: 1662-0356, Vol. 124, pp 344-354 doi:10.4028/p-8fy5ca © 2023 Trans Tech Publications Ltd, Switzerland
Submitted: 2022-08-30 Accepted: 2022-09-16 Online: 2023-02-27
Sentiment Analysis of National Eligibility-Cum Entrance Test on Twitter Data Using Machine Learning Techniques

E. Chandralekha1,a*, Jemin V.M.2,b, P. Rama3,c and Prabakaran K.4,d

1,4Assistant Professor, Department of CSE, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai, India
2Assistant Professor, Department of CSE, R.M.K College of Engineering, Chennai, India
3Assistant Professor, Department of CSE, Bharath Institute of Higher Education & Research, Chennai, India

[email protected], [email protected], [email protected], [email protected]
Keywords: NEET Exam, Sentiment Analysis, Social Media, Twitter, Tweepy, Machine Learning (ML).
Abstract. People around the world use social media to communicate and share their perceptions of a variety of topics. Social media analysis is crucial to interacting, distributing, and stating people's opinions on various topics, and investigation of such textual data helps governments and organizations act on alarming issues more quickly. The key purpose of this work is to perform sentiment analysis of textual data regarding the National Eligibility-cum-Entrance Test (NEET), perform classification, and determine how people feel about NEET. In this study, 11 different machine learning classifiers were used to analyze tweet sentiment, along with natural language processing (NLP). Tweepy, a Python library, is used to collect user opinions about the NEET Exam. The data are annotated using TextBlob and VADER, and the text is preprocessed with the Natural Language Toolkit (NLTK). On the dataset downloaded from Twitter, unigram models perform well compared to bigram and trigram models, and TF-IDF models are more accurate than the frequency-based Count Vectorizer. The SVM classifier achieves an average accuracy of 92%, and the Perceptron also reaches a high average accuracy of 91%. According to the experimental data, most people have a neutral opinion of NEET.

Introduction
The National Eligibility-cum-Entrance Test (NEET) is the national entrance examination for medical courses such as BDS and MBBS, as well as PG courses, in private and government medical colleges. The NEET UG Exam is held every year by the National Testing Agency (NTA) in several locations in India. Almost 11-13 lakh candidates apply each year to take the offline MCQ (Multiple Choice Question) entrance test. For admission to MBBS and M.D. courses in medicine, the National Medical Commission conducts a common and uniform National Eligibility-cum-Entrance Test (NEET) in all medical institutions, whether they are government-sponsored or not.
Consequently, NEET is accepted by all AIIMS medical colleges, JIPMER, and other institutions. NEET (UG) was introduced nationwide after state-based entrance tests were conducted previously, and some prestigious medical colleges conducted their own admission tests, including AIIMS, JIPMER, IMS-BHU, KMC Manipal & Mangalore, and CMC Vellore. Historically, students who passed the Tamil Nadu Board of Secondary Examination (TNBSE) had a larger share of seats in medical colleges than students who passed the NEET alone; CBSE students benefited from NEET. A major objective of NEET was initially to ensure that only meritorious candidates were granted admission into medical colleges, along with ending capitation fees, a practice that stoked corruption. The model assumes that all contestants compete from the same position and are constrained by the same factors. The Rajan report compares the flawed approach to tests and medical admissions in India with practices in other countries, such as the US and UK. In other words, instead of merely looking at test scores, colleges there assess a student's ability to perform consistently, and
take into account test scores in conjunction with school and college grades, a student's personal statement, including how well the student demonstrates commitment to a potential career, and other factors. Additionally, the tests are designed to assess subject-specific knowledge, writing skills, and analytical abilities. Specifically, the goals of this work are to 1) apply NLP and ML techniques to analyze the sentiment of tweets related to the exemption of NEET in Tamil Nadu; and 2) use various vectorization methods to obtain better accuracy with unigrams, bigrams, and trigrams. The main contributions of this work are as follows. 1) Sentiment analysis is performed on text data to understand users' opinions about NEET exemption in Tamil Nadu. 2) The most frequently mentioned aspects of the data are identified to achieve better understanding and to create awareness programs. 3) The Tweepy API is used for collecting data from Twitter and pre-processing the data. 4) In the feature extraction step, the unigram, bigram, and trigram methods are used and their performance is studied. Eleven different classifiers were used for training and testing, and performance metrics such as precision, recall, F1, and accuracy are compared and illustrated. The rest of the article is organized as follows. Section II presents a literature survey of sentiment analysis. Section III discusses the experimental methodology and setup. Section IV explains the results and analysis. Section V presents the most frequently encountered challenges and the possible limitations of this work. Section VI presents the conclusion and suggestions for future work.

Literature Survey
Aarzoo Dhiman et al. developed a framework that relies on Twitter comments to assess the success of the Swachh Bharat Abhiyan (SBA) cleanliness campaign in India.
The Twitter data were analysed to predict citizen perceptions using word embeddings and subjectivity scores with sentiment analysis. Finally, the performance of each city was predicted using predictive models built from demographic features. Experiments confirm a correlation of 0.77, and using Random Forest Regression, cities can be predicted with an accuracy between 80 and 85% [2]. Nadia Felix F. et al. presented an overview of semi-supervised methods for the classification of tweets, covering topic-, graph-, and wrapper-based approaches. Comparing algorithms such as self-training, topic modeling, co-training, and distant supervision highlights their specific biases and practical considerations [3]. Ashima Yadav et al. proposed a network that combines word-level and sentence-level attention mechanisms for determining positive, neutral, and negative statements. Using local text characteristics and previous and upcoming context information, the network classifies sentiments in documents based on subtle cues made in the document. Tweets were collected from Twitter and topic modeling was applied to extract the document themes and establish a COVID-19 Sentiment Dataset. The proposed model achieved 85% accuracy, which is superior to other well-known algorithms [4]. Devendra K. Tayal analyzed tweets by location and predicted polarity on a weekly and monthly basis. The experiment comprised several phases: acquisition, pre-processing of tweets, tokenizing, assessing the sentiment of a sentence, assessing the sentiment of a document, and analysis. Additionally, the tool can handle transliterated words. Using a machine learning approach to extract unbiased tweets related to this particular campaign from Twitter, 84.47% accuracy was achieved. As a result, social campaigns are implemented effectively by the government for the good of society [5]. Rocío B. Hubert et al.
presented Twitter Inquiry for Administration Intelligence and Community Contribution, a tool which visualizes and analyses data gathered from the social network (Twitter) and gives suggestions to rule-makers and nontechnical people to aid understanding. The usage of the tool is shown for five government secretaries and the corresponding citizen responses over a nine-month period, covering the well-being, education, public development, labour, and environment divisions in Mexico [7]. Anastasia Giachanou et al. discussed sentiment analysis of
Twitter data, including Twitter opinion retrieval, sentiment tracking, irony detection, emotion recognition, and sentiment quantification, which have become more popular. In this survey they presented approaches to sentiment analysis on Twitter, classified them based on the techniques they use, and discussed recent research trends in related areas [8]. Paltoglou, G. et al. proposed an intuitive, unsupervised, lexicon-based method for estimating the level of emotional intensity contained within text. Subjectivity detection and polarity classification are complementary but different contexts in which this approach can be applied. This study included a number of experiments with real-life datasets obtained from social networks and labeled by human experts, as well as comparisons with advanced supervised methods. In spite of its unsupervised nature, the algorithm outperforms machine learning in most cases, suggesting a very strong and consistent approach to sentiment analysis of informal communication [9]. Jose N. Franco-Riquelme et al. analyzed Twitter comments regarding the general elections in Spain in 2015 and 2016, including 250,000 tweets from those elections. Text processing and NLP techniques facilitate data analysis, and a quantitative analysis of a group of Spanish-language texts is used to analyze significant data retrieved from Twitter. They performed a detailed examination of Twitter users' support in the elections by extracting information from three Spanish regions based on location and selecting features grounded on the keywords of the four main political parties [10]. Itisha Gupta et al. proposed a feature-based system coupled with improved negation handling, which exploits dictionary-based, morphological, part-of-speech based, and n-gram features, among others, for classifier training and a holistic understanding of polarity determination. They used three diverse classifiers.
Experiments were carried out to find which classifier is best for which feature category. In addition, they investigated negation, a linguistic phenomenon that can modify the strength of opinionated words, and created an algorithm to deal with negation in tweets where the appearance of a negation word does not always imply negation [11]. Bibi Amina et al. presented an automated method for creating a lexicon of positive, negative, and neutral terms on a given topic. In this study, text mining is shown to be a strong tool for extracting commercial value from large volumes of social media data, with the goal of identifying upcoming research directions for governments, policymaking organizations, and other stakeholders [12].

Proposed Work
In this section, we develop a framework for sentiment analysis of the NEET Exam in Tamil Nadu. The approach is illustrated in Fig. 1. The steps followed in opinion mining are as follows.

Data Collection. Data are collected from Twitter using a popular Python library called Tweepy. To use Tweepy, the user has to create a Twitter developer account. Opinions about the NEET exam are collected through Tweepy, with "NEET Exemption" and "Tamilnadu" given as the query terms. In total, 56000 tweets about NEET Exemption in Tamil Nadu were collected from different users at different times and locations. These tweets contain symbols such as hashtags, mail IDs, and plain text; such noise has to be removed for better accuracy.

Data Pre-Processing. Text pre-processing is a technique for cleaning text data and preparing it for feeding into a model.
Text data contains noise in various forms, such as emoticons, hyperlinks, punctuation, and text in different cases. For example, a tweet stating "Thank you sooooooo much" should be rephrased to "Thank you so much" to avoid such structural distortion. Fig. 1 demonstrates the design of the proposed system. Tweets also contain stop words, such as "is", "was", "then", "that", "who", which carry no meaning. The tweet features must be adjusted to eliminate stop words and noise. In the same way,
the process of lemmatization is required to identify the root words and nouns contained in tweets. After the required words have been identified, the occurrence of the hash tag is removed.
Fig. 1 Proposed Framework

Algorithm for Data Pre-processing
Input: Dictionary Dict, Stop words SW, Twitter Dataset DS
Output: Pre-processed Words PW
1. Read DS
2. For each tweet:
3.   PW = Split(DSi, Space), where i = 1 to size(DS)   (1)
4.   PW = remove PWi from PW if PWi ∈ SW, where i = 1 to size(PW)   (2)
5.   PW = Lemmatization(PWi, DS), where i = 1 to size(PW)   (3)
6.   PWF = remove PWi from PW if PWi is a HashTag, where i = 1 to size(PW)   (4)
7. End Loop
8. Stop

Using the above algorithm, the text data from Twitter can be pre-processed. In Eq. 1 the text data are split into words. In Eq. 2 stop words are removed. In Eq. 3 the words are transformed into root words using the lemmatization method. In Eq. 4 the hashtags are removed and the final words are stored in the variable PWF. The WordNet Lemmatizer function of the NLTK library is used to lemmatize the words.

Data Annotation. After pre-processing, the tweets are annotated as neutral, positive, or negative using TextBlob and VADER. TextBlob and VADER are Python libraries that label tweets based on their subjectivity and polarity; both use a lexicon-based approach. The intersection of the outputs of TextBlob and VADER was taken. Of the 56000 tweets in total, 33600 were neutral, 14850 positive, and 7550 negative.

Training and Testing Using Classifiers. Machine learning algorithms need numerical input values. So, the text data have to be converted into numerical data using vectorization methods. For this, we have used the Count Vectorizer (CV) and the Term Frequency-Inverse Document Frequency (TF-IDF) method. These two methods take the 1-gram, 2-gram, and 3-gram forms of the preprocessed data as input, calculate word frequencies, and form a sparse matrix. Following this, the data are separated into two groups: 80% for training and 20% for testing.
The data are then given as input to the classifiers. The algorithms used in this work are SVM, KNN, Decision Tree (DT), Random Forest (RF), Multinomial Naïve Bayes (MNB), Gaussian Naïve Bayes (GNB), Bernoulli Naïve Bayes (BNB), Passive Aggressive (PA) classifier, Perceptron, Ridge classifier, and AdaBoost classifier. The performance of the classifiers is evaluated on 1-grams, 2-grams, and 3-grams.
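To make the vectorization step concrete, here is a minimal pure-Python sketch of both representations over n-grams. It uses the textbook weighting idf = log(N/df); the paper's experiments rely on scikit-learn's CountVectorizer and TfidfVectorizer, which apply a smoothed and normalized variant, so the exact weights will differ.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Contiguous n-grams of a token list, e.g. n=2 gives bigrams."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def count_matrix(docs, n=1):
    """Count-Vectorizer-style rows: n-gram -> raw frequency per document."""
    return [Counter(ngrams(doc, n)) for doc in docs]

def tfidf_matrix(docs, n=1):
    """TF-IDF rows: tf * log(N / df) over the same n-gram vocabulary."""
    counts = count_matrix(docs, n)
    n_docs = len(docs)
    df = Counter(term for row in counts for term in row)  # document frequency
    return [{term: tf * math.log(n_docs / df[term]) for term, tf in row.items()}
            for row in counts]

docs = [["neet", "exam", "ban"], ["neet", "exam", "support"]]
rows = tfidf_matrix(docs, n=1)
# "neet" and "exam" occur in both documents, so their idf is log(2/2) = 0,
# while the discriminative terms "ban" and "support" get weight log 2.
```

Stacking these per-document dictionaries column-wise yields the sparse matrix that the classifiers consume.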
Experimental Setup
Our experiments were run in Python 3.10 on a Core i5 processor with 8 GB of RAM, coded in Python's Anaconda IDE. Tweets were retrieved using Twitter's Tweepy API and processed using Python's NLTK and re modules; the work also uses matplotlib, pandas, numpy, and sklearn.

Result and Analysis
The results of the 11 different classifiers are discussed here. The outputs of these classifiers have been analysed using recall, precision, F1 score, and accuracy. Finally, the K-fold validation method with k = 10 is used to test the classifier outputs. In this work, the data are classified as positive, negative, and neutral, so this is a multi-class classification problem with three classes. Fig. 2 depicts the confusion matrix for multi-class classification.

Fig. 2 Confusion Matrix for the 3-class problem

                   Predicted Positive      Predicted Negative      Predicted Neutral
Actual Positive    True Positive (TP)      False Positive1 (FP1)   False Positive2 (FP2)
Actual Negative    False Negative1 (FN1)   True Negative (TN)      False Negative2 (FN2)
Actual Neutral     False Neutral1 (FNE1)   False Neutral2 (FNE2)   True Neutral (TNE)

To analyze the performance of each classifier, recall, precision, and F1 score are calculated for each class based on Eq. 1-7.

Precision for class Positive = TP / (TP + FP1 + FP2)   (1)
Precision for class Negative = TN / (FN1 + TN + FN2)   (2)
Precision for class Neutral = TNE / (FNE1 + FNE2 + TNE)   (3)
Recall for class Positive = TP / (TP + FN1 + FNE1)   (4)
Recall for class Negative = TN / (FP1 + TN + FNE2)   (5)
Recall for class Neutral = TNE / (FP2 + FN2 + TNE)   (6)
F1 Score = 2 * precision * recall / (precision + recall)   (7)
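Following the paper's Eq. 1-7, where "precision" for a class is computed over its actual-class row and "recall" over its predicted-class column of the Fig. 2 matrix, the per-class metrics can be computed as follows. The counts in the matrix below are illustrative only, not taken from the paper's experiments.

```python
# Confusion matrix for the 3-class task, laid out as in Fig. 2
# (rows = actual class, columns = predicted class):
#   [TP,   FP1,  FP2]
#   [FN1,  TN,   FN2]
#   [FNE1, FNE2, TNE]
cm = [
    [90, 6, 4],    # actual Positive
    [5, 80, 15],   # actual Negative
    [8, 12, 180],  # actual Neutral
]

def per_class_metrics(cm, k):
    """Eq. 1-7: 'precision' over the actual-class row, 'recall' over the
    predicted-class column, and F1 as their harmonic mean."""
    row_sum = sum(cm[k])                             # denominator of Eq. 1-3
    col_sum = sum(cm[i][k] for i in range(len(cm)))  # denominator of Eq. 4-6
    precision = cm[k][k] / row_sum
    recall = cm[k][k] / col_sum
    f1 = 2 * precision * recall / (precision + recall)  # Eq. 7
    return precision, recall, f1

p, r, f1 = per_class_metrics(cm, 0)  # metrics for the Positive class
```

Calling `per_class_metrics(cm, 1)` and `per_class_metrics(cm, 2)` gives the negative and neutral rows of Tables I-VI; note that this row/column convention is the reverse of the usual precision/recall definitions.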
Table I. Recall, Precision and F1 score values of the positive class using 1-gram with Count Vectorizer

Classifier               Recall  Precision  F1 Score
SVM                      0.90    0.91       0.90
KNN                      0.59    0.62       0.61
Decision Tree            0.86    0.87       0.86
Random Forest            0.85    0.86       0.85
Multinomial Naïve Bayes  0.77    0.79       0.78
Gaussian Naïve Bayes     0.79    0.81       0.80
Bernoulli Naïve Bayes    0.83    0.85       0.84
Passive Aggressive       0.78    0.80       0.79
Perceptron               0.90    0.91       0.90
Ridge Classifier         0.89    0.90       0.89
AdaBoost Classifier      0.76    0.78       0.77
Tables I, II, and III show the recall, precision, and F1 score of the positive, negative, and neutral classes for all 11 classifiers based on 1-grams using the Count Vectorizer. Tables IV, V, and VI show the same metrics based on 1-grams using TF-IDF. Recall, precision, and F1 score were calculated similarly for 2-grams and 3-grams using both the Count Vectorizer and TF-IDF. From these results, the Support Vector Machine using the Count Vectorizer and TF-IDF on 1-grams achieved the highest recall, precision, and F1 score compared to the other classifiers. A comparison of the overall accuracy of each of the 11 classifiers is shown in Fig. 3. With 1-grams for both the Count Vectorizer and TF-IDF, SVM attained a maximum accuracy of 92%, and the Ridge classifier achieved 92% accuracy.

Table II. Recall, Precision and F1 score values of the negative class using 1-gram with Count Vectorizer

Classifier               Recall  Precision  F1 Score
SVM                      0.74    0.91       0.82
KNN                      0.32    0.62       0.42
Decision Tree            0.66    0.87       0.75
Random Forest            0.64    0.86       0.73
Multinomial Naïve Bayes  0.52    0.79       0.63
Gaussian Naïve Bayes     0.55    0.81       0.65
Bernoulli Naïve Bayes    0.62    0.85       0.72
Passive Aggressive       0.53    0.80       0.64
Perceptron               0.74    0.91       0.82
Ridge Classifier         0.72    0.90       0.80
AdaBoost Classifier      0.50    0.78       0.61
Table III. Recall, Precision and F1 score values of the neutral class using 1-gram with Count Vectorizer

Classifier               Recall  Precision  F1 Score
SVM                      0.96    0.91       0.94
KNN                      0.81    0.62       0.70
Decision Tree            0.95    0.87       0.91
Random Forest            0.94    0.86       0.90
Multinomial Naïve Bayes  0.91    0.79       0.85
Gaussian Naïve Bayes     0.92    0.81       0.86
Bernoulli Naïve Bayes    0.94    0.85       0.89
Passive Aggressive       0.91    0.80       0.85
Perceptron               0.96    0.91       0.94
Ridge Classifier         0.96    0.90       0.93
AdaBoost Classifier      0.90    0.78       0.84
Table IV. Recall, Precision and F1 score values of the positive class using 1-gram with TF-IDF

Classifier               Recall  Precision  F1 Score
SVM                      0.91    0.92       0.92
KNN                      0.61    0.64       0.63
Decision Tree            0.79    0.81       0.80
Random Forest            0.85    0.86       0.85
Multinomial Naïve Bayes  0.79    0.81       0.80
Gaussian Naïve Bayes     0.80    0.82       0.81
Bernoulli Naïve Bayes    0.83    0.85       0.84
Passive Aggressive       0.90    0.91       0.90
Perceptron               0.88    0.89       0.88
Ridge Classifier         0.89    0.90       0.89
AdaBoost Classifier      0.80    0.82       0.81
Fig. 3 shows a visual representation of the overall classifier accuracy. The highest accuracy is achieved using 1-gram features with both the Count Vectorizer and TF-IDF. Among all 11 classifiers, the SVM classifier with 1-grams gave the best results in terms of accuracy, recall, precision, and F1 score. Using SVM with 1-grams, we analysed 56000 tweets and found that 14850 people feel positive, 33600 neutral, and 7550 negative about the NEET Exam because of various issues.
Table V. Recall, Precision and F1 score values of the negative class using 1-gram with TF-IDF

Classifier               Recall  Precision  F1 Score
SVM                      0.77    0.92       0.84
KNN                      0.34    0.64       0.44
Decision Tree            0.55    0.81       0.65
Random Forest            0.64    0.86       0.73
Multinomial Naïve Bayes  0.55    0.81       0.65
Gaussian Naïve Bayes     0.56    0.82       0.67
Bernoulli Naïve Bayes    0.62    0.85       0.72
Passive Aggressive       0.74    0.91       0.82
Perceptron               0.70    0.89       0.78
Ridge Classifier         0.72    0.90       0.80
AdaBoost Classifier      0.56    0.82       0.67
Table VI. Recall, Precision and F1 score values of the neutral class using 1-gram with TF-IDF

Classifier               Recall  Precision  F1 Score
SVM                      0.97    0.92       0.94
KNN                      0.82    0.64       0.72
Decision Tree            0.92    0.81       0.86
Random Forest            0.94    0.86       0.90
Multinomial Naïve Bayes  0.92    0.81       0.86
Gaussian Naïve Bayes     0.92    0.82       0.87
Bernoulli Naïve Bayes    0.94    0.85       0.89
Passive Aggressive       0.96    0.91       0.94
Perceptron               0.96    0.89       0.92
Ridge Classifier         0.96    0.90       0.93
AdaBoost Classifier      0.92    0.82       0.87
Limitations and Challenges
The performance of the classifiers depends entirely on the preprocessed dataset. Only opinions posted on Twitter were collected; many people also use Facebook, YouTube, and other social media to express their opinions, and not everyone uses the internet or social media at all. As a result, the findings are based only on this dataset. Another limitation is that only text data were considered: many people express their opinions through symbols such as emojis, which are not taken into account here. Tweets expressed in multiple languages or in code-mixed language are not considered in this work. Some people use short forms, for example "U" instead of "YOU"; such short forms are also not handled here.
Table VII. Overall accuracy (%) of all classifiers

                         Using Count Vectorizer   Using TF-IDF
Classifier               1-g   2-g   3-g          1-g   2-g   3-g
SVM                      91    66    62           92    64    63
KNN                      62    60    53           64    62    54
Decision Tree            87    65    64           81    64    61
Random Forest            86    59    62           86    61    65
Multinomial Naïve Bayes  79    67    52           81    69    53
Gaussian Naïve Bayes     81    61    41           82    59    42
Bernoulli Naïve Bayes    85    67    55           85    69    57
Passive Aggressive       80    53    50           91    53    52
Perceptron               91    62    47           89    64    49
Ridge Classifier         90    57    58           90    58    56
AdaBoost Classifier      78    56    51           82    57    53
Fig. 3 Overall Accuracy of Classifiers

Conclusion and Future Work
People are using social media more and more every day, and prefer to post their opinions on social media instead of sharing them directly with others. Based on Twitter tweets, we determined the aggregate response of the public to the government's implementation of the NEET Exam. Tweets about the NEET Exam were gathered, annotated, and pre-processed, and eleven machine learning algorithms were applied to the collected data. The Support Vector Machine using 1-grams with both the Count Vectorizer and TF-IDF showed the best performance, yielding 92% accuracy, the best among all the classifiers including the 2-gram and 3-gram settings. In future work, the collected tweets can be analysed on an hourly, weekly, and monthly basis to find which areas supported the NEET Exam and at which times.
References
[1] David Zimbra, Ahmed Abbasi, Daniel Zeng, and Hsinchun Chen, The State-of-the-Art in Twitter Sentiment Analysis: A Review and Benchmark Evaluation, ACM Trans. Manage. Inf. Syst., Vol. 9, pp. 1-29. (2018)
[2] Aarzoo Dhiman, Durga Toshniwal, A Twitter Framework to Assess the Effectiveness of Indian Government Campaign, ACM Trans. (2021)
[3] Nadia Felix F. da Silva, Luiz F. S. Coletta, and Eduardo R. Hruschka, A survey and comparative study of tweet sentiment analysis via semi-supervised learning, ACM Comput. Surv., Vol. 49, pp. 1-26. (2016)
[4] Ashima Yadav and Dinesh Kumar Vishwakarma, A Language-independent Network to Analyze the Impact of COVID-19 on the World via Sentiment Analysis, ACM Trans. Internet Technol., Vol. 22, pp. 1-30. (2021)
[5] Devendra K. Tayal, Sumit K. Yadav, Sentiment analysis on social campaign Swachh Bharat Abhiyan using unigram method, Vol. 32, pp. 633-645. (2017)
[6] Batrinca, B., Treleaven, P.C., Social media analytics: a survey of techniques, tools and platforms, AI & Soc., Vol. 30, pp. 89-116. (2015)
[7] Rocío B. Hubert, Elsa Estevez, Ana Maguitman, and Tomasz Janowski, Analyzing and Visualizing Government-Citizen Interactions on Twitter to Support Public Policy-making, Digit. Gov.: Res. Pract., Vol. 1, pp. 1-20. (2020)
[8] Anastasia Giachanou and Fabio Crestani, Like it or not: A survey of Twitter sentiment analysis methods, ACM Comput. Surv., Vol. 49, pp. 1-41. (2016)
[9] Paltoglou, G. and Thelwall, M., Twitter, MySpace, Digg: Unsupervised sentiment analysis in social media, ACM Trans. Intell. Syst. Technol., Vol. 3, pp. 1-19. (2012)
[10] Jose N. Franco-Riquelme, Antonio Bello-Garcia, Joaquín Ordieres-Meré, Political Support for the Electoral Process on Twitter: The Case of Spain's 2015 and 2016 General Elections, IEEE Access, Vol. 7, pp. 62545-62560. (2019)
[11] Itisha Gupta, Nisheeth Joshi, Feature-Based Twitter Sentiment Analysis With Improved Negation Handling, IEEE Transactions on Computational Social Systems, Vol. 8, pp. 917-927. (2021)
[12] Bibi Amina, Tayyaba Azim, SCANCPECLENS: A Framework for Automatic Lexicon Generation and Sentiment Analysis of Micro Blogging Data on China Pakistan Economic Corridor, IEEE Access, Vol. 7, pp. 133876-133887. (2019)
[13] Analysis of Public Sentiment on COVID-19 Vaccination Using Twitter, IEEE Transactions on Computational Social Systems, pp. 1-11. (2021)
[14] Z. Jianqiang and G. Xiaolin, Comparison research on text pre-processing methods on Twitter sentiment analysis, IEEE Access, Vol. 5, pp. 2870-2879. (2017)
[15] S. E. Saad and J. Yang, Twitter sentiment analysis based on ordinal regression, IEEE Access, Vol. 7, pp. 163677-163685. (2019)
[16] S. Bhat, S. Garg, and G. Poornalatha, Assigning sentiment score for Twitter tweets, in Proc. Int. Conf. Adv. Comput., Commun. Informat. (ICACCI), pp. 934-937. (2018)
[17] L. Nemes and A. Kiss, Social media sentiment analysis based on COVID-19, J. Inf. Telecommun., Vol. 5, No. 1, pp. 1-15. (2021)
[18] U. Naseem, I. Razzak, M. Khushi, P. W. Eklund, and J. Kim, COVIDSenti: A large-scale benchmark Twitter data set for COVID-19 sentiment analysis, IEEE Trans. Comput. Soc. Syst. (2021)
[19] H. Lyu et al., Social media study of public opinions on potential COVID-19 vaccines: Informing dissent, disparities, and dissemination, 2020, arXiv:2012.02165. [Online]. Available: http://arxiv.org/abs/2012.02165
[20] Z. Ren, G. Zeng, L. Chen, Q. Zhang, C. Zhang, and D. Pan, A lexicon-enhanced attention network for aspect-level sentiment analysis, IEEE Access, Vol. 8, pp. 93464-93471. (2020)
[21] L.-A. Cotfas, C. Delcea, I. Roxin, C. Ioanas, D. S. Gherai, and F. Tajariol, The longest month: Analyzing COVID-19 vaccination opinions dynamics from tweets in the month following the first vaccine announcement, IEEE Access, Vol. 9, pp. 33203-33223. (2021)
[22] K. Zvarevashe and O. O. Olugbara, A framework for sentiment analysis with opinion mining of hotel reviews, in Proc. Conf. Inf. Commun. Technol. Soc. (ICTAS), pp. 1-4. (2018)
Advances in Science and Technology ISSN: 1662-0356, Vol. 124, pp 355-361 doi:10.4028/p-39473k © 2023 Trans Tech Publications Ltd, Switzerland
Submitted: 2022-09-04 Accepted: 2022-09-16 Online: 2023-02-27
Two-Step Text Recognition and Summarization of Scanned Documents

Varun V.a, Steffina Muthukumarb

Department of Computer Science and Engineering, SRM Institute of Science and Technology, Vadapalani Campus, No.1 Jawaharlal Nehru Road, Vadapalani, Chennai, Tamil Nadu, India

a [email protected], b [email protected]
Keywords: Automatic Text Summarization, Optical Character Recognition (OCR), TextRank Algorithm
Abstract - With the explosion of unstructured textual data circulating the digital space in present times, there has been an increase in the necessity of developing tools that perform automatic text summarization, allowing people to gain insights easily and extract significant and essential information. The readability of documents can be improved, and the time spent researching for information reduced, by text summarization tools. In this project, extractive summarization is performed on text recognized from scanned documents via Optical Character Recognition (OCR), using the TextRank algorithm, an unsupervised extractive text summarization technique.

I. INTRODUCTION
With the inestimable explosion of unstructured text data from a variety of sources in this era of big data, a major part of the data needs to be summarized effectively to extract essential and significant information. This has led to increased demand for extensive research in the fields of Natural Language Processing and Automatic Text Summarization (ATS), the task of creating a brief and coherent summary without any human intervention while keeping the original meaning of the text document intact. Since manual summarization is laborious and time-expensive, the automation of text summarization is a strong motivator for academic research. There are two major summarization approaches: extractive summarization and abstractive summarization. Extractive summarization uses a scoring function to take the most relevant sentences from the text and arranges them to form a summary. The general idea behind the approach is to stitch together the most important sentences from the text while cropping out the less important ones.
Abstractive summarization, on the other hand, uses natural language processing techniques to interpret and form concise sentences, highlighting the critical information, some of which may not be part of the original text. Various other machine learning models proposed for this task view it as a classification problem that decides whether or not to include a sentence in the summary. Other approaches include techniques such as topic information, Reinforcement Learning, and Latent Semantic Analysis (LSA). With the increase in dependence on information scanned from paper documents, the need for software systems that recognize characters from the scanned information has also increased. Paper documents, as we know, are perishable and may, over time, be damaged by changes in the atmosphere or by human error. Therefore, it is advisable to store the necessary data and information on a computer storage drive and reuse it later whenever the need arises. This process of transferring data or information from paper documents to computer storage can be easily carried out using a technique called Optical Character Recognition (OCR). OCR enables us to read or grab text from printed or scanned photos and handwritten images, and convert it into a digitally editable and searchable format. Using OCR, electronic documents can be easily converted to readable text on which we can perform functions like editing and searching.
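Since TextRank is the summarizer used in this project, the graph-based ranking at its core can be sketched as follows. This is a minimal version assuming the word-overlap similarity from the original TextRank formulation and a simplistic regex sentence splitter; production systems add stop-word removal, stemming, and a proper tokenizer.

```python
import math
import re

def _words(s):
    return set(re.findall(r"[a-z0-9]+", s.lower()))

def sentence_similarity(s1, s2):
    """TextRank overlap similarity: |shared words| / (log|s1| + log|s2|)."""
    w1, w2 = _words(s1), _words(s2)
    if len(w1) <= 1 or len(w2) <= 1:
        return 0.0
    return len(w1 & w2) / (math.log(len(w1)) + math.log(len(w2)))

def textrank(sentences, d=0.85, iters=50):
    """PageRank power iteration over the weighted sentence-similarity graph."""
    n = len(sentences)
    sim = [[0.0 if i == j else sentence_similarity(sentences[i], sentences[j])
            for j in range(n)] for i in range(n)]
    out = [sum(row) for row in sim]  # total outgoing weight of each node
    scores = [1.0] * n
    for _ in range(iters):
        scores = [(1 - d) + d * sum(sim[j][i] / out[j] * scores[j]
                                    for j in range(n) if sim[j][i] and out[j])
                  for i in range(n)]
    return scores

def summarize(text, k=2):
    """Return the k highest-ranked sentences, restored to document order."""
    sents = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    scores = textrank(sents)
    top = sorted(range(len(sents)), key=lambda i: -scores[i])[:k]
    return [sents[i] for i in sorted(top)]
```

Sentences that share vocabulary with many others accumulate rank, while an off-topic sentence stays near the (1 - d) baseline and is dropped from the summary.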
Through this paper, a new technique for performing extractive text summarization of scanned documents, based on the concepts of Optical Character Recognition and Automatic Text Summarization, is proposed. The goal of this research was to make the text summarization of documents easier by introducing a method that can summarize text directly from scanned documents and produce human-like summaries automatically.

II. RELATED WORKS
The proposed approach concerns extractive text summarization and optical character recognition. The related work that situates the proposed system in the literature is presented below.

A. Extractive Text Summarization
Extractive text summarization stitches together the most relevant sentences from the text, arranged according to a scoring function to form a concise summary. Many methods and strategies have been developed to produce summaries automatically and make the process more efficient. Angel Hernández-Castañeda, René Arnulfo García-Hernández, Yulia Ledeneva and Christian Eduardo Millán-Hernández [1] proposed a new extractive method that finds an optimal grouping of sentences using a clustering scheme supported by a genetic algorithm. Important sentences were selected from the clusters based on automatically generated keywords, using Latent Dirichlet Allocation (LDA) for topic modeling. Their experimental results showed that their system outperformed previous summary-generation methods, and that the generated summaries preserve context through matches of adjacent words and of unique words.

B. Optical Character Recognition
Optical Character Recognition is the process of converting text in images of typed, handwritten, or printed material into machine-encoded text that can later be edited. The text in the electronic document is mapped to the corresponding ASCII characters, and a database is used to recognize characters in handwritten documents. Anshul Arora, Rajat Singh, Ashiq Eqbal, Ankit Mangal, and Prof. S. U. Souji [2] examined and discussed various methods of finding text characters in scene frames. They also reviewed the basics of text recognition from images and various image processing techniques.
Jinyuan Zhao, Yanna Wang, Baihua Xiao, Cunzhao Shi, Fuxi Jia, and Chunheng Wang [5] introduced the Generative Adversarial Network (GAN) framework to text detection tasks for document images and put forth an architecture for it. They transformed the text detection task into an image generation task using the generative adversarial architecture: given a camera-captured document as input, the generated directional text score map highlights the texts and shows the printing direction of each character. The last step of this process is to extract the characters using connected component analysis. They concluded that this method outperformed various traditional OCR methods and improved subsequent text detection.

III. PROPOSED METHODOLOGY

In this project, a new system is proposed with which the user can directly scan a document and then have the text identified from the document summarized using the TextRank algorithm and optical character recognition. The proposed system is described in detail in the following sections: the basic architecture in Section A, and the individual modules in the subsequent sections.

A. System Architecture
The proposed methodology takes a PDF or scanned document as input and converts it into an image, on which optical character recognition is performed to identify and extract the text. The extracted text is then fed into the TextRank algorithm, which identifies the most relevant sentences that form the summary.
Fig 3.1 System Architecture

B. Conversion of PDF to Image
The proposed system takes a PDF as input, which then has to be converted into an image so that the text can be identified from it through Optical Character Recognition. This conversion of a PDF document into an image is done with the help of a Python library known as pdf2image and the PDF rendering library Poppler.
Fig 3.2 PDF to Image conversion

pdf2image is a popular Python library that converts a PDF into PIL objects. PIL, the Python Imaging Library, is an image processing package for Python that supports tasks like editing, creating, and saving images. Poppler is the PDF rendering library (used through Python bindings) that allows us to read, render, or modify PDF documents. When converting a multipage PDF document into images, each page of the PDF is stored as an independent image file: PDF page 1 -> page_1.jpg, PDF page 2 -> page_2.jpg, and so on.

C. Optical Character Recognition
After the conversion of the PDF into images, the next step in the proposed methodology is to perform optical character recognition (OCR) on each image to identify and extract text from it. This is done with the help of a Python library called PyTesseract, which enables us to perform OCR on an image and read the text "embedded" in it.
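The two steps above can be sketched as follows. This is a minimal illustration, not the authors' code: `convert_from_path` (pdf2image) and `image_to_string` (PyTesseract) are the libraries' real entry points, but they require the Poppler and Tesseract binaries to be installed, so they are imported lazily here; the helper `page_image_name` merely encodes the page_N.jpg naming convention described in the text.

```python
# Sketch of the PDF -> image -> text pipeline described above.
# Requires the poppler and tesseract binaries at runtime; the heavy
# imports are done lazily so the pure helper below stays importable.

def page_image_name(page_number):
    """Naming convention from the text: PDF page N -> page_N.jpg."""
    return "page_%d.jpg" % page_number

def pdf_to_text(pdf_path):
    """Convert each PDF page to an image, then OCR it with PyTesseract."""
    from pdf2image import convert_from_path  # needs poppler installed
    import pytesseract                       # needs tesseract installed

    pages_text = []
    for i, page in enumerate(convert_from_path(pdf_path), start=1):
        page.save(page_image_name(i), "JPEG")   # page_1.jpg, page_2.jpg, ...
        pages_text.append(pytesseract.image_to_string(page))
    return "\n".join(pages_text)
```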
Fig. 3.3 Optical Character Recognition
D. The TextRank Algorithm
After the text is extracted by performing OCR on the image, the next step is to feed it into the TextRank algorithm for summarization. TextRank is a graph-based ranking algorithm that measures the relationships between text units; it is an unsupervised technique used for extractive text summarization.
Fig. 3.4 TextRank Algorithm

The TextRank algorithm works as follows: the whole text is analysed and split into sentences, which are vectorized using word embeddings. A matrix storing the calculated similarities between sentence vectors is formed; this matrix is then turned into a graph on which sentence ranks are calculated, and the top-ranked sentences form the final summary.

1. Vector Representation of Sentences
A vector representation of sentences is a learned representation of text in which words with similar meanings have similar representations. In this project, GloVe embeddings have been used for the vector representation of sentences.

a) GloVe Word Embeddings
GloVe (Global Vectors for word representation) is an unsupervised learning algorithm created by researchers at Stanford to obtain vector representations for words. The general idea behind GloVe is to derive associations between words from co-occurrence statistics. GloVe embeddings are used in this approach because they capture the contexts in which words occur.

b) Similarity Matrix
An n x n matrix is created from the cosine similarity between each pair of sentences, where n is the number of sentences. The cosine similarity between two vectors in a multi-dimensional space is the cosine of the angle between them, and the cosine distance is

Cosine Distance (Va, Vb) = 1 - Cosine(angle between Va, Vb)

where Va and Vb are two vectors. Two vectors are similar when the cosine distance is low and the cosine similarity is high.

c) Creating a Graph
The similarity matrix is then made into a graph, with the nodes representing the sentences and the edges weighted by the similarity scores between them.

d) Ranking the Sentences
As the last step, the sentences are ranked by their scores and the top N sentences are returned to be included in the summary.
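The steps above (vectorize, build the similarity matrix, rank on the graph, take the top N) can be sketched in pure Python. This is an illustrative simplification, not the authors' implementation: it uses bag-of-words counts instead of GloVe embeddings (GloVe vectors would replace `vectorize`), and a plain PageRank-style iteration instead of a graph library.

```python
# Minimal TextRank sketch of the steps described above.
import math
import re

def tokenize(sentence):
    return re.findall(r"[a-z']+", sentence.lower())

def vectorize(sentence, vocab):
    # Bag-of-words counts; the paper uses GloVe embeddings here instead.
    words = tokenize(sentence)
    return [words.count(w) for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def textrank_summary(sentences, top_n=1, d=0.85, iters=50):
    vocab = sorted({w for s in sentences for w in tokenize(s)})
    vecs = [vectorize(s, vocab) for s in sentences]
    n = len(sentences)
    # Similarity matrix: cosine similarity between every sentence pair.
    sim = [[cosine(vecs[i], vecs[j]) if i != j else 0.0
            for j in range(n)] for i in range(n)]
    row_sum = [sum(row) for row in sim]
    # PageRank-style iteration over the similarity graph.
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [(1 - d) / n
                  + d * sum(sim[j][i] / row_sum[j] * scores[j]
                            for j in range(n) if row_sum[j] > 0)
                  for i in range(n)]
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    # Return the top-N sentences in their original document order.
    return [sentences[i] for i in sorted(ranked[:top_n])]
```

A sentence unrelated to the rest of the text receives no incoming similarity and therefore stays at the baseline score, so it is excluded from the summary.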
IV. OUTPUT
Fig 4.1 Input PDF
Fig 4.2 Converted Image
Fig 4.3 Extracted Text

Fig. 4.4 The summary

V. RESULTS
Through the proposed methodology, text summarization was successfully performed on a scanned input document (PDF). The PDF document was first converted into an image, from which text was extracted using Optical Character Recognition. This text was then used as input to the TextRank algorithm, which produced the extractive summary.
VI. CONCLUSION AND FUTURE SCOPE
In this paper, a new two-step technique for scanning and summarizing scanned documents was introduced, employing Optical Character Recognition and the TextRank algorithm. The first step of this two-step process is the conversion of the scanned document (PDF) into an image and the subsequent extraction of text from the image using OCR. The second step is to perform extractive summarization on that text using the TextRank algorithm with GloVe word embeddings. This paper illustrates the use of the PyTesseract and pdf2image Python libraries to convert a PDF input into an image and extract text from it for extractive summarization, and shows the use of GloVe word embeddings for sentence vectorization as part of the TextRank algorithm. This project can be further extended by creating a web tool using Flask or another web development framework, or a mobile application using the latest mobile app development tools for iOS and Android.

References
[1] Á. Hernández-Castañeda, R. A. García-Hernández, Y. Ledeneva, C. E. Millán-Hernández, "Language-independent extractive automatic text summarization based on automatic keyword extraction", Computer Speech & Language, Volume 71, 101267, Elsevier, January 2022
[2] A. Arora, R. Singh, A. Eqbal, A. Mangal, S. U. Souji, "Extraction and Detection of Text From Images", International Journal of Research in Engineering and Technology, Vol. 8, August 2021
[3] M. Zhang, X. Li, S. Yue, L. Yang, "An Empirical Study of TextRank for Keyword Extraction", IEEE Access, 2020
[4] M. F. Mridha, A. A. Lima, K. Nur, S. C. Das, M. Hasan, M. M. Kabir, "A Survey of Automatic Text Summarization: Progress, Process and Challenges", IEEE Access, November 2021
[5] J. Zhao, Y. Wang, B. Xiao, C. Shi, F. Jia, C. Wang, "DetectGAN: GAN-based text detector for camera-captured document images", International Journal on Document Analysis and Recognition (IJDAR), Springer, 2020
[6] J. Chen, H. Zhuge, "Extractive Text-Image Summarization using Multi-Modal RNN", 14th International Conference on Semantics, Knowledge, and Grids (SKG), IEEE, 2018
[7] A. R. Mishra, V. K. Panchal, P. Kumar, "Extractive Text Summarization - An Effective Approach to Extract Information from Text", 2019 International Conference on Contemporary Computing and Informatics (IC3I), IEEE, 2019
[8] R. Kolle, S. Sanjana, M. Meleet, "Extractive Summarization of Text from Images", International Conference on Innovative Computing, Intelligent Communication and Smart Electrical Systems (ICSES), 2021
[9] P. Raundale, H. Shekhar, "Analytical Study of Text Summarization Techniques", Asian Conference on Innovation in Technology (ASIANCON), 2021
[10] X. Liu, G. Meng, C. Pan, "Scene text detection and recognition with advances in deep learning: a survey", International Journal on Document Analysis and Recognition (IJDAR), Springer, 2019
Advances in Science and Technology ISSN: 1662-0356, Vol. 124, pp 362-369 doi:10.4028/p-hdm12o © 2023 Trans Tech Publications Ltd, Switzerland
Submitted: 2022-09-01 Accepted: 2022-09-16 Online: 2023-02-27
Detecting Fake Job Posting Using ML Classifications & Ensemble Model

Aadharsh K. Praveen1,a, Harsita R.2,b, Rachanna Deva Murali3,c and S. Niveditha4,d

1,2,3,4Department of Computer Science Engineering, SRM Institute of Science and Technology, Vadapalani Campus, No.1 Jawaharlal Nehru Road, Vadapalani, Chennai, Tamil Nadu, India

[email protected], [email protected], [email protected], [email protected]
Keywords: Ensemble Model, Fraudulent Job
Abstract. In this project, we create a fraudulence-checker tool to detect fake job postings using NLP (Natural Language Processing) and ML approaches (Random Forest Classifier, Logistic Regression, Support Vector Machine, and XGBoost). These approaches are compared and then combined into an ensemble model, which is used for our job detector. The aim is to predict whether a job posting is real or fake with the highest possible accuracy. Dataset analysis is performed with supervised machine learning techniques (SMLT), covering variable identification, missing-value handling, and data validation analysis. Data cleaning and preparation, along with visualization, are performed on the entire dataset. The ensemble model is built from XGBoost, SVM, Logistic Regression, and Random Forest Classifier using the 4 best contributing features, and the final model is implemented in a Flask application for demonstration.

Introduction

Many companies, whether well-known or new, prefer to post job openings online, either through advertisements or on a dedicated job recruitment platform, since this method is the fastest and can easily reach a wide variety of capable applicants. But for this very reason, plenty of scams also arise. Identifying a fake job posting depends on many factors, and since anyone can view these postings, gullible freshers might fall for these scams easily. Work-from-home jobs have long been a target of fraudsters, with a 300% spike in hiring scams before 2017 and a further increase in the number of frauds until 2020; in the wake of the COVID-19 crisis, job seekers have become even more vulnerable targets [7]. Unfortunately, many people lost their jobs due to the coronavirus pandemic, and scammers are incredibly tuned into the fact that some job seekers are desperate to make money.
From scammers to copy-paste connoisseurs, some of them even give "good" recruiters a poor reputation. A tell-tale sign is the lack of verifiable information: you may believe you landed your perfect job, but on closer inspection you cannot locate any information about the organization. If you cannot confirm contact information, a location, a website, or employees, you may have fallen prey to fraudulent recruitment. Therefore, to prevent such scams, a machine learning approach is used to classify whether a given job posting is real or fake. Our ensemble model considers several factors: the primary factor is the job description, and the others include the company profile, benefits, requirements, etc. With these factors, we can decipher whether a job posting is real or fake.

Related Work

Here we analyze related work that is similar to our project. In Table 1 below, the drawback of each classifier is mentioned along with the reason why that classifier is not being used.
Table 1. Performance Comparison

Classifier                               Metrics                     Drawbacks
Naïve Bayes Classifier [9]               F1 score: 0.72, MSE: 0.52   Assumes all predictors are independent, which rarely happens in real life
AdaBoost Classifier [10]                 F1 score: 0.98, MSE: 0.03   Noisy data and outliers must be avoided
K-Nearest Neighbors Classifier [11]      F1 score: 0.96, MSE: 0.04   Requires high memory
Multi-Layer Perceptron Classifier [12]   F1 score: 0.96, MSE: 0.05   Requires tuning hyperparameters
Decision Tree Classifier [13]            F1 score: 0.97, MSE: 0.03   A small change in the data can cause a large change in the structure
Naïve Bayes Classifier. Naïve Bayes is a technique for constructing classifiers. There is no single algorithm for training such classifiers, but rather a family of algorithms. A Naïve Bayes classifier assumes that the value of each feature is independent of the values of the other features. For example, each feature is considered to contribute independently to the probability that a vegetable is a carrot, regardless of any possible correlations between its color, shape, and height.
AdaBoost Classifier. AdaBoost is a boosting technique. An AdaBoost classifier is an ensemble model that first fits a classifier to the dataset, then iteratively fits the same classifier again while dramatically increasing the weight of erroneously classified instances, so that successive classifiers focus on the difficult cases.
K-Nearest Neighbor Classifier. K-Nearest Neighbor is a machine learning algorithm based on the supervised learning technique. KNN stores all the data, compares new data against it, and finds the similarity between them. The technique assumes that similar inputs belong to similar categories, and thus places a new case in the category closest to the existing ones. KNN needs a massive amount of memory to store all the data.
Multilayer Perceptron Classifier. MLP is a class of ANN (Artificial Neural Network). An MLP has at least 3 basic layers: input, output, and hidden. It is trained with a supervised learning technique [6] named backpropagation. Its multiple layers and non-linear activations distinguish it from linear perceptrons. An MLP contains many perceptrons arranged in layers; perceptrons are a special case of artificial neurons that use a threshold activation function. MLPs have found applications in speech and image recognition, machine translation, and more.
Decision Tree Classifier.
A decision tree has nodes that test an attribute, with each branch denoting one of the possible values of that attribute; leaves represent the class labels. Classification starts at the root node, and each node splits into two or more subtrees according to a condition until all data is classified. It works in a top-down manner, and there are many possible measures for choosing the split of the subtrees.

Methodology

Our objective is to build a robust model that can withstand noisy data, whose performance does not drop if inputs are missing in the dataset or during deployment. To build such a robust model, we create an ensemble model using 4 machine learning algorithms: SVM [4], XGBoost [5], Random Forest Classifier [2], and Logistic Regression [3]. Fig 1 shows the architecture diagram of our ensemble model.
Fig 1. Architecture Diagram

Random Forest Classifier. RFC is itself an ensemble model: a collection of many decision trees. Each decision tree produces an output, and the outputs are combined to obtain an accurate final prediction while considering many factors at the same time. The logic behind this is that a combination of many mediocre models is better than one good model. We use this model because it works well with an unbalanced dataset.
Logistic Regression. LR works well with binary outcomes and is mainly used when the target is categorical. In our project, we must only state whether a job posting is fake or not, which falls into exactly two categories. LR models the probability of a certain outcome from the input.
Support Vector Machine. SVM operates in an n-dimensional space: the data items are plotted in n dimensions, where n is the number of features. The more features we have, the more dimensions are available to work with. Classification is then performed by finding the hyperplane that separates the two classes. The main advantage is that SVM can exploit a dimension for every feature, so it can still predict the output even when some samples are missing; this is why we include this classifier.
XGBoost. XGBoost is an optimized gradient boosting library that is efficient, flexible, and portable; many ML algorithms can be implemented within its gradient boosting framework. We use XGBoost because of its fast execution speed, and we include it in our ensemble mainly because it can still train when features are missing.
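How the four models might be combined can be sketched as a simple soft-voting rule: each model emits a fraud probability, and a weighted average is thresholded. This is an illustrative combiner only; the model names, weights, and threshold here are placeholders, not the paper's tuned values.

```python
# Illustrative soft-voting combiner for the four classifiers.
# Each model is assumed to expose a fraud probability in [0, 1];
# the weights and threshold are placeholders (tuning is discussed
# later in the paper).

def ensemble_predict(probas, weights, threshold=0.5):
    """probas: dict of model name -> P(fraudulent). Returns 1 if fake."""
    total = sum(weights.values())
    score = sum(weights[m] * p for m, p in probas.items()) / total
    return 1 if score >= threshold else 0
```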
The Dataset

In preprocessing we clean the dataset by replacing empty strings with "Unspecified" or null, balancing the dataset, etc. It is not possible to work with any model if the data is not cleaned, so we apply these changes to the dataset and keep it ready for EDA and for finding correlations.
Dataset. The dataset contains 17880 entries with 18 features. Fig 2 shows the list of features, and Fig 3 shows the distribution of features for fraudulent jobs.
Fig 2, 3. Dataset Features and Distribution of Features for Fraudulent Jobs

Data Cleaning. The features fall into 4 kinds: binary, category, text, and complex. Data cleaning is carried out by pre-processing the text features (description, requirements, company profile, and benefits); for the other features, null values are replaced with the string "Unspecified".
Pre-Processing the Text. This is where we use NLP [1]. The data is cleaned by removing punctuation, stop words, and numbers before modeling, as they do not give much information about the target. For every text field in the dataset we remove stop words, which include articles such as "an", "a", and "the"; this removes unimportant words and keeps the keywords for the analysis phase. We create another column named "company profile specified", a Boolean value telling whether a company profile is present, and likewise the columns "description specified", "requirements specified", and "benefits specified". The data shows that only one fraudulent posting exists without a description, so it is safe to assume that the description is important. For the text features, we find the mean word counts in fraudulent and non-fraudulent posts and the difference between them, and then the overall word count for the company profile and requirements.
Correlation. Every feature is compared with the fraudulent output to determine the correlation between them [14].
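The cleaning steps above can be sketched in a few lines. This is a minimal illustration: the stop-word list here is a tiny sample (a real run would use a full list such as NLTK's), and `specified` is a hypothetical helper for the Boolean indicator columns described in the text.

```python
import re

# A few stop words for illustration only; use a full English
# stop-word list (e.g. NLTK's) in practice.
STOP_WORDS = {"a", "an", "the", "is", "are", "and", "or", "to", "of", "in"}

def clean_text(text):
    """Lower-case, strip punctuation and numbers, drop stop words."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())   # punctuation + digits out
    return " ".join(w for w in text.split() if w not in STOP_WORDS)

def specified(value):
    """Boolean indicator, e.g. for a 'company profile specified' column."""
    return value is not None and value not in ("", "Unspecified")
```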
Fig. 4. Distribution of features for fraudulent jobs
From Fig 4, we can say that the company profile, requirements, benefits, and job description are important.
Comparison. It is vital to consistently assess the effectiveness of various machine learning algorithms, since the performance attributes of every model differ. The four features compatible with the output (Job Description, Job Requirements, Job Benefits, and Company Profile) were first individually tested with the Random Forest Classifier, Logistic Regression, Support Vector Machine, and XGBoost models. These features were chosen because they distinguish genuine job posts from fraudulent ones. The individual performance of each model against each feature was determined by computing F1 scores from the Precision and Recall metrics of each feature for the respective model [8].
Combining All Features. The accuracy and F1 score of all models with the combined data, for which fraudulent is false, are as follows:

Table 2. Model Comparison

Performance Metrics   Random Forest Classifier   Logistic Regression   Support Vector Machine   XGBoost
Precision             0.98                       0.94                  0.97                     0.98
Recall                0.61                       0.73                  0.70                     0.66
F1-Score              0.76                       0.82                  0.81                     0.79
Accuracy              98%                        98.3%                 98.3%                    98.2%
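The F1 scores in Table 2 are the harmonic mean of precision and recall, which can be computed directly:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, the Logistic Regression row (precision 0.94, recall 0.73) gives 2 * 0.94 * 0.73 / (0.94 + 0.73) = 0.82, matching the table.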
Ensemble Model

We create an ensemble model with the data collected so far. From the dataset, four distinguishable features are taken. To obtain a model that makes predictions with the highest accuracy, the four tested models are integrated to create our own ensemble model. This ensemble, a combination of the Random Forest Classifier, Logistic Regression, SVM, and XGBoost, is tested against the four selected features to identify whether a posted job is real or fake. The performance of this model is judged by the F1 score, calculated from the latest Precision and Recall measurements of the model.
Tuning the Model. Now that the model is created, we must tune it. Tuning is the process of assigning a weight to each model, so that the result of each model is given separate importance based on its accuracy. This way, we gain the advantages of each model without pulling down the performance of the ensemble.

Table 3. Performance Before Tuning

Performance Metrics   Custom Ensemble Model
Precision             0.97
Recall                0.72
F1-Score              0.82
Accuracy              98.4%
The model is tuned using a custom brute-force function which finds the best weight allocation for each model, along with the decision threshold.
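A brute-force search of this kind can be sketched as follows. This is an illustrative reconstruction, not the authors' function: the weight grid and candidate thresholds are arbitrary placeholders, and the search simply maximizes F1 on held-out labels.

```python
# Sketch of a brute-force weight/threshold search for the ensemble.
from itertools import product

def f1(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def tune(probas, y_true, grid=(0.0, 0.5, 1.0), thresholds=(0.3, 0.5, 0.7)):
    """probas: one probability list per model. Returns (best_f1, weights, threshold)."""
    best = (0.0, None, None)
    for weights in product(grid, repeat=len(probas)):
        if sum(weights) == 0:
            continue  # at least one model must contribute
        scores = [sum(w * m[i] for w, m in zip(weights, probas)) / sum(weights)
                  for i in range(len(y_true))]
        for th in thresholds:
            preds = [1 if s >= th else 0 for s in scores]
            score = f1(y_true, preds)
            if score > best[0]:
                best = (score, weights, th)
    return best
```

The grid is tiny here for clarity; a finer grid trades runtime for better weights in the same exhaustive fashion.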
Table 4. Performance After Tuning

Performance Metrics   Custom Ensemble Model
Precision             0.93
Recall                0.79
F1-Score              0.86
Accuracy              98.6%
Performance of the Final Model. The final performance of the ensemble model after tuning is shown in Fig 5. The confusion matrix is also shown in Fig 6.
0 - Real Job 1 – Fraudulent Job
Fig 5. Final Performance
0 - Real Job 1 – Fraudulent Job
Fig 6. Confusion Matrix

Deployment
The project is deployed as a Flask application using POST and GET requests. The styling of the web page is done using HTML and CSS. The user enters the job information and submits it, and is then redirected to a page that tells them whether the job posting is fake or not.
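The POST/GET flow can be sketched with a minimal Flask app. This is illustrative only: the route names, the inline form, and `model_predict` (a keyword stub standing in for the tuned ensemble) are all assumptions, not the paper's deployment code.

```python
# Minimal sketch of the Flask deployment flow described above.
from flask import Flask, request, render_template_string

app = Flask(__name__)

FORM = """<form method="post" action="/predict">
<textarea name="description"></textarea><button>Check</button></form>"""

def model_predict(description):
    # Placeholder for the tuned ensemble; returns 1 for fraudulent.
    return 1 if "wire transfer" in description.lower() else 0

@app.route("/", methods=["GET"])
def index():
    return render_template_string(FORM)

@app.route("/predict", methods=["POST"])
def predict():
    label = model_predict(request.form.get("description", ""))
    return "Fraudulent job posting" if label == 1 else "Real job posting"
```

Running `app.run()` serves the form at `/`; submitting it posts the description to `/predict`, which returns the verdict page.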
Conclusion & Future Scope

The analytical process started with data cleaning and processing, missing-value handling, and exploratory data analysis, and ended with model building by combining 4 machine learning algorithms (SVM, XGBoost, Logistic Regression, and Random Forest Classifier) with an accuracy of 98.6% and F1 scores of 0.99 and 0.85 for the non-fraudulent and fraudulent classes respectively. This application can help predict real and fake jobs. The project can be further enhanced in the future by hosting the website in the cloud and making it redirect to the source of the job posting when the posting is real.

References

[1] de Oliveira, Nicollas R., Pedro S. Pisa, Martin A. Lopez, Dianne S.V. de Medeiros, and Diogo M.F. Mattos. 2021. "Identifying Fake News on Social Networks Based on Natural Language Processing: Trends and Challenges", Information 12, no. 1: 38. Doi: 10.3390/info12010038
[2] Nasiba Mahdi Abdulkareem and Adnan Mohsin Abdulazeez, 2021. "Machine Learning Classification Based on Random Forest Algorithm: A Review", International Journal of Science and Business, IJSAB International, vol. 5(2), pages 128-142
[3] Saleh Hussein, Ameer, Rihab Salah Khairy, Shaima Miqdad Mohamed Najeeb, and Haider Th. Salim Alrikabi. 2021. "Credit Card Fraud Detection Using Fuzzy Rough Nearest Neighbor and Sequential Minimal Optimization With Logistic Regression", International Journal of Interactive Mobile Technologies (iJIM) 15(05): pp. 24-42. Doi: 10.3991/ijim.v15i05.17173
[4] Yu, Yinshan, Mingzhen Shao, Lingjie Jiang, Yongbin Ke, Dandan Wei, Dongyang Zhang, Mingxin Jiang, and Yudong Yang. "Quantitative analysis of multiple components based on support vector machine (SVM)", Optik 237 (2021): 166759. Doi: 10.1016/j.ijleo.2021.166759
[5] Giannakas, Filippos, Christos Troussas, Akrivi Krouska, Cleo Sgouropoulou, and Ioannis Voyiatzis. "XGBoost and deep neural network comparison: The case of teams' performance", International Conference on Intelligent Tutoring Systems, pp. 343-349. Springer, Cham, 2021. Doi: 10.1007/978-3-030-80421-3_37
[6] Sarker, Iqbal H. "Machine learning: Algorithms, real-world applications, and research directions", SN Computer Science 2, no. 3 (2021): 1-21. Doi: 10.1007/s42979-021-00592-x
[7] Gozum, Ivan Efreaim A., Harvey Gain M. Capulong, Joseph Renus F. Galang, and Jose Ma W. Gopez. "An ayuda to the least advantaged: providing a program for those who were hit the hardest during the COVID-19 pandemic", Journal of Public Health 43, no. 2 (2021): e317-e318. Doi: 10.1093/pubmed/fdab014
[8] Powers, David M. W. (2011). "Evaluation: From precision, recall and F-measure to ROC, informedness, markedness & correlation", Journal of Machine Learning Technologies 2: 2229-3981. Doi: 10.9735/2229-3981
[9] H. Hairani, A. Anggrawan, A. I. Wathan, K. A. Latif, K. Marzuki and M. Zulfikri, "The Abstract of Thesis Classifier by Using Naive Bayes Method", 2021 International Conference on Software Engineering & Computer Systems and 4th International Conference on Computational Science and Information Management (ICSECS-ICOCSIM), 2021, pp. 312-315. Doi: 10.1109/ICSECS52883.2021.00063
[10] Yanfeng Zhang and Peikun He, "A revised AdaBoost algorithm: FM-AdaBoost", 2010 International Conference on Computer Application and System Modeling (ICCASM 2010), 2010, pp. V11-277-V11-281. Doi: 10.1109/ICCASM.2010.5623209
[11] X. Yu and X. Yu, "The Research on an Adaptive k-Nearest Neighbors Classifier", 2006 5th IEEE International Conference on Cognitive Informatics, 2006, pp. 535-540. Doi: 10.1109/COGINF.2006.365542
[12] C. Jun, Z. Fan, and F. Shan, "Building up multi-layered perceptrons as classifier system for decision support", Journal of Systems Engineering and Electronics, vol. 6, no. 2, pp. 32-39, June 1995
[13] M. Wozniak, "Experiments with Boosted Decision Tree Classifiers", 2008 Eighth International Conference on Intelligent Systems Design and Applications, 2008, pp. 552-557. Doi: 10.1109/ISDA.2008.215
[14] S. Yamaki, S. Seki, N. Sugita and M. Yoshizawa, "Performance Evaluation of Cross Correlation Functions Based on Correlation Filters", 2021 20th International Symposium on Communications and Information Technologies (ISCIT), 2021, pp. 145-149. Doi: 10.1109/ISCIT52804.2021.9590596
Advances in Science and Technology ISSN: 1662-0356, Vol. 124, pp 370-377 doi:10.4028/p-atr6jg © 2023 Trans Tech Publications Ltd, Switzerland
Submitted: 2022-09-12 Accepted: 2022-09-16 Online: 2023-02-27
Artificial Intelligence Based Chatbot for Healthcare Applications

Nimal Kumar R. A.1,a, Vaishakh V. Nair2,b*, Jegdeep R.3,c, J. Arun Nehru4,d

1Department of Computer Science & Engineering, SRM-IST, Vadapalani, Chennai, India

[email protected], [email protected], [email protected], [email protected]
Keywords: Chatbot, AI, LSTM, Technology in Healthcare, NLP
Abstract: During the epidemic, managing the flow of a large number of patients seeking consultation has been tough for hospitals and healthcare workers, and it has become more difficult to contact a doctor, especially in rural areas. It is obvious that well-designed and well-operated chatbots can actually help patients by advocating precautionary measures and cures, as well as preventing harm inflicted by worry. This paper describes the development of an artificial intelligence (AI) chatbot for advising prompt actions when patients need to see a doctor; moreover, the virtual assistant can suggest which sort of doctor to consult.

1. Introduction

Virtual chatbots are the most recent digital innovation after the growth of mobile apps and web applications [1,2]. Conversational bots powered by artificial intelligence, with the assistance of natural language processing (NLP), help users interact with machines [3]. Bots are especially promising and are considered a key technology for hybrid human-machine interaction [1,4]; in the future, they are likely to be utilised in the most significant sectors around the world. The majority of hospital officials' time in the medical industry, which revolves around human interaction, is spent scheduling appointments and answering patients' routine queries (Figure 1.1, workflow diagram) [5]. In such cases, a bot application is an easy and effective option; collecting user feedback is another way it can maintain good patient flow while performing repetitive and boring tasks. As a result of serious pandemics such as the novel coronavirus, it is too unsafe nowadays for people to visit a nearby hospital to consult a doctor in an emergency. These health bots are effective as an adjunct to local clinical practice or urgent medication.
We created an intelligent healthcare chatbot that can determine the severity of a condition to relieve the pressure on healthcare institutions. It also contains all of the relevant information and precautionary actions, with a detailed video walkthrough to help patients with their health issues. A mobile app called Aarogya Setu was recently developed in India to raise awareness of COVID-19 by combining it with a chatbot [10]. In addition to acting as a medical consultant, we propose a personal health chatbot that can provide simple and relevant measures for every basic illness. The bot also offers 24/7 accessibility and provides a more human-like assessment of the patient's condition.
Figure 1.1 Workflow diagram
Advances in Science and Technology Vol. 124
2. Literature Review

Unlike rule-based models, machine learning approaches enable chatbot AI models to learn from a database of casual conversations. They must be trained with machine learning methods, which attempt to build the model from a training dataset. Machine learning approaches have eliminated the need to manually design and test novel pattern-recognition algorithms, making chatbots more versatile and less dependent on domain expertise. As previously mentioned, the two main families of AI models are Information Retrieval-based (IR) models and generative models. IR models are designed so that, given a set of text data, they can find the most relevant information: the algorithm retrieves the data required. The algorithm employed is typically a shallow learning algorithm; however, rule-based and deep learning algorithms are also used in information retrieval models. The chatbot analyses the user's query and chooses one of the responses available in its repertoire based on this input. IR-based systems are pre-programmed with a variety of probable responses, and this database is used to build a conversation index mapping each possible response to the input it answers. Whenever a user gives the chatbot an input, the chatbot treats it as a query: an IR model, similar to those used in web search, compares the user's input to the questions stored in the conversation index, and the response paired with the closest matching question is returned to the user [12,33]. The key benefit of this paradigm is that it guarantees a baseline quality of answers, because responses are not generated from scratch.
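The retrieval step described above can be sketched in miniature: the snippet below builds term-frequency vectors and returns the canned response whose stored question is most cosine-similar to the user's query. The index entries and responses are invented for illustration; a real IR chatbot would use a much larger index and stronger weighting (e.g. TF-IDF).

```python
import math
import re
from collections import Counter

# Hypothetical mini "conversation index": (stored question, canned response)
# pairs. A real system would hold thousands of such entries.
INDEX = [
    ("what are the symptoms of fever",
     "Common fever symptoms include high temperature, chills and sweating."),
    ("how can i book an appointment",
     "You can book an appointment through the hospital reception portal."),
    ("what should i do for a headache",
     "Rest, hydrate, and consult a doctor if the pain persists."),
]

def tf_vector(text):
    """Term-frequency vector over lowercase word tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    num = sum(a[w] * b[w] for w in set(a) & set(b))
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def retrieve(query):
    """Return the canned response whose stored question best matches the query."""
    qv = tf_vector(query)
    return max(INDEX, key=lambda pair: cosine(qv, tf_vector(pair[0])))[1]
```

Because the response is selected rather than generated, output quality is bounded below by the quality of the index, which is exactly the trade-off discussed above.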
One of the main drawbacks of this strategy is that creating the necessary knowledge base is expensive, time-consuming, and laborious. Furthermore, although a large amount of available data allows for a larger training set and knowledge domain, the larger that domain grows, the harder it becomes to link an input from the user to the correct response, and the more effort and cost must be spent training the system to choose the right answer [29,33]. The basic IR approach is probably less suitable for social or chatter agents — the so-called socializing chatbots — because it does not generate responses but extracts replies from a predefined list in its knowledge base. IR design is also less suited to forming an identity, which can be a vital feature for such chatbots. However, some progress has been made in recent years in developing new IR methods, and it is worth mentioning what machine learning has achieved: machine learning techniques are now the underpinning technology in this type of design. [31,33] suggested a novel approach for representing localized co-occurrence in text and mapping hierarchical data from several domains for contextually unique words. [29] proposes an innovative improvement that takes prior turns in the conversation into account, thereby acquiring more contextual information in order to increase the quality and veracity of the output. In this model, a deep learning component improves the retrieval process by rating not just the question-answer pairs that match the last input from the user, but also all pairs that match reformulated versions of past conversation turns; the rating lists for the various reformulations are then combined. Contextual information from the user's previous enquiries is thus exploited to derive a better answer from the knowledge base [29,33].
Generative models, as the name implies, produce fresh responses: depending on the user's input, they construct answers word by word, and are therefore capable of generating entirely new sentences in response to users' questions. They must, however, be trained to acquire syntax and grammar, and the results may still lack quality and consistency. Typically, generative models are trained on a massive database of real-life conversational text, from which the model learns structure, grammar, and vocabulary by repetition. The overall purpose of the algorithm is to produce a suitable, linguistically correct response based on the information provided to it. A deep learning algorithm is usually used; to counteract the vanishing-gradient problem of vanilla recurrent neural networks, an encoder-decoder neural network design with memory mechanisms is employed [17]. Among the AI models used in industry, these models are composed of two RNNs: an encoder and a decoder. Taking the chatbot user's input sentence, the encoder processes one word at a time, accumulating it in the RNN's hidden state [33]. The final state of the sequence is the context vector, which represents the sentence's intent. The decoder takes the context vector as input and generates the response one word at a time. The goal of this statistical method is to infer intent and produce the most likely answer given the input. This paradigm offers some unique benefits. First, it does not require hand-crafted rules or domain-specific expertise; moreover, while the model does not need domain-specific information to produce useful results, it can be adapted to different methods if domain-specific knowledge is required for further research. For these reasons, the sequence-to-sequence paradigm has become the accepted approach for dialogue generation and many other NLP tasks in recent years. It does, however, have a serious limitation: all of the input sentence's information must be compressed into the context vector, so the longer the sentence becomes, the more information is lost. As a result, sequence-to-sequence models struggle to reply to longer sentences and sometimes provide ambiguous responses. Furthermore, these models tend to converge on a generic response when generating an answer, resulting in a lack of coherence across conversation turns [2,19,20].

Transformers. One of the most intriguing developments in deep learning language models has been the introduction of Transformers, first presented by [21] in the paper "Attention Is All You Need". Transformers are language models that rely entirely on the attention mechanism.
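The seq2seq context-vector idea discussed above (before the move to Transformers) can be illustrated with a deliberately tiny sketch: a one-dimensional "RNN" cell folds a token sequence into a single fixed-size state. The weights here are arbitrary constants, not trained parameters, and the decoder half is omitted since it needs a trained vocabulary.

```python
import math

# Toy one-dimensional "RNN" cell illustrating how an encoder compresses a
# variable-length token sequence into one fixed-size context value.
# W_IN and W_REC are illustrative constants, not learned weights.
W_IN, W_REC = 0.5, 0.8

def encode(token_ids):
    h = 0.0
    for x in token_ids:                         # read one token at a time
        h = math.tanh(W_IN * x + W_REC * h)     # update the hidden state
    return h                                    # final state = context vector

# A decoder would be seeded with this value and emit the reply word by word.
context = encode([1, 3, 2])
```

Note that however long the input is, everything must be squeezed into this one value, which is exactly why long sentences lose information in the basic seq2seq design.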
Rather than RNN designs such as the LSTM, Transformers are now the model of choice for NLP problems: they weight the importance of each piece of input data differently, and they enable training on bigger datasets than was previously possible thanks to training parallelization. The Reformer [23] and the Transformer-XL [24] are two further Transformer models that have since been released; each was created to handle specific issues for the task at hand. Although Transformers were designed for general AI problems, they can be modified and customised. The authors of [24] propose the Transformer-XL, an enhanced version of the Transformer. Using segment-level recurrence, this model can transcend the Transformer's fixed-length context constraint. In the context of language modelling, Transformers have the ability to learn longer-term dependencies, but they are bound by a finite-length context. Transformer-XL is a novel neural design that enables learning dependencies beyond a fixed length without compromising temporal coherence: it includes a recurrence mechanism at the segment level as well as a unique positional encoding technique, seeking to capture longer-term dependencies while also addressing the problem of context fragmentation. Although this method has yet to be applied to dialogue modelling, it is unquestionably effective, and it can be argued that, once the relevant modifications are made, it will be applicable there as well; it could help resolve a number of challenges that existing conversation systems face, such as contextual recognition. The Reformer, introduced in [23], is a more efficient Transformer that employs additional strategies to boost its efficiency.
In [25,33], the article introduces Meena, a chatbot built from the ground up using 40 billion words extracted and processed from social networking site conversations. The authors use Meena to stretch the bounds of the end-to-end approach, proving that a large, low-perplexity system can produce high-quality language outputs. They employ the Evolved Transformer as the fundamental architecture in a seq2seq model [18,26]. The four major aspects of the Evolved Transformer's architecture are (i) wide depth-wise separable convolutions, (ii) Gated Linear Units [28], (iii) branched structures, and (iv) swish activations [29,33]. The Evolved Transformer's encoder and decoder each independently developed a branched lower section with these convolutions; because the upper portion is essentially the Transformer in both cases, the final section is nearly identical. The model has been trained on multi-turn dialogues, in which the input phrase is made up of all of the context's turns. [32,33] also provides a useful human evaluation metric to quantify Meena's quality and compare it to other chatbots.

3. Methodology

3.1. Deep Learning and NLP

Natural language processing is computer-based, and its primary goal is to analyse text or speech according to a given set of hypotheses and techniques, such as grammatical and predictive analytics, by deriving rules and patterns with those methods. In the age of deep neural networks, phrase or sentence embeddings have largely replaced word n-gram features; nonetheless, similarity features derived from linguistic analysis often improve classification results. NLP algorithms are employed in a wide variety of clinical and research applications, including trial screening, pharmacogenomics, diagnostic categorization, novel trait development, and adverse drug event detection [31]. [32] lists a variety of language processing systems developed for therapeutic use. The proposed method is to build a healthcare chatbot using NLP. To create the dataset, we need to understand the intents of the visiting users. The intention of a user interacting with a chatbot, or the purpose behind each message the chatbot receives from a specific user, is referred to as an intent (Fig 3.1.2 Imported Dataset) [35]. The aim is to design a number of different intents, create training data for each of them, and then use that data to train our chatbot system.
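An intent-based dataset of the kind described above can be sketched as a small structure grouping example user patterns and canned responses under each intent tag; the tags, patterns and responses below are invented for illustration.

```python
# Hypothetical intents corpus: each intent tag groups example user patterns
# (training inputs) with canned responses.
INTENTS = {
    "greeting": {
        "patterns": ["hi", "hello there", "good morning"],
        "responses": ["Hello! How can I help you today?"],
    },
    "fever": {
        "patterns": ["i have a fever", "my temperature is high"],
        "responses": ["Please monitor your temperature and stay hydrated."],
    },
}

def training_pairs(intents):
    """Flatten the intents file into (pattern, tag) pairs for classifier training."""
    return [(pattern, tag)
            for tag, entry in intents.items()
            for pattern in entry["patterns"]]
```

The flattened (pattern, tag) pairs are what an intent classifier — such as the LSTM described in the next subsection — would be trained on.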
Fig 3.1.1 Use case diagram
IoT, Cloud and Data Science
Fig 3.1.2 Imported Dataset

3.2. LSTM

LSTMs are specifically designed to avoid the long-term dependency problem: remembering information for long periods is practically their default behaviour, not something they struggle to do. All RNNs are composed of a chain of repeating modules. In a standard RNN this repeating module has a very simple structure, such as a single layer. The first step in our LSTM is to decide what information to throw away from the cell state.
Fig 3.2.1 Proposal System architecture diagram

The model is built using vectorization, where vectors are created to represent the information. Tokenization is used to break the text into units and interpret its meaning. The chatbot is designed in line with our use cases. What we have described so far is a fairly standard LSTM (Fig 3.2.1 Proposal System architecture diagram).
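The tokenization and vectorization steps described above can be sketched in pure Python as a bag-of-words pipeline; a real implementation would feed such vectors (or learned embeddings) into the LSTM, and the helper names here are illustrative, not from the authors' code.

```python
import re

def tokenize(text):
    """Break text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def build_vocab(sentences):
    """Map every distinct token in the corpus to a fixed index."""
    tokens = sorted({tok for s in sentences for tok in tokenize(s)})
    return {tok: i for i, tok in enumerate(tokens)}

def vectorize(sentence, vocab):
    """Bag-of-words count vector aligned with the vocabulary indices."""
    vec = [0] * len(vocab)
    for tok in tokenize(sentence):
        if tok in vocab:
            vec[vocab[tok]] += 1
    return vec
```

For example, a vocabulary built from the intent patterns fixes the input dimension of the network, and each incoming user message is vectorized against that same vocabulary before classification.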
Fig 3.2.3 Dataset layers used in our work
Fig 3.2.4 Accuracy and loss obtained for trained data
Fig 3.2.5 Accuracy graph and loss obtained for trained data

4. Conclusions
The presented artificial intelligence chatbot will have a substantial impact on the lives of patients, giving them the benefit of consulting virtual medical professionals. We engage medical experts and health practitioners who provide data to the chatbot, and it in turn provides each user with information whenever a disease risk is recognised. The suggested bot is currently under development, with the basic version forthcoming.
References
[1] Jadhav, K.P.; Thorat, S.A. Towards Designing Conversational Agent Systems. In Advances in Intelligent Systems and Computing; Springer: Berlin, Germany, 2020.
[2] Battineni, G.; di Canio, M.; Chintalapudi, N.; Amenta, F.; Nittari, G. Development of physical training smartphone application to maintain fitness levels in seafarers. Int. Marit. Health 2019, 70, 180–186. [CrossRef] [PubMed]
[3] Yan, R. "Chitty-chitty-chat bot": Deep learning for conversational AI. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18), Stockholm, Sweden, 13–19 July 2018.
[4] Luo, X.; Tong, S.; Fang, Z.; Qu, Z. Frontiers: Machines vs. humans: The impact of artificial intelligence chatbot disclosure on customer purchases. Mark. Sci. 2019. [CrossRef]
[5] Chung, K.; Park, R.C. Chatbot-based healthcare service with a knowledge base for cloud computing. Cluster Comput. 2019, 22, 1925–1937. [CrossRef]
[6] Sohrabi, C.; Alsafi, Z.; O'Neill, N.; Khan, M.; Kerwan, A.; Al-Jabir, A.; Iosifidis, C.; Agha, R. World Health Organization declares global emergency: A review of the 2019 novel coronavirus (COVID-19). Int. J. Surg. 2020, 76, 71–76. [CrossRef] [PubMed]
[7] WHO Health Alert Brings COVID-19 Facts to Billions Via WhatsApp. Available online: https://www.who.int/news-room/feature-stories/detail/who-health-alert-brings-covid-19-factsto-billions-via-whatsapp (accessed on 13 April 2020).
[8] How Governments Worldwide are Using Messaging Apps in Times of COVID-19. Available online: https://www.messengerpeople.com/governments-worldwide-covid19/#Germany (accessed on 6 May 2020).
[9] SAJIDA Foundation and Renata Ltd. Team up to Tackle the COVID-19 Pandemic | Dhaka Tribune. Available online: https://www.dhakatribune.com/feature/2020/04/06/sajidafoundation-and-renata-ltd-team-up-totackle-the-covid-19-pandemic (accessed on 6 May 2020).
[10] Aarogya Setu Mobile App | MyGov.in. Available online: https://www.mygov.in/aarogya-setuapp (accessed on 6 May 2020).
[11] Sojasingarayar, A. Seq2Seq AI Chatbot with Attention Mechanism. Master's Thesis, Department of Artificial Intelligence, IA School/University-GEMA Group, Boulogne-Billancourt, France, 2020.
[12] Shum, H.y.; He, X.d.; Li, D. From Eliza to XiaoIce: Challenges and opportunities with social chatbots. Front. Inf. Technol. Electron. Eng. 2018, 19, 10–26. [CrossRef]
[13] Yan, R.; Song, Y.; Wu, H. Learning to Respond with Deep Neural Networks for Retrieval-Based Human-Computer Conversation System. In Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval - SIGIR '16, Pisa, Italy, 17–21 July 2016; ACM Press: Pisa, Italy, 2016; pp. 55–64. [CrossRef]
[14] Lu, Z.; Li, H. A Deep Architecture for Matching Short Texts. Adv. Neural Inf. Process. Syst. 2013, 26, 1367–1375.
[15] Shang, L.; Lu, Z.; Li, H. Neural Responding Machine for Short-Text Conversation. arXiv 2015, arXiv:1503.02364.
[16] Sordoni, A.; Galley, M.; Auli, M.; Brockett, C.; Ji, Y.; Mitchell, M.; Nie, J.Y.; Gao, J.; Dolan, B. A Neural Network Approach to Context-Sensitive Generation of Conversational Responses. arXiv 2015, arXiv:1506.06714.
[17] Vinyals, O.; Le, Q. A Neural Conversational Model. arXiv 2015, arXiv:1506.05869.
[18] Sutskever, I.; Vinyals, O.; Le, Q.V. Sequence to Sequence Learning with Neural Networks. Adv. Neural Inf. Process. Syst. 2014, 2, 3104–3112.
[19] Jurafsky, D.; Martin, J. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition; Dorling Kindersley Pvt, Limited: London, UK, 2020; Volume 2.
[20] Strigér, A. End-to-End Trainable Chatbot for Restaurant Recommendations. Master's Thesis, KTH Royal Institute of Technology, Stockholm, Sweden, 2017.
[21] Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. arXiv 2017, arXiv:1706.03762.
[22] Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv 2019, arXiv:1810.04805.
[23] Kitaev, N.; Kaiser, L.; Levskaya, A. Reformer: The Efficient Transformer. arXiv 2020, arXiv:2001.04451.
[24] Dai, Z.; Yang, Z.; Yang, Y.; Carbonell, J.; Le, Q.V.; Salakhutdinov, R. Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context. arXiv 2019, arXiv:1901.02860.
[25] Adiwardana, D.; Luong, M.T.; So, D.R.; Hall, J.; Fiedel, N.; Thoppilan, R.; Yang, Z.; Kulshreshtha, A.; Nemade, G.; Lu, Y.; et al. Towards a Human-like Open-Domain Chatbot. arXiv 2020, arXiv:2001.09977.
[26] Bahdanau, D.; Cho, K.; Bengio, Y. Neural Machine Translation by Jointly Learning to Align and Translate. arXiv 2016, arXiv:1409.0473.
[27] So, D.R.; Liang, C.; Le, Q.V. The Evolved Transformer. arXiv 2019, arXiv:1901.11117.
[28] Dauphin, Y.N.; Fan, A.; Auli, M.; Grangier, D. Language Modeling with Gated Convolutional Networks. arXiv 2017, arXiv:cs.CL/1612.08083.
[29] Ramachandran, P.; Zoph, B.; Le, Q.V. Searching for Activation Functions. arXiv 2017, arXiv:cs.NE/1710.05941. Natural language processing technologies in radiology research and clinical applications. Radiographics 2016, 36, 176–191. [CrossRef] [PubMed]
[30] Zeng, Z.; Deng, Y.; Li, X.; Naumann, T.; Luo, Y. Natural Language Processing for EHR-Based Computational Phenotyping. IEEE/ACM Trans. Comput. Biol. Bioinf. 2019, 16, 139–153. [CrossRef] [PubMed]
[31] Kreimeyer, K.; Foster, M.; Pandey, A.; Arya, N.; Halford, G.; Jones, S.F.; Forshee, R.; Walderhaug, M.; Botsis, T. Natural language processing systems for capturing and standardizing unstructured clinical information: A systematic review. J. Biomed. Inform. 2017, 73, 14–29. [CrossRef] [PubMed]
[32] Caldarini, G.; Jaf, S.; McGarry, K. A Literature Survey of Recent Advances in Chatbots. Information 2022, 13, 41. https://doi.org/10.3390/info13010041
[33] Battineni, G.; Chintalapudi, N.; Amenta, F. AI Chatbot Design during an Epidemic like the Novel Coronavirus. Healthcare 2020, 8, 154. https://doi.org/10.3390/healthcare8020154
[34] Chen, A.P.S.; Liu, C.W. Crafting ASR and Conversational Models for an Agriculture Chatbot. In The 4th International Conference on Computational Intelligence and Intelligent Systems, 2021. https://dl.acm.org/doi/abs/10.1145/3507623.3507634
[35] Chen, A.P.S.; Liu, C.W. Crafting ASR and Conversational Models for an Agriculture Chatbot. In The 4th International Conference on Computational Intelligence and Intelligent Systems, 2021. https://dl.acm.org/doi/abs/10.1145/3507623.3507634
Advances in Science and Technology ISSN: 1662-0356, Vol. 124, pp 378-391 doi:10.4028/p-s8m133 © 2023 Trans Tech Publications Ltd, Switzerland
Submitted: 2022-09-13 Accepted: 2022-09-16 Online: 2023-02-27
An Applied Computational Linguistics Approach to Clustering Scientific Research Papers
Amaan Vora1,a*, Mihika Mishra2,b and Steffina Muthukumar3,c
1-3Department of Computer Science and Engineering, SRM Institute of Science and Technology, Ramapuram part Vadapalani, Chennai, India
a[email protected], b[email protected], c[email protected]
Keywords: Natural Language Processing (NLP), Computational Linguistics, Arxiv, Data Analysis, Exploratory Data Analysis, Topic Modelling, Clustering, K-Means, BERT, S-BERT, BERTopic, SciBERT
Abstract. Scientific data available on the internet is rarely labelled: most popular research-paper repositories contain papers without any annotation for grouping them. Classification of text via words, sentences and even paragraphs has become a key resource for many industries looking to help their computers understand human language, the next stage in Artificial Intelligence. Using valuable Computational Linguistics ideas, some industrial applications have been able to streamline their processes to effectively and efficiently process and interpret language data. Continuing this trend, in this paper we aim to cluster scientific research papers into topic-based groups in the most efficient manner. Using multiple algorithms that have revolutionized the industry in previous years, we process over 800,000 scientific research articles across 200+ domains to accurately predict a domain for each article. We use clustering techniques such as the K-Means algorithm to derive the topics for these papers with an accuracy of nearly 80%. We also use BERT to create topic clusters that generate topics based on frequently occurring contexts within the text. Beyond BERT, we use offspring algorithms that tackle specific, niche issues that BERT does not account for. We also fine-tune the parameters of the algorithms used to generate over 50 stronger topics that more accurately define scientific articles.

1. Introduction
Computational Linguistics has been an essential tool for helping computers understand human interaction. Ever since 1972, when the first wave of Data Science papers was published, there has been growing interest in understanding all forms of human communication. The biggest hurdles in this effort have concerned semantics. Any communication derived from human interaction, verbal or physical, is assigned meaning based on context. Our understanding of context rests largely on our understanding of sentences, paragraphs and even essays as a whole; as humans, our brains function as maps, using past understanding, images and memories to create a mental picture. Computers, even the most powerful the 21st century has to offer, simply cannot compute at this level. Today, one of the biggest challenges in linguistics and Natural Language Processing is creating context for text-based data. Context similarity is a technique of creating aggregate numerical values that form clusters of similar vectorised text data, yielding domains. These domains, or more precisely clusters, give us a broad overview of the data used to help the computer understand human language. This is done using complex Machine Learning algorithms that have usually relied on distance measures to draw clusters on a graph. That all changed with the introduction of transformers. Transformers use a self-attention strategy: by assigning importance to different parts of the input, they produce significantly more precise results in less time. As a deep learning model, a transformer differentially "weighs" the importance of each subset of the data, without depending on the order in which the data is presented. This mechanism was already revolutionary within NLP, but it was developed even further by the introduction of BERT.
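The self-attention weighting just described can be sketched as scaled dot-product attention for a single query: scores against each key are scaled, softmaxed into weights, and used to mix the value vectors. The vectors below are toy values, not outputs of a trained model.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    # Output is the attention-weighted mixture of the value vectors.
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]
```

Because the weights depend only on query-key similarity, the mechanism assigns importance to each input position independently of its order in the sequence, which is the property highlighted above.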
In 2018, Jacob Devlin and other researchers at Google AI developed a bi-directional transformer. This transformer was trained on data from Wikipedia repositories to understand one thing in particular: context. In principle, it takes in input data through one transformer, computes the data in a vectorised format, and passes the values through the other transformer to give text data as output. This allows order to be restored to the data, which means much larger inputs can be passed into this technology without losing context. This sparked a new revolution in the industry, with many subsequent researchers dedicating countless hours to perfecting individual aspects of the algorithm. Today, we produce over 2.5 quintillion bytes of data, most if not all of which can be understood by us through human language. Companies and entire industries around the world spend billions to help computers understand communication and improve their workflow. But these algorithms and technologies are not optimised for the data being used. Scientific documents are the hardest for computers to parse, owing to significant words within these papers having no pre-established context or definition for the computer to use. Methods to understand scientific research have been incomplete at best and lacklustre at worst. This research aims to tackle the problems computers face while developing broad domains for scientific research articles, by testing the best algorithms and techniques available today to generate accurate, comprehensible results.

2. Previous Works
The systems and technologies available today can process text in three formats: words, sentences and paragraphs. Word-level processing is tedious and does not factor in the relationships between words as parts of larger text structures. Sentences would be ideal for processing research titles and allocating labels to them, but more often than not titles present an incomplete overview of a paper's contents, in which case we have to perform the added step of analysing the abstract to get a better understanding of all the topics the paper covers. In order to process both sentences and paragraphs, we searched for algorithms specifically trained for sentence embedding and paragraph encoding. So far, the best-performing approaches have relied on transformer algorithms, specifically BERT (Bidirectional Encoder Representations from Transformers) [1][3], a revolutionary model introduced by Google that uses sentence embedding instead of word vectorization and topic modelling to improve clusters. Its motive was to refine the clustering process by reducing the parameters and volumes of data, making prediction more efficient. SBERT (Sentence-BERT) [2] doubles down on the idea presented by BERT [1][3], opting to add a third pooling layer to its approach, further cementing its priority on prediction efficiency. Via these two papers, we were able to better understand where the limitations posed by previous work originated. Sentence ordering, a key principle for efficiently embedding large volumes of text, was practically unheard of until BERSON came along and changed the game: its focus on paragraph coherence instead of individual sentence exploration made the exploration of text much easier.
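The pooling idea behind sentence embedding can be sketched in miniature: token vectors are mean-pooled into one fixed-size sentence vector, and sentences are compared by cosine similarity. The token vectors below are invented stand-ins; a real system would obtain contextual embeddings from a pretrained transformer.

```python
import math

# Invented token vectors standing in for contextual embeddings; a real system
# would obtain these from a pretrained model such as BERT or S-BERT.
TOKEN_VECS = {
    "neural":  [0.9, 0.1, 0.0],
    "network": [0.8, 0.2, 0.1],
    "protein": [0.0, 0.9, 0.3],
    "folding": [0.1, 0.8, 0.4],
}

def embed(sentence):
    """Mean-pool token vectors into a single fixed-size sentence embedding."""
    vecs = [TOKEN_VECS[t] for t in sentence.lower().split() if t in TOKEN_VECS]
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(3)]

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = (math.sqrt(sum(x * x for x in a))
           * math.sqrt(sum(x * x for x in b)))
    return num / den
```

With real embeddings, titles or abstracts from the same domain land close together under this similarity measure, which is what makes downstream clustering possible.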
By evaluating text similarity based on corpora (key text-based data) [4], we concluded that by computing keyness for topics, as opposed to words, and then examining the highest-probability words for those key topics, it might be possible to produce a similar summary of the differences between corpora with less manual intervention. All these technologies have a common key factor: they have been developed for niche, individual purposes, and using them for anything beyond their predefined purpose results in less than stellar conclusions. We aim to tackle that problem by targeting the source: the data we use for computation.

2.1 Features and Datasets
Throughout our references we found a similar pattern: when it comes to clustering data to model topics, the data used contains limited features. BERT was trained on Wikipedia entries but fails to comprehend scientific corpora. BERTopic is the best algorithm available for mass computations involving topic modelling, but it gives a superficial overview of the key words present in our data. BERSON, while developed specifically to understand sentences using a specialised network, fails to comprehend context across a paragraph. RoBERTa misses the scientific context while managing the complexities plaguing other algorithms. And while SciBERT is the only advanced algorithm that can truly compute scientific data and retain its context, it cannot handle large volumes of data. Therefore, our data has to be presented in different formats to different algorithms, while retaining the few key aspects like title, abstract, paper ID and category. We also have to manage the size of our data, so that we can provide faster computations with the same levels of accuracy.

2.2 Unsupervised Algorithms

The aforementioned algorithms have been pretrained on individual aspects of data that do not represent our data as a whole. Research in the past has tried to solve problems individually; in order to change that, we have to compute beyond the pre-established norms of each algorithm. Changing the parameters and establishing different levels or modes of computation can help an unsupervised algorithm perform better.

2.3 Fine-Tuning Approaches

All algorithms used in prior research have been pretrained with their own sets of parameters. At the end of computation, these become the default parameters the algorithms run on. Setting parameters in this fashion does not consider a broader stream of data, and cannot be applied to different datasets while achieving similar levels of accuracy.

Table 2.1. Algorithms Used and their Purpose

K-Means Algorithm — Use case: clustering; creates 'k' clusters from 'n' values using the nearest mean for each cluster. Advantages: an algorithm with tried and tested results; this particular approach does not need any pretraining.

Bidirectional Encoder Representations from Transformers (BERT) — Use case: a transformer structure containing an encoder and a decoder, pretrained on millions of textual values. Advantages: as a newer algorithm with a defined focus, this pretrained algorithm can give us more accurate results, albeit increasing our complexity.

BERTopic — Use case: pretrained to perform an iterative approach similar to k-means. Advantages: a clustering algorithm atop a bidirectional transformer allows us to perform faster computation without the worry of pretrained data or the efficiency of the transformer.

Siamese-BERT — Use case: has a triple network structure that allows for greater computation at reduced cost. Advantages: it can handle copious amounts of vectorization and transformation without affecting complexity.

SciBERT — Use case: an algorithm pretrained on scientific corpora, incorporating techniques from BERTopic and BERT. Advantages: as a standalone algorithm, this has continually tested as the best-performing algorithm in our research.
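As a minimal illustration of the K-Means entry in Table 2.1, Lloyd's algorithm can be sketched over toy 2-D points standing in for document vectors; a production run over 800,000 embeddings would use an optimised library implementation rather than this loop.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal Lloyd's algorithm over 2-D points (stand-ins for document vectors)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # pick k starting centroids
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        # Assignment step: each point joins its nearest centroid.
        for p in points:
            i = min(range(k), key=lambda c: (p[0] - centroids[c][0]) ** 2
                                            + (p[1] - centroids[c][1]) ** 2)
            clusters[i].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        for i, cl in enumerate(clusters):
            if cl:
                centroids[i] = (sum(p[0] for p in cl) / len(cl),
                                sum(p[1] for p in cl) / len(cl))
    return centroids, clusters
```

On two well-separated blobs the algorithm recovers one cluster per blob regardless of the random starting centroids, which is the behaviour the "no pretraining needed" advantage refers to.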
3.
Issues and Inferences
The current systems deal with individual aspects of semantic and contextual similarity clustering. While BERT is revolutionary in its bi-directional transformer approach, it is restricted by its unfamiliarity towards corpus texts. Similarly, while BERSON and SciBERT are developed
specifically to deal with corpus data, they are not equipped to handle the sheer volume of data that BERT is trained on. Moreover, our biggest challenge is finding and applying the right data. Corpus data is hard to come by, and even harder to vectorise. While research papers have often resorted to readily available clinical and medical data, they have traditionally stuck to results within a set domain rather than exploring multiple fields. This has resulted in overall incomplete systems: while efficient at handling individual aspects of our tasks, they crumble at the magnitude of the overall process. Mixing aspects of these research papers could theoretically hand us our solution. If we were to understand the nature of topic modelling towards corpus similarity, and were able to replicate the metrics towards sentence embeddings, we could consider large volumes of data within our dataset. While our methodology is already replete with multiple clustering and BERT-oriented algorithms, a competitive, accuracy-laden approach could give us the very best results. Consequently, we would also be able to learn from the inferences drawn by the algorithms to further improve our study.
4. Objective
With this paper, we aim to test multiple systems with changes to their default parameters, optimised efficiency and an all-encompassing dataset. Over 800,000 articles of the Arxiv dataset were used to perform this experiment, across 5 different algorithms and 3 different approaches. Our primary objective is to cluster scientific research papers based on varying topics and domains. This is achieved using multiple techniques such as topic similarity, vector similarity and sentence relatedness. We also run multiple algorithms to check for the best accuracy score without the risk of bloated time complexity or an overfit dataset. These algorithms are among the best and most preferred in the fields of Natural Language Processing and Computational Linguistics. Finally, we intend to enhance each algorithm's performance by tuning parameters that result in efficient and effective clustering of scientific research papers based on modelled domains.
5. Data
For our research, we used the Arxiv library dataset. This dataset contains over 8,000,000 entries holding the title, abstract, authors, date of submission, topics and citation for each paper. The entries span over two decades and cover over 50 different domains. The initial dataset is available as a JSON file. Each field within an entry has a key attached to it that describes the nature of the data present; for example, each entry has separate Author, Title and Abstract tags that differentiate data within the same entry. To speed up our approach, we opted to convert this JSON data to a Pandas DataFrame.
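The JSON-to-DataFrame conversion described above can be sketched as follows. This is a minimal illustration, not the authors' code: the two inline records are toy stand-ins for lines of the arXiv metadata file, and only the fields named in the text are kept.

```python
import json
import pandas as pd

# Toy stand-in for the arXiv metadata file: one JSON object per line,
# mirroring the tags described above (id, authors, title, abstract, categories).
raw_lines = [
    '{"id": "0704.0001", "authors": "A. Author", "title": "Paper One", "abstract": "First abstract.", "categories": "hep-ph"}',
    '{"id": "0704.0002", "authors": "B. Author", "title": "Paper Two", "abstract": "Second abstract.", "categories": "math.CO"}',
]

# Parse each JSON record and load it into a DataFrame, keeping only the needed fields.
records = [json.loads(line) for line in raw_lines]
df = pd.DataFrame(records, columns=["id", "authors", "title", "abstract", "categories"])
print(df.shape)  # (2, 5)
```

For the real file, the same loop would iterate over the file handle line by line rather than an in-memory list.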
Figure 5.1. Arxiv Dataset, JSON format
Within each entry, it is natural to assume that certain fields are more valuable than others. For example, the ID, Authors, Title, Abstract, Categories and Date fields give us all the information we need to process our algorithms. Using data beyond that increases our time complexity and yields an ultimately unfavorable result.
Figure 5.2. Arxiv Dataset, Data-frame format
6. Proposed System
For our system, we plan to correct the errors and limitations we encountered in previous authors' works, and hope either to eliminate those limitations or to find a way to work with or around them. Our primary goal is to test a broad spectrum of systems currently used for clustering data. The reason for testing this many algorithms is to ensure we arrive at the highest score within the least time. We spend a lot of time preparing the dataset for this very reason: with a dataset that has entries in the millions, time complexity can be a major issue for the efficacy of the program. Hence, through our data analysis and filtration process, we remove outlying data and redundant entries that would bog down the program, focusing instead on a diverse dataset that encompasses a significant majority of the whole. Building further on this point, we go one step beyond our requirement and also check for corpus relation and relevance among words and sentences. We compute the importance, or the "keyness", of particular entries to determine their weight within the larger program. Finally, our approach to the algorithms differs from the norm as well. Instead of taking in the entire volume and processing it as a whole, our algorithms take in the data in layers. While this may seem to conflict with our time-complexity-first approach, we found through our own research that providing large volumes of data to any algorithm results in that algorithm moving values with superficial computations. To make certain that each value is treated by the algorithm in the same detail-oriented manner, we have opted to divide the data
into different submissions. This data will first pass through a mechanism that derives context similarity from the text at hand. It processes each input individually and assigns it a label depending on its value. Upon completing this segregation, we take the added step of defining the labels based on input size, generating Boolean values for the presence of data within any particular label. This enhances computation through the algorithm and ends up providing higher accuracy in a more efficient manner.
7. Dataset Analysis
The data used from the Arxiv dataset spans over 20 years of research articles amassed across myriad domains and sub-domains. In order to understand and assimilate our data, we performed deep, exploratory analysis. This helped our research in the following ways:
1. Analysis provides a deeper understanding of the data at hand. This gives us a better view of the data we are dealing with when we eventually begin predictions using machine learning algorithms.
2. Analysing data gives us a differential understanding of the data that is important to our predictions and the data that is unimportant. This allows for a data-cleaning process that further optimises our data for the algorithms.
We begin our analysis by performing a few key data interpretation operations. This gives us information about the shape, description of values and overall information regarding the dataset. Through that, we found the following information:
Figure 7.1 – Dataset information
Figure 7.2 – Information regarding Abstract data
Figure 7.3 – Information regarding Title data
Figure 7.4 – Publications made per agency
Through this analysis, we find out that we have complete values present in each of the columns we use. The data at hand includes ID, author name, paper title, date of submission, category of the
paper submission and the abstract of the paper. The total dataset exceeds 40 MB, and the shape of the dataset shows 800,000 entries for our use. We now perform individual analyses on relevant columns to understand the data and search for duplicate values. We begin with the abstract, where we find 3 duplicates; we remove these duplicates to improve our time complexity, and perform similar computations for all columns. Now that we have interpreted the columns as individual values, we move on to understanding the value they carry by performing analysis based on inputs. This gives us information about the top publishing agencies that have received the papers. We notice that a significant amount of information has no publishing agency attached to it. Over time, we discovered that this information makes our data redundant, so we filter the data on another parameter. With this, we are done analysing text-based data using individual columns. Comparing multiple columns gives us even more insight into the data we are using for our algorithmic computation.
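The duplicate-removal step described above can be sketched on a toy frame. This is an illustration only; the three-row frame mimics the duplicated abstracts found in the real column.

```python
import pandas as pd

# Toy frame with a duplicated abstract, mirroring the duplicates found
# in the real abstract column.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "abstract": ["Deep learning for X.", "Deep learning for X.", "Graphs for Y."],
})

# Keep the first occurrence of each abstract and drop the rest.
deduped = df.drop_duplicates(subset="abstract", keep="first")
print(len(deduped))  # 2
```

The same `drop_duplicates` call can then be repeated per column, as the text describes.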
Figure 7.6 – Number of Articles submitted (by date)
Through this analysis, we can see that article submissions increased significantly after 2007, with a notable spike in that year.
Figure 7.7 – Daily/Hourly divide of Article Submissions (by number)
Here we are able to visualize the submissions made based on time of the day. We notice a trend of higher submissions happening around 1500 hours or 3:00 pm during the weekdays.
Figure 7.8 – Monthly divide of Article Submissions (by number)
This chart tells us the submissions made monthly. As an example, there have been over 5000 submissions over the past two decades within the month of July alone.
Figure 7.9 – Number of Submissions by Category
These analyses use 3 separate parameters. The graph above tells us the number of submissions made overall per category. We notice that while most categories remain near the average annual submission rates, a few outlier categories have significantly higher submissions. These categories are useful to us for prediction.
Figure 7.10 – Number of Submissions by Authors
Our final analysis is also based on multi-parameter computation. Using this graph, we can determine the submissions made over the years by individual authors; for example, Donald Schneider made 12 submissions in the year 2012. These analyses give us further insight into which data is relevant for computation and which data can be filtered.
8. Model Structure
Figure 8.1 – Model Architecture Phase 1 Figure 8.2 – Model Architecture Phase 2
We divide our model into three distinct parts: procurement and analysis, prediction and modelling, and tuning. Procurement and analysis involves preparing and understanding the data before we begin the next process. Prediction and modelling has us developing the algorithms and tuning them per the data to come up with either clusters or topic models. The tuning process involves improving the algorithms through our inferences, to check whether we can improve our score while retaining our complexity.
9. Algorithms
As mentioned above, we are using 5 broad algorithms that represent the unsupervised clustering domain as a whole, drawing from older, established clustering techniques as well as newer, more revolutionary computations.
Figure 9.1 – K-Means Clustering Algorithm Equation
The K-Means algorithm is an iterative algorithm that tries to partition the dataset into K pre-defined, distinct, non-overlapping subgroups (clusters) where each data point belongs to only one group. It tries to make the intra-cluster data points as similar as possible while keeping the clusters as different (far apart) as possible. It assigns data points to a cluster such that the sum of the squared distance between the data points and the cluster's centroid is at a minimum. For our research, we decided to compete UMAP and t-SNE against each other to define topics for our data. The location of each article on the plot was determined by UMAP / t-SNE, while the labels were determined by the k-means algorithm. In some cases, the labels generated via the algorithm are more spread out on the plot. This could be because certain contents in the papers are similar in nature, so it is hard to clearly separate them; this effect can be observed in the formation of smaller clusters on the plot. The algorithms may find connections that were unapparent to humans, which may highlight hidden shared information and advance further research.
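The assign-then-recompute loop described above can be sketched as a minimal Lloyd's iteration in pure NumPy. This is an illustrative sketch, not the pipeline used in the paper: the data is a toy pair of blobs, and the initial centroids are fixed by hand so the run is deterministic.

```python
import numpy as np

def kmeans(X, centroids, iters=10):
    """Minimal Lloyd's algorithm: assign each point to its nearest centroid,
    then recompute each centroid as the mean of its assigned points."""
    for _ in range(iters):
        # Squared distance of every point to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        centroids = np.array([X[assign == k].mean(axis=0)
                              for k in range(len(centroids))])
    return assign, centroids

# Two well-separated toy blobs; hand-picked initial centroids.
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]])
init = np.array([[0.0, 0.1], [5.0, 5.0]])
assign, cents = kmeans(X, init)
print(assign)  # [0 0 1 1]
```

Production code would add convergence checks and multiple random restarts; in the paper's setting the points would be UMAP- or t-SNE-reduced article embeddings rather than raw 2-D coordinates.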
Figure 9.2 – BERT Architecture
BERT’s key technical innovation is applying the bidirectional training of the Transformer, a popular attention model, to language modelling. This directly opposes previous efforts, which looked at a text sequence either from left to right or with combined left-to-right and right-to-left training, and shows that a bidirectionally trained language model can have a deeper sense of language context and flow. For our research, we opted to define our pretraining encoder using SciBERT's uncased archive, which contains definitions for scientific corpora that are useful for computation. We use two dense layers and a learning rate of 6x10^-4 for the algorithm. We completed our testing with BERT by generating 10 epochs using base parameters, at an average loss of 0.0392.
Figure 9.3 – BERTopic Architecture
BERTopic is a straightforward addition to the BERT lineup that started the wave of next-generation computation within the fields of Natural Language Processing and Computational Linguistics. Our first step with BERTopic leverages Sentence Transformers, a multilingual framework providing an easy way to generate dense vector representations for each document in our data corpus. From the embeddings derived with SentenceTransformers, we perform dimensionality reduction with UMAP, and the result is passed to HDBSCAN to cluster semantically similar sets of documents. The last step extracts the most relevant words for each cluster using the class-based TF-IDF approach, in order to obtain a representation of the topics. Through our computations we arrived at the following conclusion: at a glance, the most frequent topics seem to have coherent and clear representations, and interpreting these clusters is much easier if you are familiar with the content of the documents.
Figure 9.4 – Tf-Idf Vectorization Equation
For each topic, we generated the 10 words that best represent that topic. However, to understand the number of words needed for a sufficient topic representation, we plotted the decline in term score as terms are added. The idea is that each added term has a lower term score than the previous one, since the first is the best term for the topic. Eventually we reach a point of diminishing returns, very similar to the elbow method used to choose k in k-means. Using the elbow method, we concluded that 3 words per topic are sufficient to represent the topic well; any words added after that have seemingly little effect.
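The term-ranking step above can be sketched with one common variant of class-based TF-IDF (term frequency within the topic, scaled by log(1 + A / f_t), where A is the average term count per topic and f_t the term's frequency across topics). The tiny vocabulary and counts below are toy values, and the exact weighting formula is an assumption, not necessarily the one used in the paper.

```python
import numpy as np

# Term counts per topic cluster (rows = topics, columns = vocabulary terms).
vocab = ["graph", "quantum", "neural", "prime"]
tf = np.array([
    [8, 0, 5, 0],   # topic 0
    [0, 9, 1, 3],   # topic 1
], dtype=float)

# Class-based TF-IDF: within-topic frequency scaled by log(1 + A / f_t),
# with A the average term count per topic and f_t the cross-topic frequency.
A = tf.sum() / tf.shape[0]
f_t = tf.sum(axis=0)
ctfidf = tf * np.log(1 + A / np.maximum(f_t, 1))

# Rank terms per topic; scores decline as more terms are added, which is
# the falling curve the elbow method is applied to.
top_topic0 = [vocab[i] for i in ctfidf[0].argsort()[::-1]]
print(top_topic0[:2])  # ['graph', 'neural']
```

Plotting the sorted scores per topic and looking for the knee of the curve reproduces the elbow analysis described above.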
Figure 9.5 – S-BERT Architecture
Siamese-BERT, or S-BERT, uses a triple network structure to perform computations. This structure allows for faster bidirectional computation. For this algorithm, we used the JSON format of our dataset and extracted the title, abstract and category of each article. We then ran the algorithm with ReLU activations at a learning rate of 6x10^-4, using all 800,000 values. After our first computation, we switched to the Adam optimizer with an epoch count of 15, and ended with a loss of 0.0808. Although we had already performed a computation on a BERT model pretrained with the SciBERT library, we ran another computation using the SciBERT pretrained module with optimization set to Adam. While our initial BERT testing fared less than optimally due to a lack of scientific corpora, this change in fine-tuning drastically improved results, giving us a better prediction model as our final product.
10. Results
Our initial tests ran with the K-Means clustering algorithm. We developed 20 topics to divide the data into, using UMAP and t-SNE for plotting clusters on our graphs. These two plots show differences in the positions occupied by the papers and categories, suggesting there may be differences observed by the algorithm within the higher-dimensional data. As we concluded earlier, this is the effect of articles having similar information or overlapping values, which skews the results in this direction. Overall, we were able to predict broad topics but failed to secure any real difference when it came to sub-topics. For BERT, BERTopic, RoBERTa / BERSON and SciBERT, we decided to change our data after our first round of observations. Instead of broadening our scope, we took only the title, abstract and category for each article, and reduced the number of observations to 20,000. This resulted in a few changes.
It made our computations faster, enabled us to try different parameters and ultimately provided similar accuracy within a much shorter timeframe. We were able to create categories based on key words and their context within a paragraph or statement, and then derive multiple topics based on the clusters of words generated.
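The similarity scoring that underlies these context-based category assignments can be sketched with cosine similarity, the measure S-BERT-style models use to compare pooled sentence embeddings. The three fixed vectors below are toy stand-ins for real embeddings, not model outputs.

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two pooled sentence embeddings.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy pooled embeddings: a title, its own abstract, and an unrelated abstract.
title     = np.array([0.9, 0.1, 0.0])
abstract  = np.array([0.8, 0.2, 0.1])
unrelated = np.array([0.0, 0.1, 0.9])

# A matching title/abstract pair scores higher than an unrelated pair.
assert cosine(title, abstract) > cosine(title, unrelated)
print(round(cosine(title, abstract), 3))
```

In the full pipeline, an article would be assigned to whichever topic embedding it is most cosine-similar to.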
Figure 10.1 – U-Map and t-SNE cluster maps
As we map these clusters onto a graph, we can observe a trend: terms lose score as more are added to each cluster, so we optimised the algorithms to use only 3 to 4 words per cluster to avoid a major loss in term score.
Figure 10.2 – Topics generated using BERT
Figure 10.3 – Plotting topic clusters on a Distance Map
We mapped our topic clusters on a distance map; the spot marked in red shows its position on the plot. Notice the groups of clusters being formed on the graph, denoting larger groups that contain the topic clusters for our data. Our next step involved checking whether these clusters could accurately predict broad domains for our data. To accomplish this, we took two steps:
1. We matched our data with the topic cluster and checked for broad domains.
2. We ran an algorithm to match the broad domain with the pre-established categories, to check the accuracy of the clustering algorithms for smaller sub-domains.
Ultimately, we were able to provide text in the form of title and abstract to these two approaches and gained favourable results.
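The second step above, matching a predicted broad domain against the dataset's pre-established categories, can be sketched as a simple accuracy check. The domain-to-category mapping and the prediction pairs are hypothetical; the real mapping would come from the arXiv category taxonomy.

```python
# Hypothetical mapping from predicted broad domains to pre-established
# category codes; the names are illustrative only.
BROAD_TO_CATEGORIES = {
    "physics": {"hep-ph", "quant-ph"},
    "computer science": {"cs.CL", "cs.LG"},
}

# (predicted broad domain, true category) pairs for a few articles.
predictions = [
    ("physics", "hep-ph"),
    ("computer science", "cs.CL"),
    ("physics", "cs.LG"),   # a miss
]

# A prediction counts as correct when the true category falls under
# the predicted broad domain.
hits = sum(1 for broad, cat in predictions if cat in BROAD_TO_CATEGORIES[broad])
accuracy = hits / len(predictions)
print(accuracy)
```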
Figure 10.4 – Predicting Topics for Articles using Title
Figure 10.5 – Predicting Topics for Articles using Abstract
11. Limitations Posed
While our algorithms performed better with scientific corpora than their pretraining would suggest, we faced some limitations during this experiment. For starters, the SciBERT pretraining module cannot be used with every rendition of BERT, due to its differing nature of computation. This poses certain problems for the computer when inferring scientific articles with non-defined text. As our computations improve in the future, we hope to resolve this problem by either using our own pretraining module or tweaking the SciBERT module so that it can be accessed by other algorithms.
12. Conclusion
Through our research, we were able to conclusively achieve our initial goal. Testing these algorithms against our curated data already gave us strong results, but it was the fine-tuning and the pretraining that nudged our score even further. In the future, we would like to explore further ways of clustering entire documents, being theoretically able to perform such computations using these variations of the algorithms. Finally, even where the algorithms performed subpar, they provided reasonably accurate predictions for topic modelling. This exercise was to determine the broad domains for these articles, and they were able to do so conclusively. We plan to explore alternative ways to apply this concept in the future.
References
[1] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Google AI Language.
[2] N. Reimers, I. Gurevych: Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. Ubiquitous Knowledge Processing Lab (UKP-TUDA), Department of Computer Science, Technische Universitat Darmstadt.
[3] B. Cui, Y. Li, Z. Zhang: BERT-enhanced Relational Sentence Ordering Network. College of Information Science and Electronic Engineering, Zhejiang University, China; Computer Science Department, Binghamton University, Binghamton, NY, USA.
[4] R. Fothergill, P. Cook, T. Baldwin: Evaluating a Topic Modelling Approach to Measuring Corpus Similarity. Department of Computing and Information Systems, The University of Melbourne, Australia; Faculty of Computer Science, University of New Brunswick, Fredericton, Canada.
[5] K. Krasnowska-Kieras, A. Wróblewska: Empirical Linguistic Study of Sentence Embeddings. Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland.
[6] I. Beltagy, K. Lo, A. Cohan: SciBERT: A Pretrained Language Model for Scientific Text. Allen Institute for Artificial Intelligence, Seattle, WA, USA.
[7] Y. Adi, E. Kermany, Y. Belinkov, O. Lavi, Y. Goldberg: Analysis of sentence embedding models using prediction tasks in natural language processing. IBM J. Res. & Dev., Vol. 61, No. 4/5, Paper 3, July/September 2017.
[8] Z. Lin, D. Lei, Y. Han, G. Wang, W. Deng, Y. Huang: Siamese BERT Model with Adversarial Training for Relation Classification. ©2020 IEEE, DOI 10.1109/ICBK50248.2020.00049.
[9] C. Wan, S. Jiang, Y. Yuan, C. Wang: A Novel Sentence Embedding Based Topic Detection Method for Microblogs. IEEE Access, Vol. 8, 2020, DOI 10.1109/ACCESS.2020.3036043.
[10] V. K. Vijayan, K. R. Bindu, L. Parameswaran: A Comprehensive Study of Text Classification Algorithm. Department of Computer Science and Engineering, Amrita School of Engineering, Coimbatore, Amrita Vishwa Vidyapeetham, India. ©2017 IEEE.
[11] V. Mary Amala Ba, D. Manimegalai: An Analysis of Document Clustering Algorithms. ©2010 IEEE.
Advances in Science and Technology ISSN: 1662-0356, Vol. 124, pp 392-397 doi:10.4028/p-irai1l © 2023 Trans Tech Publications Ltd, Switzerland
Submitted: 2022-09-20 Accepted: 2022-09-23 Online: 2023-02-27
Speech Accent Recognition
Shantoash C.a, Vishal M.b, Shruthi S.c and Bharathi Gopalsamy N.d*
Department of Computer Science and Engineering, SRM Institute of Science and Technology, Vadapalani Campus, Chennai, India
[email protected], [email protected], [email protected], [email protected]
Keywords: Human computer interface, Speech recognition, Convolutional Neural network, Mel frequency cepstral coefficients, Speech synthesis.
Abstract: The speech accent archive demonstrates that accents are systematic rather than merely mistaken speech. This project detects the demographic and linguistic backgrounds of speakers by comparing different speech outputs with the speech accent archive dataset, to work out which variables are key predictors of each accent. Given a recording of a speaker speaking a known script of English words, this project predicts the speaker's native language. The project aims to classify various sorts of accents, specifically foreign accents, by the language of the speaker, and revolves around detecting the background of each individual using their speech.
I. Introduction
Human-machine interfaces are rapidly evolving. We are moving from traditional methods of input, such as keyboard and mouse, to modern methods such as gestures and voice. It is imperative to enhance voice recognition and response, since there is a growing worldwide market of technologies that use this advancement. "Accent" comes from the Latin accentus, which means "the intonation of singing". We use an accent for various sorts of emphasis in speech. Speech recognition is a process in which the machine converts words spoken by the user into text. A regional accent is a particular way that people from a place speak. The majority of English speakers in the world have accents that are not exposed to speech recognition systems on a greater scale. To bridge the comprehension gap between these systems and their users, the systems need to be tuned to the accent of the user. Accent classification is a crucial feature that can be used to increase the accuracy of comprehension of speech recognition systems. Since the dataset was large in size (approximately 2 GB) but had relatively few samples per accent, we worked mainly on Indian accents such as Tamil, Malayalam, Hindi and Telugu.
II. Literature Survey
[1] Patel, I., Kulkarni, R., & Rao, Y. S. (2017).
Automatic Non-native Dialect and Accent Voice Detection of South Indian English. Advances in Image and Video Processing.
Speech recognition has come a long way in recent years. However, recognition accuracy varies a lot depending on the speaker, especially if the speaker has a strong accent that is not included in the training corpus. This research describes a technique for automatically recognizing English accents from five different South Indian states. The method is built on a set of random nets with HMM units that are context independent.
[2] Accent Identification by Combining Deep Neural Networks and Recurrent Neural Networks Trained on Long- and Short-Term Features.
Many speech systems, such as speech recognition, speaker identification and voice conversion, benefit from automatic recognition of foreign accents. Because differences in accent relate to both prosodic and articulation characteristics, this research proposes a combination of long-term and short-term training. Each voice sample is divided into many equal-length speech chunks. Deep neural
networks (DNNs) are trained on long-term statistical data in each segment, while recurrent neural networks (RNNs) are trained on short-term auditory features.
[3] Automatic identification of the dialects of Assamese Language in the District of Nagaon.
Speech recognition research is now being conducted in the modern world. The system can be properly trained to adapt its pronunciation, acoustic and language models if the dialect is identified prior to Automatic Speech Recognition. This paper discusses the suggested system, which employs MFCC for feature extraction. It has been found that the pattern of Assamese pronunciations, vocabularies and idioms differs to some extent across these places.
[4] Language identification and accent variation detection in spoken language recordings.
A model for recognizing languages and accents in audio recordings is developed. Short-distance and long-distance features are combined in this Hierarchical-Sequential Nodule Model (HSNM). The model is used to evaluate the potential for easy knowledge transfer between the language identification (LID) and accent detection (AD) tasks.
[5] Speaker vectors from Subspace Gaussian Mixture Model as complementary features for Language Identification.
We investigate novel high-level features for language detection in this study. With mean supervectors represented in a low-dimensional representative subspace, the recently introduced Subspace Gaussian Mixture Models (SGMM) provide an intuitive and fast solution for GMM acoustic modelling. Low-dimensional vectors are also used in SGMMs to provide an effective method of speaker customization. These vectors are used as features for language identification in our framework. They are pitted against our acoustic iVector system, whose architecture is now deemed cutting-edge for Language Identification and Speaker Verification. The NIST LRE2009 dataset contains the results of both systems and their fusion.
[6] Regional accents recognition based on i-vectors approach: the case of the Algerian linguistic environment.
The detection of regional accents is one of the most important research subjects in speech processing. However, because developing such systems requires gathering and processing a vast quantity of training and assessment data for each target accent, it is a time-consuming and costly procedure. This paper offers preliminary results from an i-vectors technique for recognizing Algerian regional accents. The experiments were carried out with data acquired from Algeria's east and central regions. The results show that the i-vectors approach can recognize regional accents, and they examine the impact of evaluation data quality on the accuracy of the proposed recognition method.
[7] An Automated Classification System Based on Regional Accent.
It is difficult to distinguish the native language from a speech segment of a second-language utterance that shows a distinct pattern of articulatory or prosodic behavior. This study proposes a method for classifying speakers based on their regional English accent. Along with the native-language speech data, a database of English speech, spoken by native speakers of three closely related Dravidian languages, was acquired from a non-overlapping set of speakers. The training set includes native speech samples from speakers of India's regional languages, including Kannada, Tamil and Telugu. The GMM classifier achieved an identification accuracy of 87.9%, which was increased to 90.9% using the GMM-UBM technique. The i-vector-based strategy, on the other hand, achieved a higher accuracy of 93.9% and a lower EER of 6.1%. The results are encouraging, especially compared to existing state-of-the-art accuracies of roughly 85%. When speaking English, Kannada speakers were shown to have a higher nativity identification percentage (95.2%) than those who speak Tamil or Telugu as their native language.
[8] Speaker accent detection using support vector machine.
Accent categorization technologies have a direct impact on voice recognition performance. Accent detection is currently carried out using two models: the Hidden Markov Model (HMM) and Artificial Neural Networks (ANN). Both models, though, have their own sets of flaws. This thesis employs the Support Vector Machine (SVM), a learning and classification approach, to recognize distinct speakers' accents. SVM has been widely used over the previous ten years and has proven to be a very reliable and powerful approach. It offers a considerable advantage over the classic classification methods used in accent classification, such as HMM and ANN, in that it can identify a minimal upper bound of the generalization risk.
III. Proposed Methodology
The main objective of the project is to identify the accent. Accents are a major part of our identity: an accent gives hints about who we are and the community we belong to or wish to belong to. They also play an important part for those learning new languages. Someone who was raised speaking English will sound different from someone who was raised speaking Tamil and learned English as an adult. Accents are an inherent part of spoken languages, and it is important to realize that no accent is better than another. This project helps to judge the accuracy of the accent spoken by a person.
IV. Discussion
Mel Frequency Cepstral Coefficient Mel frequency cepstral coefficients (MFCC) was proposed for detecting monosyllabic words in continuously uttered phrases, but not for identifying the speaker. The MFCC computation is a simulation of the human hearing system that aims to artificially recreate the ear's working principle, assuming that the human ear is a trustworthy speaker recognizer. Frequency filters spaced linearly at low frequencies and logarithmically at high frequencies have been employed to keep the phonetically crucial aspects of the speech signal, using MFCC features founded in the observed difference of the human ear's critical bandwidths. Tones of various frequencies are frequent in speech communications; each tone has an actual frequency, f (Hz), and the subjective pitch is calculated using the Mel scale. The frequency spacing on the mel-frequency scale is linear below 1000 Hz and logarithmic above 1000 Hz. 1000 mels is the pitch of a 1 kHz tone at 40 dB over the perceptual audible threshold, which is used as a reference point. For improved robustness, some modifications to the basic MFCC algorithm have been proposed, such as raising the log-mel-amplitudes to a suitable power (about 2 or 3) before applying the DCT and decreasing the impact of low-energy sections. MFCC are cepstral coefficients that are obtained from a twisted frequency scale centered on human auditory perception. The initial step in MFCC computation is to window the speech signal in order to divide it into frames. Because the high frequency formants process has a lower amplitude than the low frequency formants, high frequencies are stressed to ensure that all formants have the same amplitude. After windowing, the power spectrum of each frame is determined using the Fast Fourier Transform (FFT). Following that, mel-scale filter bank processing is performed on the power spectrum. 
2D Convolutional Neural Network

Given a large number of signals, a convolutional neural network can learn a variety of basic signal properties, such as frequency and amplitude changes. Because CNNs are multi-layer neural networks, this low-level information is captured by the first layer, and recognisable features are passed on to the second layer. This can be demonstrated with a signal represented as a two-dimensional array of pixels: a checkerboard where each square is either light or dark. The CNN determines whether the signal has a frequency change or an amplitude change by observing the pattern. Because it is difficult for a computer to recognise the signal when the entire set of pixels is analysed at once, the convolutional neural network matches sections of the signal instead of evaluating the entire signal. The arithmetic that goes into matching these sections is called filtering. It is accomplished by aligning a feature with a patch of the signal and then comparing them pixel by pixel: multiplying corresponding pixels, summing the products, and dividing by the total number of pixels. This procedure is repeated for every position in the image. The process of combining signals with a set of filters and features to produce a stack of filtered images is known as the convolution layer. It is called a layer because it produces a stack: when one signal is convolved with several filters, it becomes a stack of filtered signals, one per filter. In the two-dimensional illustration, the first step is to create a two-dimensional matrix of fragments; the filters of the first layer begin capturing two-dimensional information at the same time, and the recorded values are combined in the next, higher-level layers. Depending on the level of activity, the CNN filter exhibits varying network behaviour. In this way the CNN learns simple patterns first and then combines them to predict more effectively.
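The pixel-by-pixel filtering step described above (multiply aligned pixels, sum, divide by the number of pixels) can be sketched in plain Python; the feature and patch values below are illustrative, not taken from the paper:

```python
def match_score(patch, feature):
    """Filtering step of a convolution layer: multiply aligned pixels,
    sum the products, and divide by the total number of pixels."""
    rows, cols = len(feature), len(feature[0])
    total = 0.0
    for i in range(rows):
        for j in range(cols):
            total += patch[i][j] * feature[i][j]
    return total / (rows * cols)

# A 3x3 diagonal feature, with pixels encoded as +1 (light) and -1 (dark)
feature = [[ 1, -1, -1],
           [-1,  1, -1],
           [-1, -1,  1]]

print(match_score(feature, feature))  # perfect alignment -> 1.0
```

A convolution layer slides this computation across every patch of the input for every filter, producing the stack of filtered signals described in the text.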
V. Implementation Analysis
i. Fetching of Data: The data is fetched through CSV generation, using the BeautifulSoup library, which scrapes the existing data and then parses the scraped pages. The Pandas library is mainly used for the CSV generation. Once web scraping is done, the links collected in the CSV are downloaded as audio files using Pydub.

ii. Splitting of Data: The dataset used in this work is taken from the Speech Accent Archive, a repository of spoken English words from over 2000 speakers of 100 native languages. The dataset is split at a ratio of 80% training and 20% testing before running the pre-processing model.

iii. Training Model: Once pre-processing of the dataset is done, the MP3 audio files are converted to .wav files, which allows easy extraction of MFCCs. The MFCCs are then fed into a 2D CNN model for prediction of the native-language class. The neural network uses a sequential model: 2 Convolution 2D layers, 2 max-pooling layers, 2 dropout layers (one at 25% and the other at 50%), a flatten layer, and 2 dense layers, with ReLU as the activation function for the first and softmax for the latter.
iv. Save Model: The progress of the model can be saved both during and after training, so a model can pick up where it left off and avoid repeating lengthy training. Saving the model also allows it to be shared so that others can reproduce the work. When publishing research models and techniques, most machine learning practitioners provide: • the code used to generate the model; and • the trained weights, or parameters, of the model.
Sharing this information allows others to better understand how the model works and to test it with new data. After training, the model is saved as an .h5 file for use in the following phase.
VI. Results
This study extracted the MFCC features from every sample as input to the 2D neural network and was able to obtain an accuracy of over 80%.
VII. Conclusion
Speech recognition is an incredible human ability, especially considering that a normal conversation involves recognising 10 to 15 phonemes every second. It should come as no surprise that developing machine (computer) recognition systems has proven difficult. Despite these issues, a growing number of systems are becoming available that achieve some success, usually by focusing on one or two specific areas of speech recognition. Speech synthesis systems, on the other hand, have been around for quite some time; these technologies are increasingly a common part of our lives, despite their limited capabilities and lack of the "natural" character of human speech. Using sequential MFCC features, it is possible to classify accents based on their acoustic qualities with machine learning approaches. Accent identification is a pre-processing step for speech recognition, and it allows for better recognition. In this system, the best classifier in terms of accuracy was logistic regression. This work can be expanded to other accents in the future, and accuracy can be further improved.

References
[1] Abbosovna, A. Z. (2020). Interactive games as a way to improve speech skills in foreign language lessons. Asian Journal of Multidimensional Research (AJMR), 9(6), 165–171.
[2] Cumani, S., & Laface, P. (2014). Large-scale training of pairwise support vector machines for speaker recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(11), 1590–1600.
[3] Dahmani, M., & Guerti, M. (2017). Vocal folds pathologies classification using Naïve Bayes networks. 6th International Conference on Systems and Control (ICSC).
[4] Bhanja, C. C., Laskar, M. A., & Laskar, R. H. (2019). A pre-classification-based language identification for Northeast Indian languages using prosody and spectral features. Circuits, Systems, and Signal Processing, 38(5), 2266–2296.
[5] Le, H., Oparin, I., Allauzen, A., Gauvain, J., & Yvon, F. (2013). Structured output layer neural network language models for speech recognition. IEEE Transactions on Audio, Speech and Language Processing, 21(1), 197–206.
[6] Lee, S. (2015). Hybrid Naïve Bayes K-nearest neighbour method implementation on speech emotion recognition. In: IEEE Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), Chongqing, pp. 349–353.
Advances in Science and Technology ISSN: 1662-0356, Vol. 124, pp 398-406 doi:10.4028/p-g9ekjp © 2023 Trans Tech Publications Ltd, Switzerland
Submitted: 2022-10-12 Accepted: 2022-10-22 Online: 2023-02-27
Movie Recommendation System Using Machine Learning Mr. Arokiaraj P, Mr. Sandeep D.K., Mr. Vishnu J, Mr. Muthurasu N. Department of Computer Science SRM Institute of Science and Technology. Chennai, India [email protected], [email protected], [email protected], [email protected]
Keywords: NLP (Natural Language Processing), Python, content-based filtering, sentiment analysis, movie recommendation, web scraping, TMDB, Kaggle.
Abstract. As more and more movies are released, users get confused about which movie suits them and find it difficult to choose; to make this easier, the recommendation system comes into play: when the user searches for a movie, it gives an accurate result along with several similar suggestions. The suggested movies are produced by the recommendation system from the user's search, e.g., if the movie is action, romance, crime, or drama, or by a particular director, similar movies will be suggested. Recommendation systems recommend movies using the user's previous choices. Sentiment analysis helps to analyse users' sentiments based on their choices; in recent years, sentiment analysis has become one of the major components of many recommendation systems.

Introduction

The world contains a huge population, so people need recommendations for everything they watch, listen to, and so on. To recommend movies to a user, the system first collects ratings from users and recommends the top-rated items to the target user. Content-based filtering is the authoritative approach that suggests related items based on their associated features. There are many movie recommendation systems that can do this trouble-free work for users, but the number of movies and the number of users online are both increasing day by day. With the gradual increase in social networking on the internet, a lot of information is produced online, and the amount of data on the internet and the brisk speed at which it flows make it problematic for the user to manage the information with adequate tools.

Content-Based Filtering: Also known as cognitive filtering, this method recommends items to the user based on his past experience. For example, if the user likes only horror movies, then the system recommends only horror movies similar to the ones he has rated highly.
The broader explanation could be: suppose the user likes only politics-related content; the system then suggests websites, blogs, or news similar to that content. It shows suggestions based on a single user profile.

Sentiment Analysis: Commonly used to retrieve user reviews and reactions to a particular item or topic. The rating of products on online software services is depicted by use of sentiment analysis. Users' sentiment reviews are classified into positive, neutral, and negative classes. An automatic feedback technique is proposed on the basis of data collected from the internet. Natural language processing is used to process text data and text voting. For this, the Multinomial Naive Bayes algorithm is used, which is usually the first solution tried for a sentiment analysis task.

Related Work

Recommender systems can follow content-based approaches or collaborative approaches. Collaborative filtering recommends based on similarity between items and users: it suggests movies for a user based on the similarity of other users, and it is divided into model-based and memory-based methods. Memory-based methods use a user's past data to recommend items and have no training phase; recommendations are calculated from the similarity of the user to training users and the average of the most similar ones. The latest methods use correlation coefficients and cosine similarity to find similar items for a user, and these approaches also try to reduce the cold-start problem and data sparsity. Model-based approaches recommend similar items with the help of user ratings of movies. These
approaches are used by big companies such as Amazon Prime, Voot, and SonyLIV. The collaborative filtering approach has been used by Amazon for several years to offer its customers better products. Netflix also uses collaborative filtering and tries to mitigate the data-sparsity problem; it recommends movies for a user based on the ratings of users with similar taste, although users may give inflated ratings to movies. Content-based filtering, also known as cognitive filtering, uses item information and metadata to give accurate results. Some approaches analyse audio or video by image- and signal-processing techniques (video clips, audio clips, movie posters), while others analyse text data (movie title, genre, director, etc.) using Natural Language Processing techniques such as TF-IDF, i.e., converting words to vectors. In this paper we use a content-based approach.

Literature References

Fixed database using content-based and collaborative techniques. Pradeep, N., Rao Mangalore, K. K., Rajpal, B., Prasad, N., & Shastri, R. advocated that, from the study of recommending items from a fixed database, two main recommendation techniques have emerged: the content-based technique and the collaborative technique. In content-based recommendation, all items similar to those the user provided are recommended, whereas in collaborative recommendation, users whose watch history is similar to that of the given user are identified and the items they have seen are recommended. After the evolution of the recommender system, a new method was invented that merges two or more techniques, called the hybrid method. Before recommendation systems, people had to read reviews, choose a movie that suited their interest, or randomly choose a movie based on some other criteria; this became more difficult as the number of movies available online started increasing rapidly.
Content-based recommendation algorithms using the KNN (k-nearest neighbour) approach. Yashar Deldjoo, Mehdi Elahi, Massimo Quadrana, and Paolo Cremonesi advocate that there are various content-based recommendation algorithms. For example, the classical KNN approach computes the interest of a user in an unwatched item by comparing it against all the items the user has seen in the catalogue. Every item the user has watched contributes to the predicted interest score in proportion to its similarity with the unwatched item; this similarity is computed by means of a similarity function such as cosine similarity or Pearson correlation over the items' VSM representation. Other approaches try to model the probability that the user will be interested in a target item using Bayesian methods, or exploit techniques adapted from Information Retrieval, for example the Relevance Feedback approach.

Existing System

The existing movie recommendation model is not fully capable of delivering the expected movies. Another major problem is the set of factors considered in the predictive model. The searched attributes can be described as actor names, directors, title, genres, year, and certificate; these attributes lead to more accurate results from the model and successful movie predictions. The biggest problem such systems face is that they need a lot of data to make a successful prediction: a good recommendation system first requires data, which should be derived and analysed from user data, before the sentiment-analysis algorithm does its work.

Proposed Methodology

Many site-integration programs have been created and used, for example by Amazon Prime, Google, and Netflix. These systems use a combination of strategies such as collaborative, content-based, knowledge-based, user-based, and hybrid approaches. This paper uses content-based filters that
will provide accurate results compared with other filters; the result is based on user reviews, and the system suggests movies we have not yet seen but that users like us have seen and liked. Content-based filters have proved effective. This filter does not involve other users, only ourselves: it has no start-up problem because it uses attributes such as actor names, director names, genres, and characters, so our favourite movies are recommended immediately. The algorithm simply selects items similar to what we already like when producing recommendations for the proposed system.

Methodology

Data Gathering: The dataset is fetched from the Kaggle website and TMDB. From Kaggle, the movie metadata dataset contains director_name, actor_name, movie_title, and genre, while the credits.csv dataset contains the crew and cast information of each movie. The dataset provides the movie names, cast characters, ratings, release dates, movie titles, directors, genres, etc. Using the TMDB website, movie data are fetched using an API key.

Creating Environment: The environment is created with the Anaconda prompt, which is used for installing packages, libraries, and modules in this system. The main advantage of using this prompt is that it allows users to modify the environment according to their requirements.

Data Pre-processing: After collecting the dataset, some features in the data are modified or dropped, and the required features are selected into a data frame. We use features in the dataset such as actor_name, director_name, genre, and movie_title to recommend movies by content-based filtering. Using the Pandas library in Python, we modify the dataset to make a new data frame; some features are sorted and filtered for testing and training the data.
Fig. no. 5.3

Scrape Information from Wikipedia Using Python: Scraping is used to derive data from Wikipedia; the data provide information on the individual cast members of a movie and display their related movies.

Content-Based Filtering Using Sentiment Analysis: Content-based filtering is used to determine the similarity between items, recommending movies similar to the movie the user likes, while sentiment analysis is applied to the reviews given by users for that movie. The method puts forward similar items based on a particular selected item, using features in the metadata dataset, such as description, director, genre, and actors, to give better recommendations. TfidfVectorizer converts the text into numeric vectors, turning all available information into vectors. We use Natural Language Processing (NLP) techniques for analysing sentiment: the Multinomial Naive Bayes algorithm is used to classify user feedback as positive or negative, which helps to suggest movies with positive feedback.
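The similarity computation behind this content-based step can be sketched without external libraries; the toy metadata strings and titles below are purely illustrative, not the paper's dataset (the paper uses TfidfVectorizer over TMDB/Kaggle metadata):

```python
import math
from collections import Counter

def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between two metadata strings using raw term counts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * \
           math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

# Toy "actor director genre" metadata strings (illustrative only)
movies = {
    "Movie A": "vijay shankar action thriller",
    "Movie B": "vijay shankar action drama",
    "Movie C": "kamal ratnam romance drama",
}

def recommend(title: str, catalogue: dict, top_n: int = 2):
    # Rank every other movie by similarity to the selected one
    scores = {t: cosine_similarity(catalogue[title], m)
              for t, m in catalogue.items() if t != title}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(recommend("Movie A", movies))  # most similar titles first
```

TF-IDF weighting would additionally down-weight terms that appear in every movie's metadata; the cosine step itself is unchanged.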
Architecture Diagram

Fig. no. 6

Flowchart Diagram

Fig. no. 7

Result Analysis

Accuracy of Sentiment Analysis: Multinomial Naive Bayes is fitted for the classification using the extracted features. Naive Bayes needs integer or fractional counts, and TF-IDF does this work by converting words into vectors. The observed accuracy of sentiment analysis is 98.77% on the provided dataset, which gives an accurate result.
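The Multinomial Naive Bayes step can be illustrated with a minimal from-scratch sketch using Laplace smoothing and log probabilities; the tiny review set is made up for illustration and is not the paper's data:

```python
import math
from collections import Counter, defaultdict

def train_mnb(docs, alpha=1.0):
    """Fit Multinomial Naive Bayes: log priors and smoothed log likelihoods."""
    counts, class_docs = defaultdict(Counter), Counter()
    for text, label in docs:
        class_docs[label] += 1
        counts[label].update(text.lower().split())
    vocab = {t for c in counts.values() for t in c}
    model = {}
    for label, c in counts.items():
        total = sum(c.values())
        loglik = {t: math.log((c[t] + alpha) / (total + alpha * len(vocab)))
                  for t in vocab}
        model[label] = (math.log(class_docs[label] / len(docs)), loglik)
    return model

def predict(model, text):
    # Score each class by log prior plus summed log likelihoods of known tokens
    tokens = text.lower().split()
    scores = {label: prior + sum(loglik[t] for t in tokens if t in loglik)
              for label, (prior, loglik) in model.items()}
    return max(scores, key=scores.get)

reviews = [("great movie loved it", "positive"),
           ("wonderful acting great story", "positive"),
           ("boring movie hated it", "negative"),
           ("terrible story bad acting", "negative")]
model = train_mnb(reviews)
print(predict(model, "loved the wonderful story"))  # -> positive
```

In the actual system the raw term counts would be replaced by TF-IDF weights from the full review corpus, but the classification rule is the same.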
Fig. no. 8.1

Result of Movie Recommendation System:
Fig. no. 8.2.1 Landing Page: This is the landing page of our system. The input in the search box is taken as a string containing the movie name; the string is sent to the back-end for pre-processing against the database, and the results are returned to the client side using Ajax.
Fig. no. 8.2.2 Search Result: The result generated by the search engine corresponds to the movie name entered by the user. This is the result page of our recommendation system, where the string gets processed at the back-end and the movie details are shown along with the name, genre, top cast, release date, etc.
Fig. no. 8.2.3 Cast Detail: This result page of our system shows the cast details of the movie searched by the user: the characters and the names of the actors who appear in that movie.
Fig. no. 8.2.4 Scraped Detail from Wikipedia: In our system, this result shows the alternative character names and the actual details obtained by clicking on a profile; it shows only the characters who appear in the movie searched by the user.
Fig. no. 8.2.5 Recommended Playlist: In our paper, using sentiment analysis, movies are recommended to the user by director, cast, genre, actor name, etc. Using content-based filtering, the recommended movies are sorted by the highest rating achieved; this is the result of the content-based filtering.
Fig. no. 8.2.6 User Sentiment Analysis Part: This is the sentiment analysis part, where the users' positive and negative reviews are displayed using sentiment analysis.

Conclusion

In this paper, our proposed system gives better recommendations based on sentiment analysis over metadata such as cast, genre, and director. We would like to improve movie suggestions with user preferences in the future. The system accuracy is about 98%; the remaining 2% arises because some data is missing from TMDB and Kaggle, so it is difficult to get all the details of every movie.

References

[1] Parth Kotak and Prem Kotak, "Movie Recommendation System using Filtering Approach", Department of Computer Engineering, Vidyalankar Institute of Technology, Mumbai, India (2021).
[2] Raghav Mehta and Shikha Gupta, "Movie Recommendation Systems using Sentiment Analysis and Cosine Similarity", International Journal for Modern Trends in Science and Technology, Vol. 07, Issue 01, January 2021, pp. 16-22.
[3] Yeole Madhavi B., Rokade Monika D., and Khatal Sunil S., "Movie Recommendation System using Content based Filtering", Department of Computer Engineering, SPCOE Dumbarwadi (Otur), Maharashtra, India (2021).
[4] Akansh Surendran, Aditya Kumar Yadav, and Aditya Kumar, "Movie Recommendation System using Machine Learning Algorithms", Dept. of Computer Science Engineering, Raj Kumar Goel Institute of Technology, Uttar Pradesh, India (2020).
[5] Pradeep, N., Rao Mangalore, K. K., Rajpal, B., Prasad, N., & Shastri, R. (2020). "Content based movie recommendation system". International Journal of Research in Industrial Engineering, 9(4), 337-348.
[6] Pasquale Lops, Dietmar Jannach, Cataldo Musto, Toine Bogers, and Marijn Koolen, "Trends in content-based recommendation: Preface to the special issue on Recommender systems based on rich item descriptions" (2019).
[7] Shreya Agrawal and Pooja Jain, "An Improved Approach for Movie Recommendation System", CSE, SVITS, Indore, MP, India (2017).
[8] Rajan Subramaniam, Roger Lee, and Tokuro Matsuo, "Movie Master: Hybrid Movie Recommendation", Department of Computer Science, Graduate School of Industrial Technology, Central Michigan University / Advanced Institute of Industrial Technology (2017).
[9] H. Li, J. Cui, B. Shen, and J. Ma, "An intelligent movie recommendation system through group-level sentiment analysis in microblogs", Neurocomputing, vol. 210, pp. 164–173 (2016).
[10] Deldjoo, Y., Elahi, M., Quadrana, M., and Cremonesi, P., "Toward building a content-based video recommendation system based on low-level features", in: E-Commerce and Web Technology, Springer (2015).
CHAPTER 3: Machine Learning to Financial Data Analysis
Advances in Science and Technology ISSN: 1662-0356, Vol. 124, pp 409-417 doi:10.4028/p-46y2r2 © 2023 Trans Tech Publications Ltd, Switzerland
Submitted: 2022-09-04 Accepted: 2022-09-16 Online: 2023-02-27
Data Analysis and Price Prediction of Stock Market Using Machine Learning Regression Algorithms

Gavin Abishek. L, Prasanna Venkatesh. P.K, Vedha Veena. C, and Dr. A. Sinduja
Department of Computer Science and Engineering, SRM Institute of Science and Technology, Chennai, India
[email protected], [email protected], [email protected], [email protected]
Keywords: Stock Market, Price Prediction, Machine Learning, Regression, Linear Regression, Decision Tree, Support Vector Machine, Seasonal Auto-Regressive Integrated Moving Average with eXogenous factors, Gated Recurrent Unit.
Abstract: Stock analysis and forecasting is a very challenging study due to the unpredictable and volatile nature of the data. Stock patterns are often unique, as they are influenced by many uncertainties, such as the financial results of companies (earnings per share), risky transactions, market sentiment, government policies, and conditions such as epidemics. Even though this is challenging, our goal is to predict accurate values within a short span of the dataset. In this paper we have compared and analyzed ML models to find the one that best predicts the closing price of the next few days, using three to four months of NIFTY50 Indian stock data from Yahoo Finance. Five regression models are involved in this analysis: Linear Regression (LR), Decision Tree (DT), Support Vector Regression (SVR), SARIMAX (Seasonal Auto-Regressive Integrated Moving Average with eXogenous factors), and the Gated Recurrent Unit (GRU, deep learning). Performance metrics such as RMSE (Root Mean Squared Error), MAE (Mean Absolute Error), and MAPE (Mean Absolute Percentage Error) are used. On the basis of our comparison, we conclude that GRU provides a low error value on all three performance metrics and also gives more accurate predictions than the other four regression models.

I. Introduction:
Stocks are an equity investment that represents a share in a company's trading and assets. Prediction involves deciding which stocks to buy or sell, analyzing which company's stock returns a high profit, researching stock price trends, and calculating turnover, which finally ends in predicting the future stock price. The art of predicting stock prices is a challenging task for researchers and analysts. For decades, the traditional forecasting method was a linear model, but studies have shown that a linear model is not suitable for stock price prediction. Meanwhile, it has been demonstrated that ML models show promising results under random assumptions, detect nonlinear relationships, and tackle noisy real-world data [1-2]. They aim at developing self-learning algorithms using past datasets, so that the machine can be enabled to project future activity. ML offers different regression models, some of which are compared and analyzed in this paper for predicting the future closing price. The stock market, ML, and deep learning are an emerging combination in the current era: deep learning methods are capable of training on highly non-linear, very complex datasets, which makes them suitable for time-series forecasting. One researcher found that a combination of multiple models performs better than the individual models in stock price forecasting [3]. To examine this claim, in this paper we have undergone a series of analyses and comparisons among five regression models, LR, DT, SVR, SARIMAX, and GRU, based on scores gained in performance metrics such as RMSE, MAE, and MAPE.
A. Related Works: Y. Lin, S. Liu, H. Yang, and H. Wu [13] elucidated the prediction of stock trends using candlestick charting, dividing a day into thirteen patterns based on eight-trigram feature engineering, which involves the close, open, low, and high index prices of that day. These featured data were trained on six different regression models to predict the direction of the closing price. They obtained only 60% accuracy and realized that Logistic Regression must be updated with additional features; SVM and KNN fit only particular patterns, and SVM is suitable for small-scale price prediction alone. S. Mehtab, J. Sen, and A. Dutta [4-5] proposed a gamut of ML and DL models for forecasting the future opening price. They utilized four years of NIFTY50 data for recursive training and also used a multivariate regression model and LSTM to enhance the power of the model, concluding that univariate models are best for prediction, with LSTM showing 344.57 as its RMSE value. N. Rouf [6] went through a decade of stock prediction papers (2011-21) and elucidated how and which models were used for prediction; this paper inspired us to research selected regression models and find the best fit among them. S. Ravikumar and P. Saraf [7], based on three researchers' ideologies, designed two models, one for prediction and the other to classify increases or decreases in the closing price. They used LR, SVR, DT, and Random Forest and concluded that Random Forest is the best model, though it does not give better results for all kinds of stocks. Mehtab [8] illustrated the superiority of CNN-based models over LSTM-based models for price prediction in the context of the NSE. Since CNN leads LSTM in performance according to that paper, we used GRU (which is quite similar to LSTM) for extrapolating the closing price to achieve even more accuracy.
Shrivastav & Kumar [9-10] presented an empirical study on stock price prediction using ARIMA and SVR models; the SVR model proved superior to the ARIMA model. K.E. Hoque & H. Aljamaan [11] investigated conventional ML models such as SVR & KRR using hyperparameter tuning for forecasting the Saudi Stock Exchange; hyperparameter tuning showed a positive impact on SVR, GPR, and SGD and a negative impact on KRR, LASSO, and PLS.

B. Motivation: As of 2020, just 2% of Indian household savings went into equities. Most of the capital is stashed in bank deposits, gold investments, or even real estate, missing out on potentially high returns. One of the major reasons is the fear of the unpredictability of equities in Indian society; most people relate it to gambling, but it is not. Our intention is to eliminate this fear and to educate and help people about the marvellous world of the stock market, so that no one misses out on this wonderful investment strategy. This can be achieved when we are able to predict the ups and downs in market values. Machine learning can be employed for forecasting future prices of stocks/shares by learning from a dataset of attributes that influence the prediction. A stacking of various regressors is applied in order to demonstrate the most accurate and promising results, which ultimately enhances the prediction level and increases stock investment around the world.

C. Outline of the Proposed System: The perspective of this paper is to combine five different variants of the regression model and elucidate the performance of each model in order to identify the least-error regressor for predicting the future closing stock price of the NIFTY50 index. Fig. 1 explains the entire infrastructure of this analysis system; each part is explained below.
Fig. 1. Architecture Diagram

II. METHODOLOGY:
This methodology follows the sequence Pre-Processing, Train–Test Split, and Ensemble / Model Building, which covers the gamut of the process undergone during this research. Before that, we walk through all the regression models involved in this system.

● Linear Regression: Linear regression is a statistical approach utilized in predictive analysis. It depicts a linear relationship between a dependent variable (y) and at least one independent variable (x). The formula used in this algorithm is:

y = a0 + a1x + ε (1)

where y = dependent variable in the dataset, x = independent variable in the dataset, a0 = intercept of the line, a1 = slope, ε = random error.

● Decision Tree: Decision trees are a type of supervised ML in which a two-node concept is used to continuously split the dataset according to certain parameters. A decision node branches into multiple sub-trees based on a decision, whereas leaf nodes show the outputs of those decisions and do not branch further. The decisions are performed on the basis of features of the given dataset. The formulas used are:

Information Gain = Entropy(S) − [(Weighted Avg) × Entropy(each feature)]
Entropy(S) = −P(yes) log2 P(yes) − P(no) log2 P(no) (2)

where S = total number of samples, P(yes) = probability of yes, P(no) = probability of no.

● SVR: Support Vector Regression (SVR) applies the SVM methodology to regression problems. SVR considers the points that lie within the decision boundary around the hyperplane (best-fit line), which is chosen so that a maximum number of points falls inside the boundary. By wisely identifying the boundary region, the points within it have the least error, which results in the best-fit model. The hyperplane should satisfy:

−a < y − (wx + b) < +a (3)
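Equation (1) can be fitted by ordinary least squares; a minimal sketch with made-up closing prices (not the NIFTY50 data used in the paper):

```python
def fit_linear_regression(xs, ys):
    """Ordinary least squares for y = a0 + a1*x (Eq. 1, ignoring the error term)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Slope: covariance of x and y divided by variance of x
    a1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
         / sum((x - mean_x) ** 2 for x in xs)
    a0 = mean_y - a1 * mean_x
    return a0, a1

# Day index vs. illustrative closing prices
days = [1, 2, 3, 4, 5]
close = [100.0, 102.0, 104.0, 106.0, 108.0]
a0, a1 = fit_linear_regression(days, close)
print(a0 + a1 * 6)  # extrapolated close for day 6 -> 110.0
```

Real stock series are far from linear, which is exactly why the paper compares this baseline against tree-based, kernel, seasonal, and recurrent models.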
IoT, Cloud and Data Science
● SARIMAX: SARIMAX (Seasonal Auto-Regressive Integrated Moving Average with eXogenous factors) is the next generation of the ARIMA model. ARIMA comprises an autoregressive integrated moving average, while SARIMAX adds seasonal effects and exogenous factors to the autoregressive and moving-average components. Mathematically, the model can be represented as:
Θ(L)p θ(Ls)P Δd ΔsD yt = Φ(L)q φ(Ls)Q Δd ΔsD εt + Σ ßixit (4)
● GRU: In sequence modelling, the Gated Recurrent Unit (GRU) is the newest entrant after RNN and LSTM, offering an improvement over the other two. GRUs resemble Long Short-Term Memory (LSTM) in using gates to control the flow of information; interestingly, a GRU does not have a separate cell state, only a hidden state. GRU models are faster to train owing to their simpler architecture [12].
A. Pre-Processing: As yfinance provides a clean dataset, pre-processing was not tedious, but a few mandatory steps were still needed to obtain an optimal model. Right after importing the required libraries, we moved on to EDA (Exploratory Data Analysis), in which we checked for null values, plotted different graphs (Fig. 2), and applied resampling (Fig. 3), rolling, filtering, and so on, with the aim of tracing trends in the closing-price patterns (Fig. 4). We also used a min-max scaler, which scales and translates each feature individually, and the ADF test (Augmented Dickey-Fuller test), a statistical test commonly used to analyze the stationarity of a series. In addition, we enhanced the dataset by removing outliers, converting it into a NumPy array, and handling categorical data so as to gain better accuracy during training, feature extraction, and engineering.
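The pre-processing steps above can be sketched as follows; a synthetic series stands in for the yfinance download, so the dates and prices are illustrative assumptions:

```python
import numpy as np
import pandas as pd

# Stand-in for the yfinance download used in the paper; the real pipeline
# would fetch the NIFTY50 history via yfinance instead.
rng = np.random.default_rng(1)
idx = pd.date_range("2022-01-03", periods=90, freq="D")
close = pd.Series(17000 + np.cumsum(rng.normal(0, 30, 90)), index=idx)

assert close.isna().sum() == 0                       # EDA: null-value check

# Min-max scaling: the series translated and scaled to [0, 1]
scaled = (close - close.min()) / (close.max() - close.min())

weekly = close.resample("W").mean()                  # resampling (cf. Fig. 3)
rolling = close.rolling(window=7).mean()             # rolling mean for trends
```

The ADF stationarity test mentioned above would usually be run with statsmodels' `adfuller` on the same series.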
Fig. 2. Closing price vs Corresponding dates
Fig. 3. Resampling
Fig. 4. Autocorrelation
B. Train – Test Split: After all the pre-processing procedures, the cleaned data is split into train and test sets in order to train the model. The split is based on an 80:20 ratio: 80% of the four-month dataset is used for training, while the remaining 20% is used for testing the model.
C. Ensemble / Model Building: This paper covers the implementation of five regression models, namely Linear Regression, Decision Tree, SVR, SARIMAX, and GRU. The NIFTY50 index data holds the following variables: the date, high, open, low, and close of the index, and the amount of stock traded (volume) on the corresponding day. We decided to extrapolate the future closing price based on the date variable rather than combining the trends of all the other variables, which could distort the desired result. For this purpose, the variable close is taken as the response variable and date as the predictor. The cleaned dataset is introduced into each regression model individually, and the predicted result is enhanced using a technique called GridSearchCV, which retains the best combination of hyperparameters. Finally, all the models are developed and compared to determine the most robust model.
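A hedged sketch of the 80:20 chronological split and the GridSearchCV step for one of the models (SVR); the parameter grid is a hypothetical example, since the paper does not list its exact search space:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.svm import SVR

rng = np.random.default_rng(2)
X = np.arange(100).reshape(-1, 1)                      # date ordinal (predictor)
y = 17000 + 4.0 * X.ravel() + rng.normal(0, 15, 100)   # close (response)

split = int(len(X) * 0.8)                              # 80:20, no shuffling
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# Hypothetical grid; time-ordered CV avoids leaking future data into folds.
param_grid = {"C": [1.0, 10.0, 100.0], "kernel": ["rbf", "linear"]}
search = GridSearchCV(SVR(), param_grid, cv=TimeSeriesSplit(n_splits=3))
search.fit(X_train, y_train)
y_pred = search.predict(X_test)
```

Keeping the split chronological (no shuffling) matters for time-series data, since a random split would let the model train on days that come after the test period.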
III. Experimental Result: A. Dataset: The dataset used for this analysis covers three to four months of NIFTY50 data from the Indian stock market; it is time-series data, which is the best form in which to visualize and analyze stock markets. NIFTY50 is one of the most actively traded contracts in the world, governed by NSE Indices Limited. The NSE is the leading stock exchange in India and the third largest in equity shares worldwide according to the World Federation of Exchanges (WFE) database. The name Nifty50 is derived by combining "National" and "fifty", as it consists of a diversified fifty-stock index accounting for thirteen sectors of the economy. The dataset was collected from Yahoo Finance via yfinance, a popular open-source library developed by Ran Aroussi that offers a threaded and pythonic way to download the desired data, with accuracy and security in its datasets. We used this API because it provides both historical and live data and is easy and cheap to use. Based on previous research papers, Indian stocks are rarely used and most of that research was based
on foreign stocks, which is why we preferred NIFTY50 for our analysis. To achieve this, we first imported the necessary libraries and then found the ticker for NIFTY50. A ticker is a unique stock symbol, a series of letters assigned to a particular stock for trading purposes. Using it, we downloaded four months of historical time-series data for training. We then previewed information about NIFTY50 statistics such as averageDailyVolume10Day, currency, gmtOffSetMilliseconds, messageBoardId, regularMarketPrice, shortName, symbol, etc. The data was then sent for pre-processing in order to make it suitable for training the model.
B. Performance metrics: The five trained regression models are compared and tested using three metrics, namely MAPE, MAE, and RMSE.
● RMSE: RMSE (Root Mean Square Error) is the standard deviation of the errors when a prediction has been applied to a dataset: the errors are squared and averaged before taking the square root. RMSE therefore penalizes large errors more heavily, which strongly reflects on a model's efficiency. It is calculated as:
RMSE = sqrt((1/N) * ∑(y(i) – ŷ(i))²) (5)
where N = number of data points, y(i) = i-th measurement, and ŷ(i) = corresponding prediction.
● MAE: MAE (Mean Absolute Error) is used when performance is evaluated on a continuous-variable dataset. It gives a linear score and is also known as a scale-dependent accuracy measure:
MAE = (1/n) * ∑|yi – xi| (6)
where Σ = summation, yi = actual value for the i-th observation, xi = predicted value for the i-th observation, n = total number of observations.
● MAPE: Mean Absolute Percentage Error (MAPE) is a statistical method used to determine the accuracy of an ML model trained on the desired dataset; it is also used as a loss function. The absolute difference between the forecast value (F) and the actual value (A) is taken relative to the actual value, and the mean of these ratios gives the MAPE, usually expressed as a percentage:
MAPE = (1/n) * ∑|(Ai – Fi)/Ai| (7)
C. Performance Analysis: The best predictive model is determined based on the error value. The lower the error, the better the performance; hence the model with the least error value is considered the best regressor.
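The three metrics can be computed directly in NumPy; MAPE is returned here as a fraction, matching the 0.008–0.033 scale reported for it in Table I:

```python
import numpy as np

def rmse(actual, predicted):
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def mae(actual, predicted):
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return float(np.mean(np.abs(actual - predicted)))

def mape(actual, predicted):
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return float(np.mean(np.abs((actual - predicted) / actual)))

# e.g. rmse([100, 200], [110, 190]) -> 10.0 and mape([100, 200], [110, 190]) -> 0.075
```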
i. Performance evaluation of the regression models based on their error rates:
From Table I, all five regression models are compared and analyzed based on the error values from the three metrics used: RMSE, MAE, and MAPE. The error values shown for all five models are based on the 80:20 train-test split. The table is arranged in descending order of error (i.e., the leftmost model has the highest error and the rightmost model has the least).
TABLE I. Performance evaluation of regression models based on performance metrics

            LR        DT        SVR       SARIMAX   GRU
RMSE        640.06    348.62    283.73    202.97    190.03
MAE         591.31    302.55    227.35    157.87    150.09
MAPE        0.033     0.018     0.013     0.009     0.008
Table I clearly shows that LR gives an RMSE of 640.06, indicating that linear regression performs poorly on time-series data. DT gives an RMSE of 348.62 and SVR 283.73, whereas SARIMAX gives 202.97 and GRU 190.03. We can therefore conclude that GRU is the best predictive regression model among the five, as it gives the lowest error value for all three performance metrics.
ii. Performance evaluation based on output graphs:
The performance of the LR, SVR, DT, SARIMAX and GRU models is deployed in prediction graphs. Each graph compares the predicted closing price with the actual closing price on the test dataset. The X-axis displays the date and the Y-axis displays the closing price on the corresponding date. In each graph, the blue line represents the trained values of the corresponding model, the red line represents the original closing price, and the yellow line represents the price predicted by the corresponding ML model on the test data.
Fig. 5. Linear Regression prediction graph
Fig. 6. Decision Tree prediction graph
Fig. 7. SVR prediction graph
Fig. 8. SARIMAX prediction graph
Fig. 9. GRU prediction graph
The Linear Regression model (Fig. 5) produces a linear graph for time-series data, but stock trends vary randomly and do not increase in a steady linear fashion. SVR (Fig. 7) and DTR (Fig. 6) are not effective in prediction and show a nearly constant graph, while the SARIMAX (Fig. 8) predictions are somewhat closer to the actual closing price but still deviate from it. In Fig. 9, GRU gives the best prediction and anticipates an accurate closing price in comparison with all the remaining models, as its predicted line coincides with the actual-value line.
IV. CONCLUSION: This paper was an endeavour to ascertain the future prices of stocks/shares with greater accuracy and reliability using emerging ML techniques. The presented work demonstrates a meticulous methodology for predicting the stock market and its prices from the perspective of the Indian economy, using historical time-series data to forecast the stock price, with Nifty50 as the primary index. Five established and powerful machine learning models were used, tuned, and tested with optimal configurations, then scrupulously compared and analyzed using multiple performance metrics. Since the error was minimal, the approach is well suited to forecasting the actual stock price and reducing the uncertainty of future values. The results of the analysis proved that GRU performed best in predicting the stock prices and recorded the least RMSE value, although SVR required fewer resources and ran faster while remaining accurate, logging an RMSE comparable to that of GRU and providing a balance between runtime and error
rate. The results obtained were promising and support the conclusion that it is possible to prognosticate the prices of stocks accurately and efficiently using machine learning techniques.
References
[1] K.E. Hoque and H. Aljamaan, "Impact of Hyperparameter Tuning on Machine Learning Models in Stock Price Forecasting", IEEE Access, vol. 9, pp. 163815-163830, 2021.
[2] A. Atla, R. Tada, V. Sheng, and N. Singireddy, "Sensitivity of different machine learning algorithms to noise", J. Comput. Sci. Colleges, vol. 26, no. 5, pp. 96-103, 2011.
[3] X. Ji, J. Wang, and Z. Yan, "A stock price prediction method based on deep learning technology", International Journal of Crowd Science, vol. 5, no. 1, pp. 55-72, 2021.
[4] S. Mehtab and J. Sen, "A Robust Predictive Model for Stock Price Prediction Using Deep Learning and Natural Language Processing", in Proc. 7th Int. Conf. on Business Analytics and Intelligence, Bangalore, India, December 5-7, 2019.
[5] S. Mehtab, J. Sen, and A. Dutta, "Stock Price Prediction Using Machine Learning and LSTM-Based Deep Learning Models", TechRxiv, 2021.
[6] N. Rouf, M.B. Malik, T. Arif, S. Sharma, S. Singh, S. Aich, and H. Kim, "Stock Market Prediction Using Machine Learning Techniques: A Decade Survey on Methodologies, Recent Developments, and Future Directions", Electronics, vol. 10, issue 21, 2021.
[7] S. Ravikumar and P. Saraf, "Prediction of Stock Prices using Machine Learning (Regression, Classification) Algorithms", IEEE, 2020.
[8] S. Mehtab, J. Sen, and S. Dasgupta, "Robust analysis of stock price time series using CNN and LSTM-based deep learning models", in Proc. 4th Int. Conf. Electron., Commun. Aerosp. Technol. (ICECA), Nov. 2020, pp. 1481-1486.
[9] B.M. Henrique, V.A. Sobreiro, and H. Kimura, "Stock price prediction using support vector regression on daily and up to the minute prices", J. Finance Data Sci., vol. 4, no. 3, pp. 183-201, 2018.
[10] L.K. Shrivastav and R. Kumar, "An empirical analysis of stock market price prediction using ARIMA and SVM", in Proc. 6th Int. Conf. Comput. Sustain. Global Develop. (INDIACom), 2019, pp. 173-178.
[11] K.E. Hoque and H. Aljamaan, "Impact of Hyperparameter Tuning on Machine Learning Models in Stock Price Forecasting", IEEE Access, vol. 9, pp. 163815-163830, 2021.
[12] K. Cho, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, "Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation", arXiv:1406.1078v3 [cs.CL], 3 Sep 2014.
[13] Y. Lin, S. Liu, H. Yang, and H. Wu, "Stock Trend Prediction Using Candlestick Charting and Ensemble Machine Learning Techniques with a Novelty Feature Engineering Scheme", IEEE Access, vol. 9, pp. 101433-101446, 2021, doi: 10.1109/ACCESS.2021.3096825.
Advances in Science and Technology ISSN: 1662-0356, Vol. 124, pp 418-425 doi:10.4028/p-sp20ub © 2023 Trans Tech Publications Ltd, Switzerland
Submitted: 2022-09-07 Accepted: 2022-09-16 Online: 2023-02-27
Financial Time Series Analysis & Forecasting with Statistical Inference & Machine Learning
Sarvesh Vishnu1,a, Dr. M. Uma2,b*
1SRM Nagar, Kattankulathur, Chengalpattu District, Tamil Nadu – 603 203, India
[email protected], [email protected]
Keywords: Financial Time Series, Indian Stock Market, Statistical Inference, ARIMA, Stacked – LSTM.
Abstract — Time-series data and its practical applications span diverse domains: finance, medicine, environment, education and more. Comprehensive analysis and optimized forecasting can help us understand the nature of the data and better prepare us for the future. Financial time-series data has been a heavily researched subject in the present and previous decades. Statistics, Machine Learning (ML) and Deep Learning (DL) models have been implemented to forecast the stock market and make data-informed decisions. However, these methods have not been thoroughly explored and analysed in the context of the Indian stock market. In this paper we attempt to implement and evaluate avant-garde statistical and machine learning methods for financial time-series analysis and forecasting on Indian stock market data.
1. Introduction
Stock market prediction with machine learning has been a widespread phenomenon in the past decade. The advent of Recurrent Neural Networks (RNN), Long Short-Term Memory networks (LSTMs) and their impressive performance have invited great academic attention and research in financial time-series analysis with deep learning approaches [1]. However, their practical applications carry challenges quite different from their theoretical counterparts: overfitting, interpretability and the underlying assumptions of the experiments are some of the concerns associated with generic approaches. A lot of the research work in this field has revolved around western markets and stocks, leaving vast research opportunities in the interdisciplinary field of computing and finance in the context of Indian markets [2]. This paper attempts to solidify financial time-series analysis by addressing the above-mentioned issues while supporting them with sound statistical analysis. The experiments in the paper are carried out in the context of Asia's oldest stock exchange, the Bombay Stock Exchange (BSE), to address the aforementioned causes and establish domain-specific outcomes.
2. Related Work
RNNs, LSTMs, encoders, attention models and multi-modal techniques are some of the recent approaches researchers have adopted for financial analysis and prediction. The widely used Auto-Regressive Integrated Moving Average (ARIMA) model, along with time-series analysis and pattern recognition, has proven helpful in time-series prediction and forecasting [3]. Comparison studies evaluating ARIMA, LSTM and Bi-directional Long Short-Term Memory (BiLSTM) on the basis of performance and accuracy report that the BiLSTM model performs better than the other methods [4]. However, lookahead issues, overfitting and the time taken by the model prevent it from being optimal for practical stock market prediction and time-series forecasting. When applying ML, DL and other models to financial time-series analysis, interpretability plays a significant role: the better the working of the model is understood, the better generalized the approach will be. Interpretable and compliant forecasting methods using multi-modality graph neural networks (GNN) for financial time-series forecasting and learning cover a
comprehensive set of parameters [5]. Forecasting multivariate time-series data is complicated by non-linear interdependencies between timesteps and series. Previous approaches have used attention mechanisms to select relevant time series, using frequency information for multivariate forecasting in fields such as solar energy, traffic, electricity and foreign currency exchange rates [6]. Furthermore, beyond the prediction of stock prices, returns and forecasts, analysis of stock volatility has proven quite useful for making better data-driven financial decisions [7]. Gold price volatility has a significant impact on many financial activities of the world; previous work developed a reliable prediction model by exploiting the ability of CNNs to extract useful knowledge and learn the internal representation of time-series data, as well as the effectiveness of LSTM, encoder and decoder algorithms across different domains. Approaches have also been successful in incorporating information into predictions by applying hybrid deep learning and neural network components [8]. Competitions such as the M4 forecasting challenge have produced dynamic computational graph neural network systems that enable a standard exponential smoothing model to be mixed with advanced long short-term memory networks in a common framework [9]. Through our experiment we perform a thorough analysis of various statistical models such as ARIMA and SARIMAX, as well as DL methods such as LSTMs, and study the time-series analysis and forecasting of the different approaches.
3. Data and Methodology
S&P BSE Index: With over five thousand companies listed and recognition as a stock exchange in 1957, BSE is one of Asia's oldest stock exchanges. The data used in the experiment are obtained from 'https://www.bseindia.com/' and 'https://finance.yahoo.com/'. The data for the Top 30 stock indexes comprise the Open, High, Low, Close, Adj Close and Volume.
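For illustration, a synthetic stand-in for the downloaded frame (the real one has 1230 rows and the six columns listed above) and the basic checks applied to it; all values here are simulated assumptions:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
idx = pd.date_range("2017-04-24", periods=120, freq="B")   # business days
level = 30000 + np.cumsum(rng.normal(0, 150, len(idx)))
df = pd.DataFrame({
    "Open": level + rng.normal(0, 50, len(idx)),
    "High": level + 100.0,
    "Low": level - 100.0,
    "Close": level,
    "Adj Close": level,
    "Volume": rng.integers(1_000, 9_000, len(idx)),
}, index=idx)

assert df.isna().sum().sum() == 0          # pre-processing: no missing values
# Percentage returns: a common stationary transform of the price level.
returns = df["Close"].pct_change().dropna()
```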
Fig 1. BSE^SN Data
The data has a total of 1230 rows and the 6 columns mentioned above, for the period 2017-04-24 to 2022-04-22; the pandemic trends and the recovery phase are included in this period. Essential to any machine learning pipeline, data pre-processing and exploratory data analysis are the first steps of the project. The non-stationary data has to be scaled and processed in order to be used by all the algorithms. We also check the data for any seasonality that we can leverage to produce better results, and use different smoothing methods to aid data preparation and feature generation and to visualize patterns in the data. Following thorough data analysis and preparation, we use the Auto-Regressive Integrated Moving Average (ARIMA), Seasonal Auto-Regressive Integrated Moving Average with Exogenous Factors (SARIMAX) and stacked Long Short-
term Memory (LSTM) models for prediction and forecasting. We further use correlograms and statistical information for hyperparameter tuning in order to produce an optimized solution.
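The correlogram-based tuning mentioned above can be sketched with a plain NumPy autocorrelation function (in practice statsmodels' `plot_acf`/`plot_pacf` would typically be used); the AR coefficient 0.8 below is an illustrative assumption:

```python
import numpy as np

def acf(series, nlags):
    """Sample autocorrelation function, as read off a correlogram."""
    x = np.asarray(series, dtype=float) - np.mean(series)
    denom = np.dot(x, x)
    return np.array([1.0] + [np.dot(x[:-k], x[k:]) / denom
                             for k in range(1, nlags + 1)])

rng = np.random.default_rng(4)
x = np.zeros(500)
for t in range(1, 500):                 # AR(1)-like series
    x[t] = 0.8 * x[t - 1] + rng.normal()
rho = acf(x, nlags=5)                   # geometric decay suggests an AR term
```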
Fig 2. System Architecture
4. Algorithms
A. Auto-Regressive Integrated Moving Average (ARIMA) & Seasonal Auto-Regressive Integrated Moving Average with Exogenous Factors (SARIMAX)
The ARIMA model builds on two main ideas: the value of the time series today depends on its previous value, and the present and previous random errors are factors that can accurately predict the present value of the series. Below are the formulations for an AR(1) process and an ARMA(1,1) process. During this experiment we also use the auto-correlation and partial auto-correlation functions in order to choose the appropriate hyperparameters.
At = ΦAt-1 + εt (1)
Ft = ß0 + ß1Ft-1 + Φ1εt-1 + εt (2)
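Equation (1) can be checked numerically: simulating an AR(1) process and recovering Φ by least squares (a toy verification under an assumed Φ = 0.7, not the paper's fitting procedure):

```python
import numpy as np

rng = np.random.default_rng(5)
phi_true = 0.7
a = np.zeros(1000)
for t in range(1, 1000):
    a[t] = phi_true * a[t - 1] + rng.normal()   # A_t = phi * A_{t-1} + eps_t

# OLS slope of A_t on A_{t-1} recovers phi
x, y = a[:-1], a[1:]
phi_hat = float(np.dot(x, y) / np.dot(x, x))
```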
Eq. 1 represents an AR(1) process, where the present value of the time series At is given as a function of the previous value At-1 and the random error εt. Eq. 2 represents a basic ARMA(1,1) model, where Ft is the present value of the time series, Ft-1 is the previous value, εt and εt-1 are random errors, and ß0, ß1 are the respective coefficients. An upgrade of the ARIMA model that takes the time series' seasonal nature and other explanatory variables into consideration is called the SARIMAX model. This is achieved by adding seasonal counterparts of the ARIMA terms along with the exogenous factors:
Θ(L)p Δd yt = Φ(L)q Δd εt + Σ ßixit (3)
Θ(L)p θ(Ls)P Δd ΔsD yt = Φ(L)q φ(Ls)Q Δd ΔsD εt + Σ ßixit (4)
The above equations Eq. 3 and Eq. 4 represent an ARIMAX(p,d,q) and a SARIMAX(p,d,q)(P,D,Q,s) model respectively.
B. Long Short-Term Memory (LSTM)
LSTMs are known to capture long-term dependencies in data in order to leverage sequential information in time-series analysis. However, a common error when implementing these models is that the LSTM models are often not stacked and are singular. In our experiment we use stacked LSTMs in order to best leverage the sequential information present in our data. Below are the formulations of the model.
ft = σ(Wf [ht-1, xt] + bf) (5)
Ct = ft * Ct-1 + it * Ct^ (6)
it = σ(Wi [ht-1, xt] + bi) (7)
Ct^ = tanh(Wc [ht-1, xt] + bc) (8)
ot = σ(Wo [ht-1, xt] + bo) (9)
ht = ot * tanh(Ct) (10)
The above formulations comprise the cell-state equations and the three gates of the LSTM: the forget gate (Eq. 5), the input gate (Eq. 7) and the output gate (Eq. 9), together with the candidate and updated cell states (Eqs. 8 and 6), the hidden state (Eq. 10), and the corresponding weights and biases.
C. Generalized Auto-Regressive Conditional Heteroskedasticity (GARCH)
As an additional part of our analysis, we also use the GARCH model to forecast the volatility of the stock exchange. This model considers the residuals of the best-fit model and analyses that information to make forecasts. For a GARCH(1,1) process the return is Ft = εtσt, with conditional variance
σt² = α0 + α1εt-1² + ß1σt-1² (11)
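Equations (5)–(10) can be sketched as a single NumPy cell step; this is a didactic sketch with random small weights, whereas the paper's stacked model would be built with a deep-learning framework:

```python
import numpy as np

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM cell step following Eqs. (5)-(10)."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    z = np.concatenate([h_prev, x_t])            # [h_{t-1}, x_t]
    f = sigmoid(W["f"] @ z + b["f"])             # forget gate, Eq. 5
    i = sigmoid(W["i"] @ z + b["i"])             # input gate, Eq. 7
    c_hat = np.tanh(W["c"] @ z + b["c"])         # candidate state, Eq. 8
    c = f * c_prev + i * c_hat                   # cell state, Eq. 6
    o = sigmoid(W["o"] @ z + b["o"])             # output gate, Eq. 9
    h = o * np.tanh(c)                           # hidden state, Eq. 10
    return h, c

rng = np.random.default_rng(6)
n_h, n_x = 4, 1                                  # hidden and input sizes
W = {k: rng.normal(0, 0.1, (n_h, n_h + n_x)) for k in "fico"}
b = {k: np.zeros(n_h) for k in "fico"}
h, c = lstm_step(np.array([0.5]), np.zeros(n_h), np.zeros(n_h), W, b)
```

Stacking then means feeding each cell's hidden-state sequence as the input sequence of the next LSTM layer.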
5. Empirical Analysis Throughout the experiment we use descriptive statistics and data visualizations to provide a better understanding of the temporal patterns in the data.
Fig 3. BSE Close Line Chart Rolling window functions are used to visualize the time periods affected by the pandemic and the recovery period observed in the temporal data.
Fig 4. BSE Trends Amidst the Pandemic
Fig 5. BSE Trends during the Recovery Period.
As part of exploratory data analysis and data pre-processing, we use statistical methods such as the simple moving average (SMA), exponential moving average (EMA) and exponentially weighted moving average (EWMA) for smoothing and for data visualization.
Fig 6. Simple Moving Average
Fig 7. Exponential Moving Average
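The smoothing used for Figs. 6 and 7 maps directly onto pandas; the six prices below are illustrative:

```python
import pandas as pd

close = pd.Series([100.0, 102.0, 101.0, 105.0, 107.0, 106.0])

sma = close.rolling(window=3).mean()           # simple moving average (Fig. 6)
ema = close.ewm(span=3, adjust=False).mean()   # exponential moving average (Fig. 7)
# sma is undefined (NaN) until a full 3-value window is available.
```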
6. Results and Analysis
Through research and experimental testing, we evaluated both statistical and deep learning models (ARIMA, SARIMAX, stacked LSTM and GARCH) to forecast and analyse financial time series. The main parameter used to estimate performance is the mean squared error (MSE). Below are the outputs for the respective models and visualizations of the forecasts.
Fig 8. ARIMA Results
Fig 9. SARIMAX Results
The stacked LSTM model performed the best, with the least mean squared error. The ARIMA and LSTM models were subsequently used for future forecasts, and additionally the GARCH model was used to forecast the volatility.
Fig 10. Stacked LSTM Future Forecasts for 30 days.
Fig 11. Optimized ARIMA Future Forecasts
Fig 12. Volatility Prediction
Fig 13. Volatility Forecasts
7. Conclusion
Time-series analysis and forecasting is a domain-specific field in which different models are required to produce optimal solutions for different data and to address specific use-cases. This paper analyses S&P BSE stock data from the Indian stock market using both statistical and deep learning methods while taking interpretability and lookahead errors into consideration. The results of the experiment suggest that a stacked LSTM, with appropriate temporal data preparation, performs better than the widely used statistical models such as ARIMA and SARIMAX.
References
[1] Rao, P.S., Srinivas, K. and Mohan, A.K., 2020. A survey on stock market prediction using machine learning techniques. In ICDSMLA 2019 (pp. 923-931). Springer, Singapore.
[2] Challa, M.L., Malepati, V. and Kolusu, S.N.R., 2020. S&P BSE Sensex and S&P BSE IT return forecasting using ARIMA. Financial Innovation, 6, 47.
[3] Idrees, S.M., Alam, M.A. and Agarwal, P. A prediction approach for stock market volatility based on time series data. IEEE Access.
[4] Siami-Namini, S., Tavakoli, N. and Siami Namin, A. The performance of LSTM and BiLSTM in forecasting time series. IEEE International Conference on Big Data.
[5] Cheng, D., Yang, F., Xiang, S. and Liu, J. Financial time series forecasting with multi-modality graph neural network. Elsevier.
[6] Shih, S.-Y., Sun, F.-K. and Lee, H.-y. Temporal pattern attention for multivariate time series forecasting. Springer.
[7] Livieris, I.E., Pintelas, E. and Pintelas, P. A CNN-LSTM model for gold price time series forecasting. Springer.
[8] Lim, B. and Zohren, S. Time series forecasting with deep learning: a survey. The Royal Society.
[9] Smyl, S. A hybrid method of exponential smoothing and recurrent neural networks for time series forecasting. Elsevier.
[10] Nielsen, A., 2020. Practical Time Series Analysis. O'Reilly. ISBN 978-1-492-04165-8.
[11] Banerjee, D., 2014, January. Forecasting of Indian stock market using time-series ARIMA model. In 2014 2nd International Conference on Business and Information Management (ICBIM) (pp. 131-135). IEEE.
[12] Merh, N., Saxena, V.P. and Pardasani, K.R., 2010. A comparison between hybrid approaches of ANN and ARIMA for Indian stock trend forecasting. Business Intelligence Journal, 3(2), pp. 23-43.
[13] Jothimani, D. and Yadav, S.S., 2019. Stock trading decisions using ensemble-based forecasting models: a study of the Indian stock market. Journal of Banking and Financial Technology, 3(2), pp. 113-129.
[14] Chaudhuri, T.D. and Ghosh, I., 2016. Forecasting volatility in Indian stock market using artificial neural network with multiple inputs and outputs. arXiv preprint arXiv:1604.05008.
Advances in Science and Technology ISSN: 1662-0356, Vol. 124, pp 426-432 doi:10.4028/p-nyoo2h © 2023 Trans Tech Publications Ltd, Switzerland
Submitted: 2022-09-14 Accepted: 2022-09-16 Online: 2023-02-27
Stock Market Portfolio Prediction Using Machine Learning
Srinjay Venkatesan1,a*, Sanjay Kumar S.2,b and Geo Shaji Kutty3,c
1Computer Science and Engineering, SRM Institute of Science and Technology, Chennai, India
[email protected], [email protected], [email protected]
Keywords: Stock market, LSTM, RNN, ML, High Price, Close Price, Open Price
Abstract. A stock refers to a share of ownership in a particular company. If a company's ownership is divided into 100 parts, each equal to one share, an investor who purchases one part owns 1 percent of the company. Stock exchanges are run by an automated matching system driven by order demand. At any particular time, the stock price reflects how many buyers and sellers are present in the market for the same stock: if buyers outnumber sellers, the price rises, and vice versa. The stock market depends on many factors such as the open price, close price, and high and low prices. Many researchers have tried to predict stock prices using various ML (machine learning) techniques such as the ARIMA model, linear regression, RNNs, etc. Because of the uncertainty in the stock market, simple models cannot yield genuine results; the limitations of models like ARIMA [1] and TSLM [2] are traced in this paper. This paper builds a web application using a library named Streamlit and integrates the stock prediction model. The main objective of this paper is to build an LSTM (Long Short-Term Memory) based RNN (Recurrent Neural Network) model using opening prices in order to obtain one of the most accurate stock rates.
Introduction
Financial markets function as a venue where stocks, derivatives, bonds, and foreign exchange can be sold or bought. Stock markets are affected by many factors, due to which there is always high volatility and uncertainty in the market. It is possible for humans to comprehend, implement, and submit orders to the market, but Automated Trading Systems (ATS) operated by computers perform better. To develop an ATS, many factors have to be considered, such as trading strategies, mathematical functions to reflect a stock's state, ML algorithms to predict stock prices, as well as specific news of a particular company's stock value. In stock market prediction, prices of stocks are predicted from their previous histories.
If individuals are able to obtain future predictions, they can make wise investments in the concerned entities, although predictions still entail a degree of risk and may turn out either right or wrong. Several stock exchange markets with multiple listed companies are available at the national level, like the NSE and BSE, as well as at the international level, like the New York Stock Exchange (NYSE), London Stock Exchange, etc. Time-series prediction is widely used in real-world applications like financial market prediction and weather forecasting, and numerous time-series prediction algorithms have demonstrated high effectiveness. Common algorithms are Recurrent Neural Networks (RNN) and their variants, including Long Short-Term Memory and the Gated Recurrent Unit.
Existing System
The idea behind technical analysis is that the constantly varying decisions of investors, in response to external factors, make stock prices move. There are numerous technical factors involving quantitative parameters for analysis, such as the daily low and high values, trend indicators, daily ups, daily downs, volume of stock, etc. [2]
Using ARIMA. This approach performs data pre-processing and indexing of the time-series data. The model creates a time-series forecast and gives a prediction; it then validates the forecasts and finally visualizes the model. The prediction was not very close to the actual stock price and had errors. As a result of the high mean-squared error and root mean-squared
error, it was impossible to predict stock prices with accuracy. The ARIMA model is only suitable for short-term predictions. [3]
Using ANN. This approach normalizes the dataset, removes null values, creates a sequential model, and trains it. It also checks for errors in the prediction and compares the predictions with the real values. The results were better than the ARIMA model, but the prediction still takes more time. Even though the error is less in this model than in the previous one, the model fits the data too well, which disrupts the predictions; moreover, training takes a longer period. [4]
Proposed System
Objective. It has never been easy to understand and invest in a set of stocks, as the volatile and fluctuating nature of financial markets does not allow simple learning models to predict future stock prices with high accuracy. Machine learning is a dominant trend in scientific research. This project aims to build a model using an RNN (Recurrent Neural Network), specifically an LSTM (Long Short-Term Memory) model, to predict future stock market values. We propose a stock price prediction system using important concepts of machine learning, deep learning and the stock market to achieve better prediction accuracy, suggesting profitable trades by looking into the shareholding portfolio of a distinguished investor who has played the market for a long time and has an established track record of success. That investor's holdings are the key to analysing and predicting the stock prices of the companies held, so that we can predict the stock price of those companies for a profitable venture.
Novelty. An RNN model is a neural network that can be viewed as multiple copies of the same network, each passing its result to its successor. An RNN cannot connect long-term information, which reduces the probability of successful prediction [2]; an LSTM can overcome that problem, as it remembers information for long periods.
It has the same chain-like structure as an RNN but a different module structure: an LSTM cell contains four interacting neural network layers, organized around the cell state and the input, forget, and output gates. The complexity of updating each weight is O(number of weights) per time step, while the corresponding complexity for an RNN trained with full backpropagation through time is O(number of time steps × number of weights) per step. As shown in Fig. 1, the input source is Yahoo Finance, accessed through the yfinance library. The data is pre-processed and normalised to values between 0 and 1 with MinMaxScaler, then split into a training set and a testing set. The training set is fed to an LSTM model, which is trained iteratively. After training, the final output is given, together with the testing data, to a second LSTM stage, decoded, and used to produce the prediction. The model can predict stock values 30 days into the future.
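The pre-processing pipeline described above can be sketched as follows. A synthetic price series stands in for the Yahoo Finance download (in the paper the data is fetched with the yfinance library); the ticker-free setup and split ratio follow the text.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Synthetic closing prices stand in for the yfinance download
close = 100 + np.cumsum(np.random.default_rng(0).normal(0, 1, 1000))
close = close.reshape(-1, 1)

# Normalise to the 0-1 range, as in the paper
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(close)

# 70/30 chronological train/test split
split = int(len(scaled) * 0.70)
train, test = scaled[:split], scaled[split:]
```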
IoT, Cloud and Data Science
Fig 1: Flow of system
Methodology
LSTM. The hₜ and cₜ represent the current cell output and the current cell state respectively, and xₜ represents the current multivariate input time series, as shown in Fig. 2. Three gates control the update, forget, and output characteristics of the LSTM model. These gates are paired with four activation functions, each of which is either a sigmoid or a hyperbolic tangent function. The cell state cₜ is the variable that connects information from all cells/timesteps, which is the most essential tool of the LSTM for preserving long-range time dependencies.
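The gate interactions described above are usually written as follows (standard LSTM formulation, not copied from the paper; W and b denote learned weights and biases, σ is the sigmoid, and ⊙ is element-wise multiplication):

```latex
\begin{aligned}
f_t &= \sigma\!\left(W_f [h_{t-1}, x_t] + b_f\right) && \text{forget gate}\\
i_t &= \sigma\!\left(W_i [h_{t-1}, x_t] + b_i\right) && \text{input gate}\\
\tilde{c}_t &= \tanh\!\left(W_c [h_{t-1}, x_t] + b_c\right) && \text{candidate state}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{cell state}\\
o_t &= \sigma\!\left(W_o [h_{t-1}, x_t] + b_o\right) && \text{output gate}\\
h_t &= o_t \odot \tanh(c_t) && \text{cell output}
\end{aligned}
```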
Fig 2: LSTM Network
LSTM with sliding window technique. We have used a training scheme for the LSTM known as the sliding window technique. Here, the input variable is a sequence of time-series data of length n, and the (n+1)-th value of the window is the target output [5].
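The sliding-window construction above can be sketched as follows (a minimal sketch; the function name and toy series are illustrative, not from the paper):

```python
import numpy as np

def make_windows(series, n):
    """Slide a window of length n over the series: each window of n
    past values is an input X, and the following value is the target y."""
    X, y = [], []
    for i in range(len(series) - n):
        X.append(series[i:i + n])
        y.append(series[i + n])
    return np.array(X), np.array(y)

series = np.arange(10, dtype=float)   # toy series 0..9
X, y = make_windows(series, n=3)
# first sample: inputs [0, 1, 2] -> target 3
```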
Fig 3: LSTM - Pictorial representation
Fig 4: LSTM Slicing technique
ARIMA Model. The ARIMA model was first used by Box and Jenkins in 1970. An ARIMA model for time series data is identified, estimated, and diagnosed through a set of activities, and it has been one of the most important financial forecasting methods. The future value of a variable in an ARIMA model is a linear combination of past values and past errors. The model has a smaller standard error of regression and performs well in short-term forecasting; however, it is better suited to shorter-term predictions [3].
Time Series Linear Model. A time series linear model (TSLM) is a very powerful and dynamic way to build a predictive model. An ideal-case linear model is first created; the data is then incorporated into it, so that the linear model reflects the properties of the actual data. An advantage of the approach is precisely that the actual data are included in the ideal linear model. In the R programming language, such a model can be created with the tslm() function, here incorporating the StlStock data; the value h gives the number of months to be predicted. tslm() performs the pre-calculations required for prediction, whose output is used as input to the prediction function [6].
Recurrent Neural Network. Recurrent Neural Networks (RNN) learn by backpropagation, with feedback built into each node, as represented in Fig. 5. Hence, RNN models can predict stock prices based on recent history as well as recurrence. They use inputs from the previous time
points to the input layer. An RNN simplifies the process of feeding such sequential inputs into the model through a smaller number of input nodes.
Fig 5: RNN model - Pictorial representation
Architecture Diagram
The data is web-scraped from Yahoo Finance and used as the dataset. MinMaxScaler is used to reduce the values to between 0 and 1. The data is then converted into a data matrix and, using fit_transform, the array is reshaped to n rows and 1 column. The model splits the data into train data (70%) and test data (30%). An LSTM model is built and trained on the training data. A second LSTM stage then takes the test data together with the output of the previously trained model, and this produces the required prediction.
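The final step, rolling each prediction back in as input to forecast 30 days ahead, can be sketched as follows. A simple stub stands in for the trained network (with Keras one would call model.predict on the reshaped window instead); window size and series are illustrative.

```python
import numpy as np

def predict_next(window):
    # Stand-in for the trained LSTM; the real system would call
    # model.predict(window[None, :, None]) here.
    return window.mean()

def forecast(history, n_steps=30, window_size=60):
    """Recursively slide the window forward, feeding each prediction
    back in as input, to forecast n_steps into the future."""
    window = list(history[-window_size:])
    out = []
    for _ in range(n_steps):
        nxt = predict_next(np.array(window))
        out.append(nxt)
        window = window[1:] + [nxt]     # slide the window forward
    return np.array(out)

preds = forecast(np.linspace(0.0, 1.0, 100), n_steps=30, window_size=60)
```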
Fig 6: Architecture Diagram
Experimental Findings
The data was taken from Yahoo Finance, and pre-processing was performed on this raw dataset. After analysing the various models available for stock prediction, LSTM-based RNN models were found to outperform all the others. The LSTM model was capable
of learning from time series data. We built a univariate LSTM model, since the univariate model proved more accurate and time-efficient than the multivariate one. The model runs efficiently and gives among the most accurate results for 30 days into the future, as shown in Fig. 7.
Fig 7: Predicted value after 30 days
Fig 8: Plotting of train and test predictions
Conclusion and Future Scope
Thus, better stock prediction accuracy was achieved with the machine learning models used above. The project can be extended by adding multiple parameters based on financial ratios such as the P/E ratio, EBITDA, etc., thereby increasing the accuracy. Ideas from other neural network approaches, such as neuro-fuzzy logic, could also be combined to overcome certain shortcomings in the prediction. News sentiment analysis can likewise be taken into account; public opinion on company issues also plays a major role in moving stock prices and can therefore be considered as a factor in future predictions.
References
[1] A. A. Adebiyi et al., "Stock price prediction using the ARIMA model," UKSim-AMSS 16th International Conference on Computer Modelling and Simulation, 2014.
[2] A. Abdoli et al., "Comparing the prediction accuracy of LSTM and ARIMA models for time-series with permanent fluctuation," Revista Gênero & Direito, vol. 9, 2020.
[3] J. Zhang et al., "Recurrent neural networks with long term temporal dependencies in machine tool wear diagnosis and prognosis," SN Applied Sciences, vol. 3, no. 4, pp. 3-4, 2021.
[4] A. Sharaff et al., "Comparative analysis of various stock prediction techniques," 2018 2nd International Conference on Trends in Electronics and Informatics (ICOEI), 2018.
[5] A. Bao et al., "Data-driven end-to-end production prediction of oil reservoirs by EnKF-enhanced recurrent neural networks," 2020.
[6] A. Sable, "Introduction to time series forecasting: regression and LSTMs," Paperspace Blog, 2021.
Advances in Science and Technology ISSN: 1662-0356, Vol. 124, pp 433-443 doi:10.4028/p-02laqd © 2023 Trans Tech Publications Ltd, Switzerland
Submitted: 2022-09-20 Accepted: 2022-09-23 Online: 2023-02-27
Stock Market Ontology-Based Knowledge Management for Forecasting Stock Trading
Dr. M. Uma Devi 1,a*, Dr. P. Akilandeswari 2,b and Dr. M. Eliazer 3,c
1 Associate Professor, SRM Institute of Science and Technology, Kattankulathur, Chennai, India
2 Assistant Professor, SRM Institute of Science and Technology, Kattankulathur, Chennai, India
3 Assistant Professor, SRM Institute of Science and Technology, Kattankulathur, Chennai, India
[email protected], [email protected], [email protected]
Keywords: Artificial Neural Networks, Kernel Regression, Portfolio, Regression, Stock Market, Ontology
Abstract. Today’s markets are rather mature, and arbitrage opportunities remain for only a very short time. The main objective of this paper is to devise a stock market ontology-based novel trading strategy employing machine learning to obtain the maximum stock return with the highest stock ratio. The paper aims to create a dynamic portfolio to obtain high returns. In this work, the impact of the applied machine learning techniques on the Chinese market was studied. The problem of investing a particular total amount in a large universe of stocks is considered, with the Chinese stocks traded on the Shanghai Stock Exchange and the Shenzhen Stock Exchange chosen as the entire universe. The inputs considered are fundamental data and company-specific technical indicators, unlike the macroscopic factors considered in existing systems. In the stock market document repository, ontological constructs with the Word Sense Disambiguation (WSD) algorithm improve the conceptual relationships and reduce the ambiguities in ontology construction. The machine learning techniques Kernel Regression and Recurrent Neural Networks are used to start the analysis. The stock prices predicted by the Artificial Neural Network were quite accurate, with an accuracy level of 97.55%. In this study, the number of nodes is selected based on variance-bias plots by tracking the error on the in-sample data set and the validation data set.
Introduction
Forecasting individual stock prices is very hard, as the idiosyncratic risk component adds excessive randomness to the data set. On the other hand, fundamental statistics such as demand and supply, or more diversified assets such as ETFs and global macro factors, are easier to forecast using machine learning. Past studies have shown fairly successful predictions of such trends, as these variables depend mainly on specific risk factors (the systematic component).
It has been planned to use unsupervised learning algorithms such as Principal Component Analysis, factor analysis, and other nonlinear dimensionality-reduction techniques to decompose the returns into independent risk factors. To predict and forecast these risk or macro factors, the work builds upon existing research that applies deep learning techniques to time series forecasting. The machine learning techniques planned to be used are Kernel Regression and Recurrent Neural Networks. Since the data sets for financial time series, especially asset returns, are significantly large, regularization techniques will be used along with the deep learning models to prevent over- or under-fitting. Once effective, the model will produce significant returns in back-testing; venturing into the option space and including options in the framework would substantially magnify the portfolio returns. Kernel regression is a non-parametric statistical method for estimating the conditional expectation of a random variable. The aim of the technique is to find a non-linear relation between two random variables, say x and y. The crux of kernel regression is assigning a set of homogeneous weighting functions, known as kernels, localized at each observational data point.
Depending upon the distance from the datum, the kernel allocates a weight to each location. The kernel-weighted basis function depends on the bandwidth (width, radius, or variance) around the locally confined data point x, so that only a set of adjacent locations contributes. Kernel regression is a form of locally weighted regression and is very closely related to the technical indicator Moving Average. An Artificial Neural Network (ANN) is a computing system consisting of numerous simple, highly interconnected processing elements that handle data through their dynamic state response to external inputs. It is composed of numerous nodes that take fundamental data as input and execute basic operations on it; the output of these operations is passed on to other neurons. The result at each node is known as its node or activation value. A weight is allocated to each link, and ANNs learn by changing the values of these weights. A significant benefit of Artificial Neural Networks is that they can handle uncertain and noisy data robustly. Hence, an ANN is ideally suited to stock markets and can be effectively used to predict quantities such as stock prices and stock returns. The Chinese market has been chosen because its improvement or crash has an immense impact on other countries. Since India is a developing country with an emerging economy and China is the second-largest economy in the world, the studies can be paralleled for the benefit of the Indian stock market. The remainder of the paper is organized as follows. Section II gives the details of related works. Section III provides the methodology. Section IV describes the system implementation. Section V concludes the paper by illustrating the experimental results.
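The kernel-weighting idea described above can be sketched with the Nadaraya-Watson estimator (a standard form of kernel regression, shown here on a toy sine curve; the Gaussian kernel and bandwidth are illustrative choices, not specified in the paper):

```python
import numpy as np

def nadaraya_watson(x_train, y_train, x_query, bandwidth=0.5):
    """Kernel regression: each training point gets a Gaussian weight that
    decays with its distance from the query point; the prediction is the
    weighted average of the training targets."""
    preds = []
    for xq in np.atleast_1d(x_query):
        w = np.exp(-0.5 * ((x_train - xq) / bandwidth) ** 2)
        preds.append(np.sum(w * y_train) / np.sum(w))
    return np.array(preds)

x = np.linspace(0, 2 * np.pi, 200)
y = np.sin(x)
yhat = nadaraya_watson(x, y, x_query=[np.pi / 2], bandwidth=0.3)
```

A larger bandwidth averages over more neighbours (smoother, more biased); a smaller one tracks the data more closely, mirroring the tuning-parameter discussion below.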
Related Works
Prior work considered how news summarization can help stock price prediction: a generic stock price prediction framework was modeled to enable the use of different external signals, and a summarization model was used to generate signals from news articles, evaluated by whether they improve the prediction of a stock’s daily return. Usmani and Adil [6] proposed predicting the end-of-day performance of the Karachi Stock Exchange (KSE) using machine learning algorithms. The attributes used in their model were foreign exchange rates, oil rates, gold and silver rates, news, and data obtained from social media; the machine learning techniques used were Support Vector Machines (SVM) and neural network classifiers such as the Single-Layer Perceptron and the Multi-Layer Perceptron. The result of the research was that the oil rate was the most successful attribute for predicting market performance, and the MLP successfully predicted 77% of actual market performance. However, the study suffered from insufficient resources and a lack of market information. Pathare et al. [1] developed a strategy to construct a portfolio of stocks using machine learning approaches. Hamed et al. [4] used the Kullback-Leibler divergence as a learning algorithm on data sets from the Egyptian Stock Market. Ichinose and Shimada [5] demonstrated the role of web news in predicting stock prices, using criteria such as recall and precision rates; however, there is room for future improvement with a potentially larger data set. The work by Gupta and Dhingra [2] is ideal for one-day trading, but the accuracy of such predictions would decrease for swing trading. Desai and Gandhi [7] also demonstrated the worth of data analysis and the role of non-quantifiable data in predicting changes in stock trends.
Methodology
The existing system uses macroscopic and generic factors such as news, Twitter data, forex, and market history, with comparison conducted at sector index levels such as finance, commerce, etc. However, it has some disadvantages. The first is that prediction from market history is unreliable for penny stocks: technical analysis alone cannot be relied upon to evaluate trades in penny stocks unless fundamental data reasserts the evaluation of the
shares provided by technical analysis. For example, the organization might be facing bankruptcy or problematic litigation, may be at risk of being delisted from the stock exchange, or may be defaulting on payments; such attributes are extremely common among penny stocks, many of which are faulty and fraudulent. The other disadvantage is that even a small piece of news can change the price of volatile stocks, with no technical indicators to warn of such moves beforehand, by which time it can be too late. The proposed system takes fundamental data and idiosyncratic, company-specific technical indicators as inputs. Technical indicators such as the Moving Average are used: a Simple Moving Average (SMA) is computed by summing the closing price of the stock over a number of time periods and dividing the total by the number of periods. Fundamental data such as the P/B (Price/Book value) ratio and P/S (Price/Sales) ratio are taken as inputs too. A proficiency score is computed; the higher this score, the greater the expected stock return. The main advantage lies in combining fundamental analysis with technical analysis, which is crucial for successful automated trading, swing trading, and day trading. Both small- and large-capital stocks can be traded with the help of fundamental analysis, which is especially useful for long-term investments. Neural networks are good at handling uncertain and noisy data and are therefore ideal for stock markets. Suppose a subset of 1000 (or 1500) stocks is selected from the entire universe. At every rebalancing time point, a small group of 100 to 150 stocks is selected out of the 1000 (or 1500) during the in-sample back-testing period, based on the following criteria for the long position:
• The market capitalization needs to be no less than 500,000,000 (RMB).
Market capitalization is the worth of the company, computed by multiplying the number of shares by the current price of the stock.
• The average daily trading volume over the past 15 business days needs to be no less than 1,000,000. Trading volume is the total number of shares transacted in a security or an entire market during a specified period of time.
• An M-score is computed, based on a group of factors, for each stock i that passes criteria one and two. Let Ci(t) denote the price of stock i in period t. The group of factors is described below.
Factor F1: Price to Book ratio (PB)
Factor F2: Price to Cash Flow ratio (PCF)
Factor F3: Price to Earnings ratio (PE)
Factor F4: Price to Sales ratio (PS)
Factor F5: n-period momentum factor (PM): PMi(t) = ln(Ci(t−1)/Ci(t−n)); (n = 5)
Factor F6: m-period reversion factor (PRev): PRevi(t) = ln(Ci(t−m)/Ci(t−1)); (m = 20)
Factor F7: L-period log-return volatility (Vol): Voli(t) = σiL(r(t−1)), where σiL(r(t−1)) denotes the annualized standard deviation of the log-return of stock i over the time window [t−L, t−1].
Choose a weight vector w ≡ (w1, w2, ..., w7); for example, w = (0, 0, 0, 0, 0.5, 0.5, 0) or w = (1/7, 1/7, 1/7, 1/7, 1/7, 1/7, 1/7). The M-score for stock i is calculated as:

Mi(t) = w1·Fi1(t) + w2·Fi2(t) + ... + w7·Fi7(t).    (1)
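The factor and M-score computations above can be sketched as follows (a minimal sketch; the factor values and function names are illustrative, not from the paper):

```python
import numpy as np

def momentum(prices, t, n=5):
    # F5: PM_i(t) = ln(C_i(t-1) / C_i(t-n))
    return np.log(prices[t - 1] / prices[t - n])

def reversion(prices, t, m=20):
    # F6: PRev_i(t) = ln(C_i(t-m) / C_i(t-1))
    return np.log(prices[t - m] / prices[t - 1])

def m_score(factors, weights):
    """Eq. (1): weighted sum of the seven factor values
    F1..F7 (PB, PCF, PE, PS, PM, PRev, Vol) for one stock."""
    return float(np.dot(weights, factors))

# Equal weights, as in the example w = (1/7, ..., 1/7)
w = np.full(7, 1 / 7)
factors = np.array([1.2, 3.0, 15.0, 2.5, 0.04, -0.02, 0.3])  # illustrative
score = m_score(factors, w)
```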
The higher the value of this resultant score, the higher the expected return on the stock. Hence, the machine learning algorithms compute the weights, and ultimately the score.
Algorithm: Stock market prediction
1. for each i in the range of tickers do
   1.1 fetch fundamental data from Yahoo
   1.2 compute the values of PM, PRev, Volatility:
       PMi(t) = ln(Ci(t−1)/Ci(t−n)); (n = 5)
       PRevi(t) = ln(Ci(t−m)/Ci(t−1)); (m = 20)
       Voli(t) = σiL(r(t−1))
   1.3 store the above in a .csv file
end
2. for each i in the range of business days do
   2.1 call signalGenerator()
       2.1.1 take the factors as input x and the return as output y
       2.1.2 perform regression by OLS(y, x)
       2.1.3 initialize the ANN weights uniformly in (−1/√d, 1/√d), where d is the number of inputs
       2.1.4 derive the probabilistic output from the ANN
       2.1.5 compute the M-score and return the parameters
   2.2 generate a portfolio of the 100 best stocks every 10 days (the rebalancing period)
end
System Implementation
The overall architecture of the system is shown in Fig. 1. The system is divided into segments comprising Data Processing, Stock Market Ontology Construction, the Trading Signal Generator, and Portfolio Management. The Trading Signal Generator is implemented using machine learning algorithms and a proficiency score calculator.
A. Data Processor
Data processing is the collection and manipulation of items of data to produce meaningful information. This paper uses Pandas, the Python Data Analysis Library, a freely available library that provides highly efficient, convenient data structures and handy tools for data analysis in Python. Pandas has advantages over other data analysis tools, as it makes it easier to keep track of the data. It also has built-in functionality for many common data processing applications, for example easy group-by syntax and easy joins, as well as good I/O capabilities. With pandas, one can also use patsy for R-style syntax in regressions, referring to variables by column name. A DataFrame is a two-dimensional labeled data structure with columns of possibly different types, quite similar to a spreadsheet, an SQL table, or a dictionary of Series objects. It is one of the most commonly used Pandas objects.
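The pandas-based data handling described above can be sketched as follows (tickers, column names, and values are illustrative assumptions, not taken from the paper):

```python
import pandas as pd

# Toy price table for two Chinese tickers
prices = pd.DataFrame({
    "ticker": ["600000.SS", "600000.SS", "000001.SZ", "000001.SZ"],
    "close":  [10.2, 10.5, 12.1, 11.9],
    "volume": [1_500_000, 1_800_000, 2_000_000, 1_700_000],
})

# Easy group-by syntax: average close and total volume per ticker
summary = prices.groupby("ticker").agg(
    avg_close=("close", "mean"),
    total_volume=("volume", "sum"),
)
```

The resulting per-ticker summary is the kind of intermediate table the algorithm stores to a .csv file in step 1.3.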
Like a Series, a DataFrame accepts many different kinds of input. It is used for reading the stock data from a .csv file and is responsible for fetching the Chinese Yuan exchange rates from the web, which are stored in tabular format. The risk-free interest rates are fetched next. The risk-free rate of return is the hypothetical rate of return on an investment with absolutely no risk: the interest expected from a zero-risk investment over a specified period of time. This is stored in a second tabular file. The tickers are fetched from the Yahoo Finance website and stored in a DataFrame. A stock ticker is a report of the prices of specific securities, updated continuously throughout the trading session by the various stock exchanges. A "tick" is any variation in price, whether upwards or downwards. So that investors are aware of the present market situation, a stock ticker automatically shows these ticks together with pertinent data such as volume. Finally, data scrubbing is carried out, removing the unnecessary data.
B. Stock Market Ontology Construction
A stock market is a market to buy or sell shares. In the stock market document repository, ontological constructs with the Word Sense Disambiguation (WSD) algorithm improve the conceptual relationships and reduce the ambiguities in ontology construction. The main purpose of this ontology is to describe the elements (terms and vocabulary) of the given domain, their relationships, and its rules, and to describe the knowledge in a generic way. The ontology (the domain ontology of
the Stock Market) was the backbone of all components of stock market operations. The financial ontology was designed to be modular enough to allow refinements in the context of the current domain and extensions to other domains in the financial area. To find the most commonly used terms and pieces of information that banks provide on their websites, we studied some of the most representative ones. Not all banks publish information about the stock market on their websites, so we had to discard some of them. To obtain a broad view of the market and make the ontology more powerful, we also included in our research some Chinese and international independent stock market services (i.e., Yahoo Finance, Reuters, and Invertia), since they are increasingly used by customers and usually offer more detailed information. The Stock Market Ontology, part of the financial ontology, was developed with the following considerations:
1. It covers almost all concepts and operations that can be accomplished in the stock market.
2. It establishes relationships between all the concepts available from the same point of view, with special attention to the possible combinations of information that a stock market user can perform.
Term: Stock Market — the market to buy or sell shares.
Attributes for this concept: hasStocks, hasIndexes, hasName, hasCurrency, hasBrokers, hasCountry, hasSessions.
Term: Depositary — a society or person that holds clients’ stocks.
Attributes for this concept: hasBroker, hasCommissions, DepositaryID.
C. Trading Signal Generator
A Trading Signal Generator is a particular kind of software that applies technical analysis procedures to price-action charts and notifies currency traders of the chance of a transaction. This kind of trading software is quite similar to programmed foreign-exchange trading software, except that it does not itself enter or exit the trade.
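The two ontology terms above can be sketched as a minimal in-memory structure (plain Python stands in for a real ontology language such as OWL; the names follow the attribute lists in the text):

```python
# Minimal in-memory sketch of the two ontology terms described above
ontology = {
    "StockMarket": {
        "definition": "The market to buy or sell shares.",
        "attributes": ["hasStocks", "hasIndexes", "hasName",
                       "hasCurrency", "hasBrokers", "hasCountry",
                       "hasSessions"],
    },
    "Depositary": {
        "definition": "Society or person that holds clients' stocks.",
        "attributes": ["hasBroker", "hasCommissions", "DepositaryID"],
    },
}

def attributes_of(term):
    """Look up the attribute list of a concept in the ontology."""
    return ontology[term]["attributes"]
```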
Similar to technical analysis software, signal generators track charts; in addition, however, they advise the user on entry and exit, which reduces the pressure on the trader. Company-specific fundamental data is provided as input, technical indicators are computed and also used as input, and machine learning algorithms are applied to them. The regression algorithm gives a numerical output, the M-score. The generator shortlists 100 out of 1000 stocks for the next 10 days (the rebalancing period). Rebalancing is the method of reorienting the weights in a portfolio of assets: it involves buying or selling assets at regular intervals to maintain an originally desired level of asset allocation. The generator is composed of two sub-modules: the machine learning module and the proficiency score calculator module.
Fig. 1. Overall System Architecture
Machine learning algorithms
Kernel functions generally contain tuning criteria, and the values of the tuning parameters are very important for the efficiency of kernel methods. Given a kernel function, the corresponding parameter values can be adjusted by trial and error or by heuristic search. The major issue in financial market prediction is not only tuning the parameters of a particular kernel function but also combining different kernel functions so that traders can make better decisions. For example, traders frequently monitor and track data from different origins, including technical indicators, politics, etc., and each data origin can be used to obtain a kernel function. An obvious question follows: how should the kernels, or the data, be integrated to help the trader make better prediction decisions? This problem is referred to as kernel learning or kernel selection. Conventional kernel procedures depend on a single kernel function; kernel selection, by contrast, looks for a linear combination of prospective kernels, which can be obtained from various kernel functions, data sources, and parameter settings. The OLS (Ordinary Least Squares) method has been used for estimating the unspecified parameters of the regression model. In OLS, the objective is to compute the differences between the observed responses in the dataset and the fitted values, square them, and minimize their sum. The fitted regression values are derived by minimizing the sum of squared residuals, as depicted below. Daily returns are taken along y, whereas fundamental data such as PB, PS, PCF, etc. are taken along x. The beta (slope) values are obtained using the params function, and the gist of the result is printed using the summary function.
The weights are computed by dividing each element in the list of params by their sum. The function also returns the values of certain parameters to the portfolio management module, as depicted in Fig. 2. In linear regression, the response variable is depicted as a linear function of the regressors, y = Xβ + ε, where beta is a vector of unknown parameters and the scalar ε represents the discrepancy between observed and predicted values.
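The OLS step described above can be sketched as follows. NumPy's least-squares routine stands in for the statsmodels OLS/params/summary calls mentioned in the text, and the seven-factor data is synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic factor matrix x (e.g. PB, PS, PCF, ...) and daily returns y
x = rng.normal(size=(500, 7))
true_beta = np.array([0.0, 0.0, 0.0, 0.0, 0.5, 0.5, 0.0])
y = x @ true_beta + rng.normal(scale=0.01, size=500)

# OLS: minimize the sum of squared residuals ||y - x @ beta||^2
beta, *_ = np.linalg.lstsq(x, y, rcond=None)

# Normalize the estimated coefficients into weights, as in the text
weights = beta / beta.sum()
```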
Fig. 2. Artificial Neuron Structure
Over the past decades, Artificial Neural Networks have drawn significant interest from researchers for the prediction of stock prices. ANNs are robust to model specification compared to parametric models, which is why they are often applied in forecasting stock prices and financial
derivatives. In a feed-forward ANN, the neurons are arranged in layers: the neurons in a given layer receive input from the preceding layer and pass their result to the subsequent layer, and connections to neurons in the same or preceding layers are not allowed. The layers between the input and output layers are known as hidden layers, and the last layer of neurons is the output layer. The input layer consists of special input neurons that only transmit the applied external input to their outputs. A network is a single-layer network if there is only a single layer of input nodes and a single layer of output nodes; networks with multiple hidden layers are multilayer networks. Recurrent networks are those in which connections to neurons of the preceding layers, or of the same layer, are permitted. The backpropagation algorithm is used for the ANN; its main aim is to reduce the error function. The weights used in the ANN are altered on a pattern-by-pattern basis, as depicted in Fig. 3: weights are updated according to the error calculated for each pattern presented to the network. A gradient descent strategy is employed to decrease and minimize the error. Batch learning is used, where all training examples are presented to the network, each input example is used to calculate an alteration in the weights, and the weights are finally changed according to the sum of all the alterations.
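The batch gradient-descent scheme described above can be sketched for a single linear neuron with squared-error loss (a minimal sketch; the toy data, learning rate, and epoch count are illustrative assumptions):

```python
import numpy as np

def batch_gradient_descent(x, y, lr=0.1, epochs=500):
    """Batch learning: every epoch, accumulate the weight alterations
    over all training examples, then apply their sum at the end."""
    w = np.zeros(x.shape[1])
    for _ in range(epochs):
        error = x @ w - y                # error for every pattern
        grad = x.T @ error / len(y)      # summed (averaged) alteration
        w -= lr * grad                   # gradient descent step
    return w

x = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = x @ np.array([2.0, -1.0])            # targets from known weights
w = batch_gradient_descent(x, y)
```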
Fig. 3. Artificial Neural Network
Proficiency score calculator
The proficiency score depicts whether the stocks will have high returns: the higher the score, the higher the return on those stocks. One such proficiency score is the M-score, computed by a formula that uses the weights predicted by the machine learning algorithms above, together with several factors such as the Price/Book Value, Price/Earnings, Price/Sales, and Price/Cash Flow ratios. It is executed after every rebalancing period, and its outputs are used by portfolio management to apply the new changes.
D. Portfolio Manager
A portfolio is a grouping of financial assets in the form of stocks, bonds, and cash equivalents, along with their fund counterparts, including mutual, exchange-traded, and closed funds. Portfolios are held by investors and maintained by financial professionals. It is widely suggested that investors build an investment portfolio in keeping with their risk tolerance and investing objectives. An investment portfolio can be pictured as a pie divided into pieces of varying sizes, representing a variety of asset classes and investment types that together achieve a suitable risk-return allocation. Various kinds of securities can be used to construct a diverse, expanded portfolio, but stocks, cash, and bonds are usually considered a portfolio's essential building blocks. A list of the 100 best stocks is obtained from the trading signal generator module, and the existing 100 stocks are compared with the new 100 predicted by the trading signal generator. The stocks common to both lists remain the same; they can be kept, as they are expected to continue providing a high return. The stocks that are present in the old list but are
absent from the new list are sold, and the proceeds are used to buy the stocks that are present in the new list but absent from the old one. A rebalance frequency of every U periods is chosen, where U can be set to a number between 1 and 20. This means that while the M-scores are computed for all periods t = 0, 1, 2, ..., TN, the portfolio is rebalanced only at time points 0, U, 2U, 3U, ... to avoid incurring excessive transaction costs. At each rebalancing point t, the best K stocks with the highest M-score are selected. Initially K = 100, and 1% of the total account value at time t is put into each of the 100 stocks. The transaction cost is taken as 0.001 of the total dollar amount of transactions.

Experimental Results and Decision
On carrying out a regression algorithm upon the input data, the summary in Table 1 is obtained. It is observed that the greater the number of observations, the greater the R-value. Table 2 shows the regression and residual values for the sample input data, and Table 3 shows the different X variable values obtained while testing. A scatter plot of the data in Table 3 for one of the X variables gives the graph in Fig. 4.

Table 1. Summary output

Regression Statistics
Multiple R           0.077253
R Square             0.005968
Adjusted R Square    0.005646
Standard Error       0.026734
Observations         21620

Table 2. Regression and residual values for the sample input data

             df      SS        MS
Regression   7       0.092736  0.013248
Residual     21612   15.44623  0.000715
Total        21619   15.53897  0.013963
Table 3. Different X variables' values

              Coefficients  Standard Error  t Stat    P-value
Intercept     0.000616      0.000335        1.839308  0.065878
X Variable 1  4.51E-06      2.44E-06        1.8474    0.064698
X Variable 2  -2.8E-09      3.49E-08        -0.08051  0.93583
X Variable 3  -5.5E-09      2.22E-08        -0.24802  0.804123
X Variable 4  0.031281      0.002819        11.09646  1.46E-28
X Variable 5  0.002281      0.001257        1.814393  0.069626
X Variable 6  1.08E-05      3.6E-06         2.998028  0.002719
X Variable 7  0.003426      0.010035        0.341369  0.732828
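The rebalancing rule described in the Portfolio Manager section (keep the overlap between the old and new top-K lists, sell drop-outs, buy entrants) can be sketched as below. This is an illustrative sketch only; the stock symbols and M-scores are hypothetical, and the 0.001 transaction cost is the figure stated in the text.

```python
# Illustrative sketch of the rebalancing rule: every U periods, keep the
# top-K stocks by M-score, sell holdings that dropped out of the new list,
# and buy the new entrants. Symbols and scores below are hypothetical.

TRANSACTION_COST = 0.001  # cost per dollar traded, as stated in the text

def rebalance(holdings, m_scores, k=100):
    """Return (new_holdings, to_sell, to_buy) given the current holdings
    (a set of symbols) and a dict mapping symbol -> M-score."""
    ranked = sorted(m_scores, key=m_scores.get, reverse=True)
    new_holdings = set(ranked[:k])
    to_sell = holdings - new_holdings   # in old list, absent from new list
    to_buy = new_holdings - holdings    # in new list, absent from old list
    return new_holdings, to_sell, to_buy

# Example with hypothetical symbols and K = 2:
held = {"AAA", "BBB"}
scores = {"AAA": 0.9, "CCC": 0.8, "BBB": 0.1, "DDD": 0.5}
new, sell, buy = rebalance(held, scores, k=2)
# "AAA" stays in the portfolio, "BBB" is sold, "CCC" is bought
```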
Fig. 4. Graphical representation of the regression analysis.
The Artificial Neural Network provides probabilistic output; the stocks with higher probability are the ones worth investing in. The initial weights are chosen with the help of a formula and then fed forward through the neural network. The ANN is used to predict the next day's price after being fed the prices of a particular period. Using the gradient descent strategy, the error is reduced in each iteration, and the predicted price displayed at the end is shown in Fig. 5.
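The gradient-descent idea described here can be illustrated with a minimal sketch. This is not the paper's actual network or data: it is a single linear neuron trained to predict the next day's price from a 3-day window of a made-up price series, with zero initial weights (the paper derives its initial weights from a formula).

```python
import numpy as np

# Minimal sketch of gradient-descent price prediction (NOT the paper's
# actual network): one linear neuron predicts the next day's price from
# a 3-day window. The price series below is synthetic.
np.random.seed(0)
prices = np.linspace(100.0, 120.0, 40) + np.random.randn(40) * 0.1

window = 3
X = np.array([prices[i:i + window] for i in range(len(prices) - window)])
y = prices[window:]                  # next-day price for each window

w = np.zeros(window)                 # initial weights (the paper uses a formula)
b = 0.0
lr = 1e-5
for _ in range(2000):                # each iteration shrinks the error
    err = X @ w + b - y
    w -= lr * (X.T @ err) / len(y)   # gradient step for the squared-error loss
    b -= lr * err.mean()

mse = float(((X @ w + b - y) ** 2).mean())
predicted_next = float(X[-1] @ w + b)   # the "predicted price" at the end
```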
Fig. 5. Artificial Neural Network result

Upon analysis of the actual and predicted results, it was observed that the ANN provided 97.55% accuracy in its prediction. Ultimately, a portfolio of the best stocks is obtained in the form of their summaries. Figure 6 shows a sample of only 5 of the best 100 stocks:
Fig. 6. Sample portfolio

The performance measure of the neural network model lies in its prediction accuracy on the sample data. Normalized Mean Square Error (NMSE) is used to evaluate the prediction accuracy of the model, and Sign Correctness Percentage (SCP) is used to test the performance of the direction of the predicted values. The performance of the Indian stock market, the Chinese stock market, and the stock market with financial ontology is compared with the proposed system. The SCP and NMSE values of the different stock markets are shown in Table 4; the performance analysis in terms of SCP is depicted in Fig. 7 and in terms of NMSE in Fig. 8. From Table 4, Fig. 7, and Fig. 8 it is observed that the Sign Correctness Percentage of the proposed system is good, and its Normalized Mean Square Error values are considerably reduced.
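The paper does not spell out its formulas for the two measures; the sketch below uses their common textbook definitions (NMSE normalises the mean squared error by the variance of the actual series, and SCP is the percentage of periods whose price-change direction is predicted correctly). The series used here are made up for illustration.

```python
import numpy as np

# Hedged sketch of the two evaluation measures, using their common
# definitions (the paper does not give its exact formulas).

def nmse(actual, predicted):
    """Normalized Mean Square Error: MSE divided by variance of actuals."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.mean((actual - predicted) ** 2) / np.var(actual))

def scp(actual, predicted):
    """Sign Correctness Percentage: share of periods whose direction of
    change (up/down) is predicted correctly, as a percentage."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    same_sign = np.sign(np.diff(actual)) == np.sign(np.diff(predicted))
    return float(100.0 * same_sign.mean())

actual = [10.0, 11.0, 10.5, 12.0, 13.0]      # made-up price series
predicted = [10.2, 10.9, 10.6, 11.8, 13.2]   # made-up predictions
```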
Table 4. Different stock markets and their SCP and NMSE values

                                Jan'11-Mar'11   Apr'11-Jun'11   Jul'11-Sep'11   Oct'11-Dec'11
                                SCP     NMSE    SCP     NMSE    SCP     NMSE    SCP     NMSE
Indian Stock Market             82.76   0.27    73.26   0.21    85.33   0.16    82.76   0.24
Chinese Stock Market            68.97   0.58    77.31   0.38    78.21   0.33    63.19   0.32
Stock Market with Financial
Ontology                        72.58   0.31    62.23   0.23    71.15   0.45    63.78   0.31
Chinese Stock Market with
Ontology                        85.39   0.19    81.20   0.13    85.37   0.16    84.39   0.17
Fig. 7. Different stock markets vs. Sign Correctness Percentage values
Fig. 8. Different stock markets vs. Normalized Mean Square Error values

Conclusion
Machine learning is in great demand among firms for stock trading, and this paper showed how the Stock Market Ontology, kernel regression, and Artificial Neural Network algorithms were successfully used to predict prices and construct stock portfolios. The stock prices predicted by the Artificial Neural Network were quite accurate, with an accuracy level of 97.55%. In the future, this study on the Chinese stock market can also be paralleled for the benefit of an ontology-based Indian stock market.
CHAPTER 4: Machine Learning on Other Types of Datasets
Advances in Science and Technology ISSN: 1662-0356, Vol. 124, pp 447-456 doi:10.4028/p-0u1a42 © 2023 Trans Tech Publications Ltd, Switzerland
Submitted: 2022-08-26 Accepted: 2022-09-16 Online: 2023-02-27
Train Track Crack Classification Using CNN

Manish Kumar (a), Dr. P. Visalakshi (b)
Department of Networking and Communications, SRM Institute of Science and Technology, Kattankulathur, India
(a) [email protected], (b) [email protected]
Keywords: train track crack classification, deep learning, TensorFlow.
Abstract. This research proposes a railway crack detection system: a classification system that can classify cracks in railway tracks using deep learning with convolutional neural networks (CNN). In the network of the Indian railways, accidents are a major concern because of unidentified cracks on the rail tracks. The majority of accidents occur due to railway track cracks, resulting in the loss of precious lives and economic loss. It has therefore become necessary to monitor the health condition of the track regularly using a train track crack classification system. This project helps prevent train derailment by classifying cracks in railway tracks using image processing technologies. A train track crack classification system that uses a deep learning Convolutional Neural Network architecture with several layers, along with certain image pre-processing methods, has proved very successful in classifying whether a railway track crack has occurred or not. A convolutional neural network contains many layers, made up of large numbers of neurons, in which the images available in the dataset are trained. These convolutional neural networks are considered able to record the colours and textures of lesions related to the corresponding railway track cracks, in a manner similar to human decision-making.

I. Introduction
Deep learning is a part of machine learning that consists of neural networks with many hidden layers and their algorithms. With the growth of science and technology over the last several decades, everything has been digitalized and a great deal of data has become available, so deep learning has become increasingly significant. In deep learning we basically have the input, in the form of data, information and images, and the output to be calculated, but not the algorithm, so we have to train a neural network. We have used a Convolutional Neural Network (CNN) in our project to classify cracks in train tracks. A CNN is a deep learning algorithm that takes images as input; weights and biases are adjusted until the neural network is trained. In the training process of the convolutional neural network, the input images of train tracks pass through many hidden layers and an output is predicted; the difference between the predicted value and the original value is then calculated. From the resulting loss we know how well the model has performed. This loss value is calculated using the loss function, and to minimize it, all the weights and biases of the neural network are modified. This process continues until the output is close to the original output. The major motivation of the project was to understand the concepts of deep learning with convolutional neural networks and their architectures, and to study the different architectures used in recent times for image classification. The main idea was to classify images of train track cracks, which can prevent train derailment and also provide a monitoring method for train tracks across the country.
The objective of this project is to develop a deep learning model for the classification of train track crack images with the help of convolutional neural network algorithms, and then to compare the outputs of the different CNN architectures to determine which has the best accuracy.
We first discuss the existing system available for determining train track cracks and its drawbacks, followed by a literature survey of journals related to the topic. In the proposed system, we present our idea for implementing the project and its advantages. The dataset is then described, followed by the system architecture and its process, and important terms are defined. The implementation of the train track crack classification and all the convolutional neural network architectures are then described, together with the results and accuracy of each CNN architecture. Conclusions and future work are also presented.

II. Existing System
The positioning of the train is very important for maintaining train safety. In this system, the authors recommended a train positioning strategy that fuses vision with millimeter-wave radar data. Their strategy has two parts: loop closure detection (LCD) and radar-based odometry. The loop closure detection part, using convolutional neural network features and line features, detects the right key locations and reduces errors in the process of finding them. The second part, radar-based odometry, provides an algorithm to measure the speed of the train; its output is combined with the output of loop closure detection to determine the positioning of the train.
Drawback: their system does not focus on increasing the recognition speed or the accuracy of classifying the severity of train track cracks.

III. Literature Survey
A. Analysis of cracking on the running surface of rails
In the Montenegrin railway system, the managers in charge of the railway infrastructure should have a plan for maintaining the safety of the infrastructure subsystem of the railway network [1]. This plan must focus on the evaluation of, and suitable techniques for, preventing cracks caused by rolling contact fatigue. These fatigue cracks are among the major causes of rail infrastructure damage and train derailments; they not only increase maintenance costs but also reduce the service life of the railway tracks. Increased traffic density is one of the major contributors to rolling contact fatigue and is a serious issue for railway managers and the department. The problems caused by rolling contact fatigue can be reduced with well-designed track architecture and improved maintenance processes, which give the rail service a longer life, reduce railway infrastructure maintenance costs, and ensure safer railway traffic.

B. Automatic Crack Detection and Classification Method for Subway Tunnel Safety Monitoring
The safety and maintenance status of railway infrastructure can be assessed from cracks in the railway tracks. This journal suggested a crack detection system that can automatically detect and classify cracks, and with these methods manage the safety of the subway tunnel, using complementary metal oxide semiconductor (CMOS) cameras to capture the images [2]. A feature extraction process follows, in which the gray-scale images containing cracks are identified using various image processing techniques, and a histogram is plotted to differentiate the images that contain a crack from those that do not.
About 90% of the gray-scale pixels that did not contain cracks were removed in order to maintain accuracy. The final binary
output images had over 90% of the crack length preserved. After testing, the recommended technique was found to be very effective, with good accuracy for automatic crack detection and classification.

C. Crack detection using image processing: A critical review and analysis
This journal considered that assessing structural damage on the concrete surfaces of railway tracks requires identifying cracks, which are crucial to manage because, if not monitored regularly, they can damage the ecosystem. The most basic method of monitoring these cracks is to check them manually, after which drawings of the cracks are constructed to check all the irregularities and keep a record of them [3]. However, these crack drawings carry no statistical information. To replace this manual method, an automatic crack detection method based on images of the cracks was introduced, using different image processing methods to detect the cracks automatically. A detailed review of these methods was conducted to identify their achievements and the challenges they will face in the future; many journals on crack detection were evaluated to extract detailed information on the different image processing methods, and observations and methods that can be used in the future to identify cracks were given.

IV. Proposed System
To classify the cracks in train tracks we have used convolutional neural networks (CNN), which are mostly used for object classification. Many existing image recognition systems use standard deep learning methods and extracted features, but they do not perform well during testing, when they are applied to test images not available during training, and they give low accuracy. Here we have implemented three different convolutional neural network architectures: the ManualNet, AlexNet and LeNet CNN architectures. In the training process, the train track images are fed to the neural networks; the images pass through the convolution layers while the internal parameters, the weights and biases, are updated to obtain the best accuracy. Each layer is connected to the previous layer, and the final output layer has 2 neurons because it predicts whether the image shows a defective or a non-defective train track. The final layer is essentially a probability distribution, and the class with the highest probability is the output. We perform this process with all three convolutional neural networks, and the one with the highest accuracy is selected to predict the train track crack image.
Advantages:
• CNN algorithms are essentially utilized for image classification, so they perform better.
• The accuracy of the CNN architectures can be improved by utilizing a mix of layers.

V. Preparing the Dataset
This dataset contains approximately 300 training and 20 test images, classified into 2 classes:
• DEFECTIVE
• NON-DEFECTIVE
VI. System Architecture
Fig. 1. System Architecture diagram

First, we import our dataset using the Keras preprocessing image data generator function. The training process involves feature extraction: the train track images are processed by the convolutional neural network architectures, which perform convolutions and max pooling to reduce the dimensionality of the images. We also apply data augmentation to the images so that the neural network does not memorize them, to avoid overfitting. Various activation functions, such as the rectified linear unit, are used in training. The network then performs feature selection on the extracted features: the images are classified by the fully connected layers, which predict whether the image shows a cracked track or not. The loss value is calculated by the loss function, comparing the test image with the predicted output, and to reduce the loss the internal parameters of the convolutional layers are modified after each calculation; this is called back propagation. The process continues until the neural network reaches its best accuracy. To obtain the best accuracy we have used three different convolutional network architectures: ManualNet, AlexNet and LeNet. After training, we test an image from the test dataset, which the convolutional neural network has not seen before (so that the network cannot simply memorize images), and it predicts whether the train track image is cracked or not. The model is then deployed using the Django framework.
In this module the trained deep learning model is converted into a hierarchical data format file (.h5 file), which is then deployed in our Django framework to provide a better user interface and predict whether a given image shows a normal or a cracked train track. In the Django framework, the user is asked to add an image file, to be predicted, from the dataset folder; if no file is selected, a "no file selected" message is shown. After the image is added, the model predicts the state of the train track image, that is, whether it is defective or not defective.

VII. Convolution, Max Pooling and Fully Connected Layer
Convolution: Convolution is a very important step in neural networks. It reduces the size of the original image, producing a convoluted image. A convolution layer creates a kernel, or filter, a matrix of the specified dimensions, which is scanned across the whole image. The kernel values are multiplied by the corresponding image values and summed, and this process continues until the kernel has moved over the whole image. Then we
obtain a convoluted image that is smaller than the original image without losing any pixel information.
Max Pooling: Max pooling is a type of pooling that takes the maximum value in a filter region. To perform max pooling we use a grid, defined by the pool size, and a stride. We move the grid over the image, select the maximum pixel value in that region, and then the stride moves the grid across the image. Continuing this process results in a new image that is smaller than the original.
Fully Connected Layer: Fully connected layers occur before the convolutional neural network's final output layer. These layers convert the 2D image pixels into a one-dimensional array.

VIII. Implementation of the Train Track Crack Classification
We have used three different convolutional neural network architectures to perform the train track crack classification:
1. ManualNet
2. AlexNet
3. LeNet
1. ManualNet: This is a manually created neural network. ManualNet starts with one convolution layer followed by a max pooling layer. Filters of size (3,3) are applied to the image, creating 32 convoluted images of the same size as the input image. These 32 convoluted images are then reduced in size by MaxPooling2D with a grid size of (2,2). The activation function used here is the rectified linear unit. The outputs are then converted into 1D arrays of image pixels by the flatten layer. The dense layer is the fully connected layer, with 38 neurons and the rectified linear unit as its activation function; in the dense layers, the neurons' weights and biases are updated. The final layer is also a dense layer, with a softmax activation function: it takes the 38 inputs from the previous layer and outputs values in the range 0 to 1 representing the probability distribution over whether the train track image is defective or not defective.
Our training data consists of images 150 pixels in height and 150 pixels in width. A batch size of 32 is used, with 50 epochs.
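The convolution and max-pooling operations that these architectures apply (described in Section VII) can be sketched in plain NumPy. This is illustrative only; the project itself uses Keras layers rather than this hand-rolled code.

```python
import numpy as np

# Plain-NumPy sketch of the convolution and max-pooling steps described in
# Section VII (illustrative; the project uses Keras layers).

def convolve2d(image, kernel):
    """Valid convolution: slide the kernel over the image, multiply
    element-wise and sum, producing a smaller feature map."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(image, pool=2, stride=2):
    """Take the maximum pixel value in each pool x pool region."""
    oh = (image.shape[0] - pool) // stride + 1
    ow = (image.shape[1] - pool) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = image[i * stride:i * stride + pool,
                              j * stride:j * stride + pool].max()
    return out

img = np.arange(36, dtype=float).reshape(6, 6)  # toy 6x6 "image"
feat = convolve2d(img, np.ones((3, 3)))         # 4x4 feature map
pooled = max_pool(feat)                         # 2x2 after pooling
```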
Fig. 2. ManualNet model summary
The figure shows the model summary of ManualNet: the total number of parameters trained during the training process is 18,698,228, all of which are trainable, with 0 non-trainable parameters.
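A hedged Keras reconstruction of the ManualNet described above is sketched below. The layer sizes follow the text (32 filters of (3,3) producing same-size outputs, (2,2) max pooling, a 38-neuron dense layer, a 2-way softmax), but the 3-channel 150x150 input is an assumption; the exact configuration that yields the reported 18,698,228 parameters may differ.

```python
# Hedged sketch of the ManualNet described in the text. The 150x150x3
# input shape is assumed; the paper's exact configuration (which reports
# 18,698,228 parameters) may differ from this reconstruction.
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(150, 150, 3)),
    # "same" padding so the 32 feature maps match the input size, as stated
    layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(38, activation="relu"),
    layers.Dense(2, activation="softmax"),  # defective / non-defective
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```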
Fig. 3. Plot of model accuracy and model loss of the ManualNet architecture.

From the plot of model accuracy, the accuracy on the training data is 93% and increasing, and the plot of model loss shows the loss decreasing as the number of epochs increases.
2. AlexNet: AlexNet was the first convolutional network to employ the GPU to boost performance. It is a convolutional neural network that had a large impact on the development of machine learning, most importantly in applying deep learning to machine vision. AlexNet won the ImageNet 'Image Classification Challenge' in 2012, for which the network was trained on over 1.2 million very high resolution images with the help of two GPUs. AlexNet is an 8-layer convolutional neural network: the first 5 layers are convolution layers, some of which are followed by max pooling layers, and then there are 3 fully connected layers. AlexNet uses the Rectified Linear Unit as its activation function, which is used to solve non-linear problems. AlexNet has a major issue with overfitting, so to reduce it we used two techniques: data augmentation and dropout. In data augmentation, we apply various image transformations to the images in the training dataset, such as rotation by certain angles, flipping, and zooming. This ensures that our CNN architecture does not memorize the images, and in this way data augmentation reduces overfitting. Dropout: during training, some neurons acquire very large weights and some very small ones; to counter this, some neurons are removed during training of the network, which helps reduce overfitting. AlexNet is the second CNN architecture used in this project to classify the cracks in train tracks.
Fig. 4. AlexNet model summary

In the AlexNet architecture, the total number of parameters is 31,185,210, all of which are trainable, with 0 non-trainable parameters.
Fig. 5. Plot of model accuracy and model loss of the AlexNet architecture.

From the plot, the accuracy is 50% over 50 epochs, while the loss decreases as the number of epochs increases.
3. LeNet: LeNet is the third convolutional neural network used in our project. It is a 5-layer CNN architecture consisting of 2 convolution layers, each followed by a max pooling layer, then one flatten
layer and two fully connected layers. Convolutions and max pooling are performed on the images. The activation function used in LeNet is the rectified linear unit, and the final layer has a softmax activation function, which provides the probability distribution over the possible outputs. In our case, it gives the probability that the train track image is a cracked image or a good one.
Fig. 6. LeNet model summary

The total number of parameters is 1,218,306, all of which are trainable, with 0 non-trainable parameters.
Fig. 7. Plot of model accuracy and model loss of the LeNet architecture.

With the LeNet architecture we obtained an accuracy of 98%, and the loss was reduced.

IX. Result
Since the LeNet architecture gave 98% accuracy, we predicted the output using LeNet on an image from the test dataset.
Fig. 8. Final output

As can be seen, our model predicted that this is a non-defective train track image, so the model gives the correct output for the corresponding test images. We then deployed the model using the Django framework: the user uploads a test image, and the model predicts whether it is a defective or non-defective image.
Fig. 9. Django deployment of the convolutional neural network

X. Conclusion
We focused on the train track crack images from the training and test datasets to predict the pattern of train track cracks using various convolutional neural network architectures. These convolutional neural network architectures have the ability to automatically
classify the images, which is one of the major advantages of using CNNs for image classification. In this study, we have discussed an overview of methodologies for detecting abnormalities in track images, including the collection of train track images from the dataset, preprocessing techniques, feature extraction techniques and classification schemes.

XI. Future Work
• The railway department needs to computerize the early detection of train track cracks in real time.
• An Artificial Intelligence environment should be used to implement the work.
References
[1] Zdenka Popović, Vlatko Radović, "Analysis of cracking on running surface of rails", Građevinar, Vol. 65, No. 03, pp. 251-259, 2013.
[2] Wenyu Zhang, Zhenjiang Zhang, Dapeng Qi and Yun Liu, "Automatic Crack Detection and Classification Method for Subway Tunnel Safety Monitoring", Sensors, Vol. 14, No. 10, pp. 19307-19328, 2014.
[3] Arun Mohan, Sumathi Poobal, "Crack detection using image processing: A critical review and analysis", Alexandria Engineering Journal, Vol. 57, Issue 2, pp. 787-798, June 2018.
[4] Laxmi Goswami, "Railway Route Crack Detection System", International Journal of Innovative Technology and Exploring Engineering, ISSN: 2278-3075, Vol. 8, Issue 12S, October 2019.
[5] Rijoy Paul, Nima Varghese, Unni Menon, Shyam Krishna K, "Railway Track Crack Detection", International Journal of Advance Research and Development, Vol. 3, Issue 3, 2018.
Advances in Science and Technology ISSN: 1662-0356, Vol. 124, pp 457-461 doi:10.4028/p-4r39t2 © 2023 Trans Tech Publications Ltd, Switzerland
Submitted: 2022-09-14 Accepted: 2022-09-16 Online: 2023-02-27
Crime Prediction Using Machine Learning Algorithms

Ganesh Meenakshi Sundaram S (a), Manda Bala Rama Phani Sujith (b), Aravindh Kumar V. (c), Dr. Durgadevi (d)
Department of Computer Science and Engineering, SRM Institute of Science and Technology, Chennai, India
(a) [email protected], (b) [email protected], (c) [email protected], (d) [email protected]
Keywords: crime prediction and analysis, KNN, random forest, decision tree, data visualization.
Abstract. A recent survey observed a rise in the crime rate in India, due to which many people feel unsafe in the country. To reduce crime, predicting it before it happens is very important. The Indian Government uses software called CCIS (Crime Criminal Information System), but this software is only used to store information; it does nothing else with the stored data. Existing systems predict crime only on a per-day basis, not at which hour it is going to occur. To predict it on an hourly basis, accuracy is important, so for crime analysis and prediction the accuracies of machine learning algorithms such as KNN, decision tree and random forest are compared in order to use the best one.

Introduction
The rise in the crime rate makes our country an unsafe place to live. Cases of crimes like murder, robbery, kidnapping and gambling have greatly increased. Machine learning and deep learning algorithms can reveal hidden patterns in unstructured data sets. Since there is a tremendous amount of crime-related data, crime identification and prediction are major problems for the police department. To solve this issue, technology is needed to predict crimes in order to prevent them. Recent developments in machine learning make it possible to sort through a huge amount of crime-related data to predict crime. Time, date, and geographic location are given as input, and the output tells us which type of crime is likely to occur at the given location and time. To use the machine learning algorithm with the highest accuracy, algorithms such as KNN, decision tree and random forest are compared with each other, and boxplots from the plotting library matplotlib are mainly used for data visualization.
The rest of this paper is organized as follows: Section 2 discusses previous works related to this project, Section 3 contains the working of the project, and Section 4 contains its results.

Proposed Work
The machine learning kernel creates a function based on the given dataset; when the user inputs new information (location and time), the kernel uses this function to predict the crime and display it to the user, as the diagram below (Fig. 1) explains.
Figure 1: System Architecture

Technical Modules: There are 3 modules in this project:
1. Data collection and preprocessing
2. Data visualization and analysis
3. Creating and training machine learning models

Data collection and preprocessing: The dataset was obtained from the police website of Indore city in Madhya Pradesh, with six different types of crimes and more than 2000 records containing a time stamp and location. The dataset is cleaned by removing missing values and the description and address features, since the description does not help with crime prediction; the address is converted into latitude and longitude. We also decompose the existing timestamp into year, month, day, hour and minute.

Data visualization and analysis: The following steps are used in this module:
• Analysing the data
• Comparing the number of each crime per hour
• Analysing the crime pattern
• Analysing the peak hours of crime
• Analysing which type of crime occurs most often compared to others

Creating and training machine learning models: The dataset is split for training and testing; after this, we create and train models such as KNN, decision tree and random forest. The accuracies of these models are then compared, and the model showing the higher accuracy is used for crime prediction.
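The preprocessing step above (decomposing the timestamp into year/month/day/hour/minute and dropping unused columns) can be sketched with pandas. The column names and sample records below are hypothetical; the real Indore dataset's schema is not given in the text.

```python
import pandas as pd

# Sketch of the preprocessing module: decompose the timestamp into
# year/month/day/hour/minute features. Column names and records are
# hypothetical stand-ins for the Indore crime dataset.
df = pd.DataFrame({
    "timestamp": ["2017-03-05 22:15:00", "2017-07-19 02:40:00"],
    "crime_type": ["robbery", "gambling"],
    "latitude": [22.7196, 22.7244],     # address already converted
    "longitude": [75.8577, 75.8839],
})

ts = pd.to_datetime(df["timestamp"])
df["year"], df["month"], df["day"] = ts.dt.year, ts.dt.month, ts.dt.day
df["hour"], df["minute"] = ts.dt.hour, ts.dt.minute
df = df.drop(columns=["timestamp"])  # description/address would be dropped too
```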
Advances in Science and Technology Vol. 124
459
Figure 2: Sample Dataset

Machine Learning Algorithms: K-Nearest Neighbour is a supervised machine learning algorithm useful for classification problems. Decision trees are used for both classification and regression. The Random forest algorithm builds a number of classifiers on the training data and combines all their outputs to make more accurate predictions. To decide which machine learning algorithm to use, their accuracies are compared: KNN achieves 93.04%, the Decision tree model 97.01%, and the Random forest model 98.04%. The algorithm with the highest accuracy is Random forest.

Applying the Algorithm: 80% of the data is allotted for testing and 20% of the data is allotted for training.
Figure 3: Split sizes (testing: 80%, training: 20%)
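The model comparison described above can be sketched with scikit-learn. This is a hedged illustration, not the authors' code: the synthetic dataset, split ratio and hyperparameters are assumptions, so the resulting accuracies will differ from the paper's reported figures.

```python
# Sketch: compare KNN, Decision tree and Random forest accuracies
# and pick the best, as described in the paper. Data is synthetic.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=2000, n_features=6, n_informative=4,
                           n_classes=3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)
models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Decision tree": DecisionTreeClassifier(random_state=42),
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=42),
}
accuracies = {name: accuracy_score(y_test, m.fit(X_train, y_train).predict(X_test))
              for name, m in models.items()}
best = max(accuracies, key=accuracies.get)
print(best, accuracies)
```

The model named in `best` would then be the one retained for prediction.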
Literature References
In previous works, it has been observed that predictions often differ because different datasets and machine learning models are used. Classification models have also been applied to various other domains, such as weather prediction, banking, finance and security. The authors in [8] used an ensemble-stacking based crime prediction method (SBCPM) built on SVM algorithms to identify appropriate crime predictions with learning-based methods, implemented in MATLAB; the model is applied to a crime dataset based solely on violence. The authors in [9] applied the MV algorithm and the Apriori algorithm, with some enhancements, to fill missing values and identify crime patterns in crime data taken from the city police department. The authors in [10] proposed a framework for crime and criminal data analysis and detection using Decision tree algorithms for data classification and the Simple K-Means algorithm for data clustering; the classification is based mainly on the type, location and time of the crime. The authors in [11] used clustering techniques to help the crime branch with better prediction and classification of crimes. Partition clustering methods are primarily classified into K-means and AK-mode; the partitioning method constructs ‘k’ partitions of the data
from a given dataset of ‘n’ objects. This work also uses the Expectation-Maximization algorithm, an extension of K-means that finds the parameter estimates for each cluster, treating the entire data as a mixture of parametric probability distributions.

Summary
Solving a crime problem is a tough task with just experience and intelligence, so new methods that use the stored crime data can help with the crime detection problem. The accuracies of three machine learning algorithms were compared, and the algorithm with the highest accuracy was Random forest. For data visualization, another dataset was derived from the existing one using SQL (Fig. 4); from it, an SNS heatmap (Fig. 5) was created to show how crimes vary from one day to another. The heatmap shows that towards the end of the month, act 323 (violence) occurred most often, whereas on other days act 279 (accident) occurred most often.
Figure 4: Sub Dataset for Heatmap
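As a sketch (not the authors' SQL), the day-by-act count matrix that such a heatmap visualizes can also be built directly with pandas; the records below are illustrative:

```python
import pandas as pd

# Sketch: build the day x crime-act count matrix that an SNS heatmap
# would display. The sample records are illustrative assumptions.
records = pd.DataFrame({
    "day": [1, 1, 2, 28, 29, 30, 30],
    "act": [279, 279, 279, 323, 323, 279, 323],
})
pivot = pd.crosstab(records["day"], records["act"])  # rows: day, cols: act
# pivot could then be passed to seaborn, e.g. sns.heatmap(pivot)
print(pivot.loc[1, 279])
```

Each cell then counts how often a given act occurred on a given day, which is exactly what the heatmap colours encode.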
Figure 5: SNS Heatmap

References
[1] Bogomolov, Andrey and Lepri, Bruno and Staiano, Jacopo and Oliver, Nuria and Pianesi, Fabio and Pentland, Alex. 2014. Once upon a crime: Towards crime prediction from demographics and mobile data, Proceedings of the 16th International Conference on Multimodal Interaction.
[2] Kianmehr, Keivan and Alhajj, Reda. 2008. Effectiveness of support vector machine for crime hotspots prediction, pages 433-458, Applied Artificial Intelligence, volume 22, number 5.
[3] Yu, Chung-Hsien and Ward, Max W and Morabito, Melissa and Ding, Wei. 2011. Crime forecasting using data mining techniques, pages 779-786, IEEE 11th International Conference on Data Mining Workshops (ICDMW).
[4] Wang, Tong and Rudin, Cynthia and Wagner, Daniel and Sevieri, Rich. 2013. pages 515-530, Machine Learning and Knowledge Discovery in Databases.
[5] Toole, Jameson L and Eagle, Nathan and Plotkin, Joshua B. 2011. ACM Transactions on Intelligent Systems and Technology (TIST), volume 2, number 4, pages 38.
[6] Leo Breiman. Random Forests, Machine Learning, 2001, Volume 45, Number 1, Page 5.
[7] Friedman, Jerome H. "Stochastic gradient boosting." Computational Statistics and Data Analysis 38.4 (2002): 367-378.
[8] Sapna Singh Kshatri, Deepak Singh, Bhavana Narain, Surbhi Bhatia. An Empirical Analysis of Machine Learning Algorithms for Crime Prediction Using Stacked Generalization: An Ensemble Approach (Digital Object Identifier 10.1109/ACCESS.2021.3075140).
[9] A. Malathi, Dr. S. Santhosh Baboo. Algorithmic Crime Prediction Model Based on the Analysis of Crime Clusters (Global Journal of Computer Science and Technology, Volume 11, Issue 11, Version 1.0, July 2011).
[10] Kadhim B. Swadi Al-Janabi. A Proposed Framework for Analyzing Crime Data Set Using Decision Tree and Simple K-Means Mining Algorithms (Journal of Kufa for Mathematics and Computer, Vol. 1, No. 3, May 2011, pp. 8-24).
[11] Revatthy Krishnamurthy, J. Satheesh Kumar. Survey of data mining techniques on crime data analysis (International Journal of Data Mining Techniques and Applications, Vol. 01, Issue 02, December 2012).
Advances in Science and Technology ISSN: 1662-0356, Vol. 124, pp 462-468 doi:10.4028/p-6a0n6u © 2023 Trans Tech Publications Ltd, Switzerland
Submitted: 2022-08-27 Accepted: 2022-09-16 Online: 2023-02-27
Prediction of Eligibility for COVID-19 Vaccine Using SMLT Technique
Prajwal Bisht1,a*, Vinayak Bora2,b, S. Poornima3,c, M. Pushpalatha4,d
1,2 Student, Department of Computer Science, SRM Institute of Science and Technology, Kattankulathur 603203, Chennai, India
3,4 Department of Computer Science and Engineering, SRM Institute of Science and Technology, Kattankulathur 603203, Chennai, India
a [email protected], b [email protected], c [email protected], d [email protected]
Keywords: Covid-19, Vaccine, Random Forest, Logistic Regression, Prediction, Support Vector Machine, Decision Tree, F1-metric, Blood Pressure, Sugar, Eligibility, Machine Learning, Data Exploration and Model
Abstract. The worldwide community was devastated by the 2019 coronavirus disease (COVID-19) epidemic that began in Wuhan, China, which overloaded advanced medical systems around the world. The World Health Organization (WHO) is constantly monitoring and responding to the pandemic. The rapid and exponential growth in patient numbers necessitates the use of AI technology to forecast likely outcomes for infected individuals so that suitable therapy can be provided. The goal is to find the machine learning-based solution that best fits Covid-19 vaccination prediction with the highest accuracy. Variable identification, univariate analysis, bivariate and multivariate analysis, missing value handling, data cleaning and preparation, and data validation analysis are all accomplished using supervised machine learning technique (SMLT), and various visualisations are produced for the entire given dataset. A machine learning-based method is proposed for accurately predicting eligibility for the Covid-19 vaccine.

Introduction
The SARS-CoV-2 virus, which was found in Wuhan, China in 2019, was responsible for the COVID-19 pandemic. The World Health Organization declared a worldwide pandemic on March 11, 2020, due to the disease's rapid spread over the world. Many people were affected by this illness, and many lost their lives, their livelihoods, their schooling, and so on. The first case in India was reported in Kerala, and since then the disease has spread across India's states and union territories. The second wave of COVID-19 lately exploded across all states and union territories. The Indian government has set up vaccination camps for persons aged 45 and up. The vaccine strengthens our immune systems, and the second dose is given 28 to 42 days after the first.

Literature Review
The CoronaTracker [1] was established after WHO declared COVID-19 a Public Health Emergency of International Concern (PHEIC) on January 30, 2020.
Fairoza Amira Binti Hamzah, Cher Han Lau, and Hafeez Nazri wrote "CoronaTracker: World-wide COVID-19 Outbreak Data Analysis and Prediction." CoronaTracker was an online news and data platform that provided the most up-to-date and reliable information. It seeks to predict COVID-19 cases, fatalities, and recoveries using SEIR (Susceptible-Exposed-Infectious-Recovered) predictive modelling. The model helps in analysing public attitudes about the dissemination of related health information, as well as determining the political and economic effects of the virus's spread.
Another study, "Prediction, Cross Validation, and Classification in the Presence of COVID-19 of Indian States and Union Territories using Machine Learning Algorithms" by P. Arumugam, V. Kadhirveni, R. Lakshmi Priya, and Manimannan G [3], used models such as [4] SVM, kNN, Random Forest, and Logistic Regression to identify five meaningful clusters, labelling affected areas as Very Low, Low, Moderate, High. In addition, four machine learning algorithms were used to cross-validate the five clusters, and affected states were depicted using predictions and probabilities. Cross-validation accuracy for the different machine learning models was 88%, 97%, 91%, and 91%, respectively. [2] "Machine learning applications for covid-19: a state-of-the-art review" by Firuz Kamalov, Aswani Cherukuri, Hana Sulieman, Fadi Thabtah, and Akbar Hossain covers forecasting, medical diagnostics, drug development, and contact tracing. It examined the most successful state-of-the-art studies. Unlike other studies on the topic, the article provides a high-level overview of current research that is sufficiently detailed to give an informed perspective.

Methodology
Logistic Regression
Logistic regression is a statistical technique for analysing a data set in which one or more independent variables influence the outcome, which is measured with a dichotomous variable (one with only two possible outcomes). Its purpose is to identify the best-fitting model to represent the relationship between a set of independent (predictor or explanatory) variables and a dichotomous feature of interest (the dependent variable, also called the response or outcome variable). As a machine learning classification approach, logistic regression predicts the likelihood of a categorical dependent variable: a binary variable coded as 1 (yes, success, etc.) or 0 (no, failure, etc.).
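A minimal sketch of logistic regression on a dichotomous outcome, assuming illustrative features (age and systolic BP) and made-up labels; this is not the paper's dataset or model:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Sketch: logistic regression on a binary outcome (1 = eligible, 0 = not).
# Features [age, systolic BP] and labels are illustrative assumptions.
X = np.array([[45, 120], [60, 150], [30, 110], [70, 160],
              [25, 105], [55, 140]], dtype=float)
y = np.array([1, 1, 0, 1, 0, 1])
clf = LogisticRegression(max_iter=1000).fit(X, y)
proba = clf.predict_proba([[50.0, 130.0]])[0, 1]  # P(class 1)
print(proba)
```

The model outputs a probability between 0 and 1, which is thresholded (typically at 0.5) to give the binary prediction.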
Random Forest Classifier
Random forests, also known as random decision forests, are an ensemble learning method for classification, regression, and other tasks. They work by training a large number of decision trees and then outputting the class that is the mode of the individual trees' classes (classification) or their mean prediction (regression). Random decision forests correct the tendency of single decision trees to overfit their training set. Random forest is an ensemble learning-based supervised machine learning technique: ensemble learning combines numerous versions of the same algorithm to produce a more effective prediction model, and the random forest algorithm combines several models of the same sort, namely decision trees, to produce a forest of trees, hence the name "Random Forest". Both regression and classification tasks can benefit from the random forest approach.

Decision Tree Classifier
The decision-tree algorithm is a supervised learning algorithm. It can be used with both continuous and categorical output variables. Decision tree assumptions:
• At first, the entire training set is considered to be the root.
• Attributes are assumed to be categorical (for information gain) or can be treated as continuous.
• Records are distributed recursively based on attribute values.
• Statistical approaches are used to rank attributes for placement as the root or as internal nodes.
Decision trees construct classification or regression models in the shape of a tree structure, incrementally cutting a data set into smaller and smaller sections while simultaneously developing the tree. A leaf node represents a classification or decision, and a decision node has two or more branches. The root node of a tree is the topmost decision node and corresponds to the best predictor. Decision trees can handle both categorical and numerical data.
A decision tree also uses a mutually exclusive and exhaustive
collection of if-then rules to classify data. The rules are learned sequentially, one at a time, from the training data; as each rule is learned, the tuples covered by it are removed. This process is repeated on the training set until a termination condition is met. The tree is built from the top down in a recursive, divide-and-conquer style. All of the attributes should be categorical in nature; otherwise, they should be discretized ahead of time. The information gain concept is used to place the attributes with the greatest impact on classification at the top of the tree. A decision tree can easily be overfitted, producing an excessive number of branches that may reflect anomalies due to noise or outliers.

Support Vector Machines
A support vector machine is a classifier that categorises a data set by determining the best hyperplane between the data points. This classifier was chosen because it is quite versatile in the number of different kernel functions that can be applied, and it can produce a high prediction rate. Support Vector Machines (SVMs) are one of the most well-known and discussed machine learning techniques. They were incredibly popular when first developed in the 1990s, and they remain a go-to method for a high-performing algorithm that requires minimal adjustment. Points of interest include:
• the various names given to support vector machines;
• the representation SVM uses when the model is stored on disk;
• how a trained SVM model representation produces predictions for new data;
• how to build an SVM model from training data;
• how to prepare data for the SVM algorithm.

Workflow
Fig. 1: Basic Workflow
Data wrangling
In this part of the work the data is loaded and checked for cleanliness, then trimmed and cleaned for analysis. The cleaning procedures are documented carefully and the cleaning decisions are justified.

Data collection
The data set is divided into two parts, training and testing, usually in a 7:3 ratio. The data model, constructed using Random Forest, Logistic Regression, Decision tree and Support Vector Classifier (SVC) algorithms, is fit on the training set, and prediction is evaluated on the test set based on test-result accuracy.

Preprocessing
The collected data may contain missing values, which could lead to inconsistencies. To get better results, the data must be preprocessed to boost the algorithm's efficiency: outliers must be deleted and variable conversion must be performed.
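The 7:3 split described above can be sketched with scikit-learn; the toy arrays here are placeholders for the real feature matrix and labels:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Sketch of the 7:3 train/test split described above. X and y are
# illustrative placeholders for the real features and labels.
X = np.arange(100).reshape(50, 2)
y = np.arange(50) % 2
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)
print(len(X_train), len(X_test))
```

With 50 records, the split yields 35 training rows and 15 test rows.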
Results and Comparison
Logistic Regression
For Logistic Regression, the confusion matrix for target values 0 and 1 is as follows.
Fig. 2: Logistic Regression accuracy with Confusion Matrix
151 0's are correctly predicted, 1 1 is incorrectly predicted, 18 1's are correctly predicted and 10 0's are incorrectly predicted, so the overall accuracy of this model comes out to 93.8%.

Random Forest
For Random Forest, 152 0's and 18 1's are correctly predicted, with zero false positive and false negative values; the accuracy of the Random Forest model is 100%.
Fig. 3: Random Forest accuracy with Confusion Matrix
Decision Tree
For Decision Tree, 152 0's and 28 1's are correctly predicted, with zero false positive and false negative values; the accuracy of the Decision Tree model is also 100%.
Fig. 4: Decision Tree accuracy with Confusion Matrix

Support Vector Machines
For Support Vector Machines, the confusion matrix for target values 0 and 1 is as follows:
152 0's are correctly predicted, 0 1's are incorrectly predicted, 0 1's are correctly predicted and 28 0's are incorrectly predicted, so the overall accuracy of this model comes out to 84.5%, the lowest of the four.
Fig. 5: Support Vector Machines accuracy with Confusion Matrix
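The accuracy figures quoted above follow directly from the confusion-matrix counts; a small sketch (counts taken from the text, function name assumed):

```python
# Sketch: recover model accuracy from confusion-matrix counts,
# using the counts reported in the text above.
def accuracy(tn, fp, fn, tp):
    return (tn + tp) / (tn + fp + fn + tp)

log_reg = accuracy(tn=151, fp=1, fn=10, tp=18)  # close to the reported 93.8%
svm = accuracy(tn=152, fp=0, fn=28, tp=0)       # close to the reported 84.5%
print(round(log_reg, 3), round(svm, 3))
```

Dividing correct predictions by all 180 test records reproduces the reported percentages to within rounding.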
Conclusion
The COVID-19 data is classified using four classification algorithms: Logistic Regression, Random Forest Classifier, Decision Tree and Support Vector Machines. The model takes in parameters such as State, Age, Gender, Systolic BP, Diastolic BP, Sugar level before eating, Sugar level after eating and whether the patient is alcoholic or not. After evaluating the model performances, the model giving the highest accuracy is chosen and ported to the Python framework Flask; in this case, Random Forest and Decision Tree are chosen due to their exceedingly high performance. Through Flask, an online web application is created for the user to interact with; it takes the same inputs and categorizes the user into one of two classes:
• Eligible: You are eligible for a Covid-19 vaccine.
• Not Eligible: You are not eligible for a Covid-19 vaccine.
With the growing demand for DNA vaccines in this period, this research can also make an impact on other similar projects using similar technology in the medical field, and inspire others to contribute their ideas to a greater cause.
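A minimal sketch of such a Flask front end. The route name, field names and the stand-in `predict` function are all illustrative assumptions; in the paper's setup, the trained Random Forest or Decision Tree model would replace the stub.

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

FEATURES = ["state", "age", "gender", "systolic_bp", "diastolic_bp",
            "sugar_before", "sugar_after", "alcoholic"]

def predict(features):
    # Stand-in for the trained model; the real app would call
    # model.predict() on the encoded feature vector.
    return 1 if (features.get("age") or 0) >= 45 else 0

@app.route("/eligibility", methods=["POST"])
def eligibility():
    data = {f: request.get_json().get(f) for f in FEATURES}
    return jsonify({"eligible": bool(predict(data))})
```

A POST with the user's details returns `{"eligible": true}` or `{"eligible": false}`, matching the two classes above.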
References
[1] Hamzah, F. B., Lau, C., Nazri, H., Ligot, D. V., Lee, G., Tan, C. L., Shaib, M., Zaidon, U. H. B., Abdullah, A. B., Chung, M. H., et al., 2020, “CoronaTracker: worldwide COVID-19 outbreak data analysis and prediction,” Bull World Health Organ, 1(32), pp. 1–32.
[2] Kamalov, F., Cherukuri, A., Sulieman, H., Thabtah, F., and Hossain, A., 2021, “Machine learning applications for COVID-19: A state-of-the-art review,” arXiv preprint arXiv:2101.07824.
[3] Arumugam, P., V., K., R., L., and Ganesan, M., 2021, “Prediction, Cross Validation and Classification in the Presence COVID-19 of Indian States and Union Territories using Machine Learning Algorithms,” International Journal of Recent Technology and Engineering, 10, pp. 16–20.
[4] Bhadana, V., Jalal, A. S., and Pathak, P., 2020, “A comparative study of machine learning models for COVID-19 prediction in India,” 2020 IEEE 4th Conference on Information & Communication Technology (CICT), IEEE, pp. 1–7.
[5] Ardabili, S. F., Mosavi, A., Ghamisi, P., Ferdinand, F., Varkonyi-Koczy, A. R., Reuter, U., Rabczuk, T., and Atkinson, P. M., 2020, “Covid-19 outbreak prediction with machine learning,” Algorithms, 13(10), p. 249.
[6] Mendoza-Guevara, C. C., Ramón-Gallegos, E., Martínez-Escobar, A., Alonso-Morales, R., del Pilar Ramos-Godínez, M., and Ortega, J., 2021, “Attachment and in vitro transfection efficiency of an anti-rabies Chitosan-DNA nanoparticle vaccine,” IEEE Transactions on NanoBioscience, 21(1), pp. 105–116.
Advances in Science and Technology ISSN: 1662-0356, Vol. 124, pp 469-477 doi:10.4028/p-sw6mmb © 2023 Trans Tech Publications Ltd, Switzerland
Submitted: 2022-08-27 Accepted: 2022-09-16 Online: 2023-02-27
A Novel Analysis of Employee Attrition Rate by Maneuvering Machine Learning
Lalitha V.1,a, Prithiv Raj S.2,b*, Lokesh P.3,c
1-3 Sri Sairam Engineering College, Chennai, India
a [email protected], b [email protected], c [email protected]
Keywords: Attrition rate, IT firm, Employee attrition, Supervised learning, Classification, Machine learning, HR department, Employee dataset, LGBM, Decision tree.
Abstract. The employee attrition rate in the tech industry has become more dreadful day by day all over the world. The churn (attrition) rate in IT industries is growing more rapidly than expected, especially during pandemic times. Every tech firm treats this as a foremost issue, analysing and adapting to the change. The main snag is that the expenditure of recruiting a new employee is far less effective than retaining a company-trained professional; retaining an employee also preserves the credibility and work culture of the company. A new employee must be given access to training modules and the company's code of conduct, with a heavy information load over a short span of time. It is essential to mention that not every organization has comprehensive training programs for its employees, especially start-up tech firms, which focus heavily on skilled workers with prior experience. This uncertainty causes HR departments to scrutinize and tweak their actions according to current market trends. The major goal of this study is to predict whether a skilful employee will quit or continue, to predict the reason for quitting using supervised classification and machine learning algorithms, and to acquaint the human resource team with the analytics required to make decisions based on machine learning.

Introduction
The premise behind this paper is that, in an economically prevalent country, many companies found it difficult to operate during pandemic times, with their employee attrition rate in an upward trend over the last 5-6 years. Some employees in small-scale companies resign or outgrow a job for several reasons, which include lack of advancement, a desire to create their own challenges, or career advancement in new technology prevailing in the market.
This study examines and decodes every market development from the perspective of both the business and the employees. Retirement is one of the most prevalent reasons for employee turnover; if a considerable majority of an organization's employees are in the same age group, attrition due to retirement can become a major worry. The attrition rate reflects the company's recruiting and termination figures: to determine it, divide the number of workers who have left the workforce by the average number of employees. There can also be surprising reasons for leaving a firm, such as an employee's manager rather than the company itself: according to the 2019 DDI Frontline Leader Project, roughly 60% of 1,000 managers indicated their staff quit because of their boss, which raises the question of whether employees and managers are in constant dialogue.
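The attrition-rate formula above can be illustrated with hypothetical numbers (the headcounts and leaver count below are invented for the example):

```python
# Worked example of the formula: attrition rate = leavers / average headcount.
# All numbers are hypothetical, chosen only to illustrate the calculation.
def attrition_rate(leavers, headcount_start, headcount_end):
    avg_headcount = (headcount_start + headcount_end) / 2
    return leavers / avg_headcount

rate = attrition_rate(leavers=86, headcount_start=1050, headcount_end=950)
print(round(rate * 100, 1))  # 8.6 (percent)
```

With 86 leavers against an average headcount of 1000, the quarterly attrition rate works out to 8.6%.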
Table 1: Attrition rate by company and quarter

Company        | Q3 FY21 | Q4 FY21 | Q2 FY22
TCS            | 7.60%   | 7.20%   | 8.60%
Infosys        | 10%     | 15.20%  | 13.90%
Wipro          | 11%     | 12%     | 15.50%
HCL Tech       | 10.20%  | 9.90%   | 11.80%
Tech Mahindra  | 12%     | 13%     | 17%
The table above depicts the attrition rates of IT firms such as TCS, Infosys, Wipro, HCL Tech and Tech Mahindra. According to an Aon research, attrition rates reached 21% in 2021, up from 13% in 2020. In the first three quarters of 2021, Indian IT businesses employed over 3,50,000 individuals. The table also shows that the attrition rate is increasing or, at best, remains constant. The following are some of the reasons for which an employee leaves a company:
• Lack of growth and progression.
• Expectation of better salaries and respectable job roles.
• Toxic work culture and environment.
• Higher studies and relocation.
• Poor training and testing.
• Lack of growth in the industry and advancement opportunities.
• Being appreciated less.
• Being overworked.
• Lack of feedback and recognition.
The structure of this paper is as follows: Section 2 discusses the current state and severity of the employee retention problem, which can occur for a variety of reasons. Section 3 addresses the data utilized, how it might assist management in learning about and addressing staff attrition concerns, and the findings produced by various machine learning algorithms. Section 4 concludes and outlines future work.

Literature Survey
"Machine Learning Model to Predict Workforce Attrition" by Priyanka Sadana [1]. This method eliminates the need for manual report creation and analysis, and it is never out of date because new data for retraining is provided every year. The study explains the fundamental pipeline for estimating workforce attrition in IT firms, with reference to employee abilities. "From Big Data to Deep Data to Support People Analytics for Employee Attrition Prediction" by N. B. Yahia, J. Hlel, and R. Colomo-Palacios [2]. In terms of research constraints, the authors note it will be fruitful to investigate the influence of dynamic elements that deal with employee behavior and emotional states on employee attrition.
"Towards Understanding Employee Attrition Using Decision Tree Approach" by Saadat M Alhashmi [4]. Using publicly accessible data and a decision tree technique, this study addressed the issue of staff attrition. The results of this work-in-progress study are encouraging, and future research will add more factors and test the model using data from a local grocery. Dilip Singh Sisodia, Somdutta Vishwakarma and Abinash Pujahari worked on "Evaluation of Machine Learning Models for Employee Churn Prediction" [5]. In the experimental section, histograms display the contrast between departed workers and wage, department, satisfaction level, and so on. Five distinct machine learning algorithms are employed for prediction: linear support vector machine, C5.0 Decision Tree classifier, Random Forest, k-nearest neighbor, and Naive Bayes classifier. The research investigates which machine learning algorithm is most effective at predicting which employees are most likely to leave a company, given their job conditions and surroundings. To detect staff attrition, Huang, M. T. Kechadi, and B.
Buckley worked on "Customer churn prediction in telecommunications" [6]. Customer attrition detection has become an important component of the telecom business, and this study is useful for predicting customer attrition behavior and preventing it. The authors derive rules for attrition detection using rough set theory, a rule-based decision-making method, employing four rule-extraction algorithms: exhaustive, genetic, covering, and LEM2; genetic algorithms are used for rough-set classification, a rule-based approach for improving performance. K. Coussement and D. Van den Poel worked on "Churn prediction in subscription services: An application of support vector machines while comparing two parameter-selection techniques" [7]. Academics and practitioners agree that designing an attrition-identification model that is as exact as possible is critical for keeping consumers; two parameter-selection strategies for support vector machines are compared, both using grid search and cross-validation. "Handling class imbalance in customer churn prediction" by J. Burez and D. Van den Poel [8] looked into how to deal with class imbalance in attrition identification, using the most reliable assessment measures, lift and AUC, which have proven to be useful for determining accuracy and have the potential to improve forecast accuracy. "Variable selection by association rules for customer churn prediction of multimedia on demand" by C.-F. Tsai and M.-Y. Chen [9]. In the field of mobile telecommunications, data mining approaches such as neural networks and decision trees have been widely used to develop customer-attrition identification models; in this study, the preprocessing stage uses association rules to select significant variables.
Prediction accuracy, precision, recall, and F-measure are the four assessment metrics taken into consideration while evaluating the model's performance.
Solution Approach
Employee retention has been a major challenge for many IT firms in recent years. Many companies are experimenting with different retention techniques, such as publishing yearly employee survey results and addressing employee grievances. In this paper, a machine learning approach is used to develop a model from the collected data. Machine learning is used because it simplifies and speeds up data-driven choices, achieving high accuracy when trained on data. This solution removes the need for manual report preparation and data analysis, and it is always up to date because fresh retraining data is made available every year. The solution strategy is depicted in the flowchart below.
Fig. 1. Flowchart of the comprehensive solution strategy

A. Information Gathering
Data collection refers to gathering relevant data from all accessible sources in order to perform the analysis. There are 14999 records in this data collection, with 12 attributes.

B. Data Cleaning
Cleaning data is a vital step in every machine learning approach. Data cleaning is the process of finding and fixing flaws in a dataset that may have a detrimental influence on a prediction model. If we start with a clean dataset, there is a strong chance of producing good results with basic techniques, which can be quite useful in terms of computation, especially when the dataset is enormous.
• Remove any observations that are redundant and unnecessary.
• Check whether the data collection contains any null values.
• Handle missing data.
C. Exploratory Data Analysis Exploratory data analysis is a critical phase in any data analysis. EDA is the process of studying a dataset to uncover patterns and form hypotheses based on our understanding of the dataset.
Advances in Science and Technology Vol. 124
473
Fig. 2. Left vs Count According to the data obtained, 11,428 employees remained with the firm while 3571 left, as seen in the graph above.
Fig. 3. Left vs Satisfaction level The graph above displays the degree of satisfaction of employees who stayed with the company and those who departed. When compared to the employees who stayed, the employees who departed have a lower degree of satisfaction.
Fig. 4. Bar plot
Fig. 5 Attrition vs Satisfaction level
Fig. 6. Applying correlation plot
The graph above relates job satisfaction to promotions over the last five years. Let us check how the features are linked. The heat map generated by .corr(), given below, shows that several columns appear to be weakly correlated with one another. It is preferable to train a model using features that are not strongly correlated, in order to avoid keeping duplicate variables.
Fig. 7. Pearson correlation of numerical features
D. Categorical Encoding and Extraction of Features
The figure below describes the use of the label encoding technique.
Fig. 8. Label Encoding
Fig. 9. Encoding variable
The chosen dataset contains both categorical variables and numerical data. Because many algorithms struggle with categorical variables, the label encoder technique is used to encode these variables numerically.
E. Predicting Attrition using Machine Learning Models
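The encoding and train/test split steps can be sketched with scikit-learn as follows; the column names ('salary', 'left') are illustrative assumptions based on the HR dataset:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split

# Toy stand-in for the HR dataset; 'salary' is categorical, 'left' is the target
df = pd.DataFrame({
    "satisfaction_level": [0.38, 0.80, 0.11, 0.72, 0.37, 0.41],
    "salary": ["low", "medium", "low", "high", "low", "medium"],
    "left": [1, 0, 1, 0, 1, 1],
})

# Numerically encode the categorical column
le = LabelEncoder()
df["salary"] = le.fit_transform(df["salary"])

# Separate features and target, then split into train and test sets
X = df.drop(columns="left")
y = df["left"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
```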
Fig. 10. Train and Test split data
The training split is used to fit the machine learning technique, whereas the test split is used to validate the model.
Decision tree
A decision tree is a decision-support tool that helps individuals make better judgments by using a tree-like structure of alternatives and their probable consequences, such as chance event outcomes, resource costs, and utility. It is one way to display an algorithm composed entirely of conditional statements. Decision trees are a prominent machine learning approach that is frequently used in industrial research, particularly in decision processes, to help figure out the best way to accomplish a goal. In decision analysis, a decision tree and the closely related influence diagram are used as graphical and analytical decision-support tools to determine the expected values (or expected utility) of
competing options. The decision tree classifier algorithm used below is imported from the scikit-learn module. The decision tree nodes are: 1. Decision nodes, often represented as squares. 2. Chance nodes, typically depicted as circles.
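The decision tree step can be sketched with scikit-learn as follows; synthetic data stands in for the HR dataset, and the parameters are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the HR dataset
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Fit the decision tree on the training split
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

# Validate on the held-out test split
y_pred = clf.predict(X_test)
print("accuracy:", accuracy_score(y_test, y_pred))
```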
Fig. 11. Decision Tree Classifier
Fig. 12. Decision Tree Accuracy Score
The accuracy of the decision tree is found to be 97.3%.
Fig. 13. Decision Tree
Support vector machine (SVM)
The support vector machine (SVM) is a supervised machine learning method used in biological applications, regression modelling, and classification tasks. The fundamental goal of a support vector machine is to find a hyperplane in N-dimensional space that clearly categorizes the data points. The SVM kernel function transforms a low-dimensional input space into a higher-dimensional space, which can be used to solve non-linearly separable problems. Based on the required output, the SVM substantially separates and transforms the input.
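The SVM step can be sketched with scikit-learn's SVC; the RBF kernel and C value are assumptions, and features are standardized first, which SVMs generally require:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the HR dataset
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Scale features, then fit an RBF-kernel SVM
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
model.fit(X_train, y_train)
print("accuracy:", model.score(X_test, y_test))
```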
Fig. 14. Applying SVM
Fig. 15. SVM accuracy Score
Fig. 16. SVM metrics
LGBM (Light Gradient Boosting Machine)
Light Gradient Boosting Machine is a decision tree-based framework that enhances model efficiency while consuming less memory. It combines two novel techniques, Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB), which overcome the shortcomings of the histogram-based method used in most GBDT (gradient boosting decision tree) frameworks. The GOSS and EFB approaches, detailed below, produce the characteristics of the LightGBM algorithm. Together they ensure that the model runs efficiently and give it an edge over other GBDT frameworks.
Gradient-based One-Side Sampling in LightGBM
Distinct data instances play different roles in the computation of information gain. Instances with larger gradients contribute more information gain. GOSS keeps instances with substantial gradients (above a predetermined threshold, or among the top percentiles) and randomly drops instances with small gradients in order to maintain the accuracy of the information gain estimate. When the information gain has a wide range, this approach can provide a more precise gain estimate than uniform random sampling at the same sampling rate.
Step 1: Install LightGBM with pip install lightgbm.
Step 2: Import the required libraries.
Step 3: Read the train and test datasets.
Step 4: Remove the columns that are not required.
Step 5: Skip data exploration.
Step 6: Split the dataset into two parts.
Step 7: On both splits, separate the independent and target variables.
Step 8: Create a model object and fit it to the training set.
Step 9: Predict the target variable.
Fig. 17. Applying LightGBM
If its parameters are properly calibrated, gradient boosting has the potential to surpass random forests. The accuracy of LightGBM is found to be 99.13%.
Conclusion
Labor attrition is a significant issue since a skilled and knowledgeable employee is difficult and costly to replace. We studied the employee dataset to find the variables that are most likely to cause an employee to quit the firm. Our data suggest that work overload, promotion, employee happiness, and monthly pay are the features that most strongly influence employee decision-
making. Extensive studies have been carried out to assess the efficacy of our strategy in terms of accuracy, precision, recall, and F1-score. The suggested work can help the human resources department predict whether or not an employee will quit the firm. The machine learning algorithm forecasts whether there is a danger of employee attrition based on the employee signals. Some recommendations for effective retention are as follows: 1. Communication and feedback. 2. Training and development. 3. Recognition and reward systems. The outcome of this attrition prediction will assist the organization in lowering its attrition rate.
References
[1] P. Sadana and D. Munnuru, "Machine Learning Model to Predict Workforce Attrition," 2021 6th International Conference for Convergence in Technology (I2CT), 2021, pp. 1-6, doi: 10.1109/I2CT51068.2021.9418140.
[2] N. B. Yahia, J. Hlel and R. Colomo-Palacios, "From Big Data to Deep Data to Support People Analytics for Employee Attrition Prediction," IEEE Access, vol. 9, pp. 60447-60458, 2021, doi: 10.1109/ACCESS.2021.3074559.
[3] A. Mhatre, A. Mahalingam, M. Narayanan, A. Nair and S. Jaju, "Predicting Employee Attrition along with Identifying High Risk Employees using Big Data and Machine Learning," 2020 2nd International Conference on Advances in Computing, Communication Control and Networking (ICACCCN), 2020, pp. 269-276, doi: 10.1109/ICACCCN51052.2020.9362933.
[4] S. M. Alhashmi, "Towards Understanding Employee Attrition using a Decision Tree Approach," 2019 International Conference on Digitization (ICD), 2019, pp. 44-47, doi: 10.1109/ICD47981.2019.9105767.
[5] D. S. Sisodia, S. Vishwakarma and A. Pujahari, "Evaluation of machine learning models for Employee churn prediction," 2017 International Conference on Inventive Computing and Informatics (ICICI), 2017, pp. 1016-1020, doi: 10.1109/ICICI.2017.8365293.
[6] B. Huang, M. T. Kechadi, and B.
Buckley, “Customer churn prediction in telecommunications,” Expert Systems with Applications, vol. 39, no. 1, pp. 1414-1425, 2012.
[7] K. Coussement and D. Van den Poel, “Churn prediction in subscription services: An application of support vector machines while comparing two parameter-selection techniques,” Expert Systems with Applications, vol. 34, no. 1, pp. 313-327, 2008.
[8] J. Burez and D. Van den Poel, “Handling class imbalance in customer churn prediction,” Expert Systems with Applications, vol. 36, no. 3, pp. 4626-4636, 2009.
[9] C.-F. Tsai and M.-Y. Chen, “Variable selection by association rules for customer churn prediction of multimedia on demand,” Expert Systems with Applications, vol. 37, no. 3, pp. 2006-2015, 2010.
Advances in Science and Technology ISSN: 1662-0356, Vol. 124, pp 478-485 doi:10.4028/p-1h18ig © 2023 Trans Tech Publications Ltd, Switzerland
Submitted: 2022-08-27 Accepted: 2022-09-16 Online: 2023-02-27
Building a Recommender System Using Collaborative Filtering Algorithms and Analyzing its Performance
Akash Jeejoe1,a, Harishiv V.2,b, Pranay Venkatesh3,c, Dr. S.K.B Sangeetha4,d*
Department of Computer Science Engineering, SRM Institute of Science and Technology, Chennai, India
[email protected], [email protected], [email protected], [email protected]
Keywords: Alternating Least Squares, Matrix factorization, Recommender system, Singular Value Decomposition, Video Recommendation.
Abstract. Recommender systems (RS) help users to select items and recommend useful items, such as movies, music, books, and jokes, to target customers who are interested in them. Traditional recommendation algorithms are primarily concerned with improving prediction accuracy; as a result, these algorithms tend to promote only popular products. Diversity is also an important non-accuracy metric for personalized recommendations that suggest unfamiliar or novel items. Multi-objective optimization strategies, which optimize these conflicting measures simultaneously, are used to balance accuracy and diversity. Existing algorithms have an important limitation in that they are not flexible enough to control the competing targets. We propose building a recommendation system based on collaborative filtering. Instead of asking users for their tastes and preferences explicitly, we can infer them implicitly by analyzing historical and real-time data. This is done through a process called matrix factorization. Matrix factorization algorithms work by decomposing the user-item interaction matrix into the product of two lower-dimensional rectangular matrices. This type of recommendation has the added advantage of discovering hidden and unobserved relationships that are not accessible to standard content-based filters. A.
INTRODUCTION
The recommender system (RS) helps users by recommending products and providing information about their favorite products. In recent years, technological advancement has made massive amounts of digital data available online. As a result of this overflow of information, users are unable to obtain reliable information matching their preferences [4][8][18]. Recommendation systems may be able to solve this overload problem: they help users filter out non-essential information and suggest related content. RS is based on an information filtering engine that helps individuals make decisions in complex settings when searching large databases of attractive products [13][26]. Clients and sellers both benefit from these strategies, as they make it easier for customers to find what they need while allowing retailers to sell more items. An RS collects a variety of client preferences, which may be explicitly stated, such as product ratings, or inferred from user behavior, such as browsing history [17]. An easy-to-use and cost-effective collaborative filtering system generates accurate predictions and provides answers to the above challenges in real-world filtering. However, some data features may be corrupted, making it difficult to retrieve important information [4][25]. Filtering useless data requires the use of complex data structures. With many features in real data, it may not always be possible to mark rich objects with a simple binary label. As a result, multi-label classification is an important component of data classification problems. In binary classification, the data are grouped into two categories. In multi-class classification there are more than two categories, and each data row has exactly one label.
In multi-label classification there are also more than two categories, but each data row may carry multiple labels [15][16]. These systems require the use of advanced, up-to-date filtering algorithms in order to maximize performance and maintain competitive profitability [27]. Based on real-time user experience, we can conclude that although modern recommendation systems for high-end movie streaming sites use proprietary algorithms, there is still room for improvement
in these models [20][21]. Therefore, we build a recommendation system using collaborative filtering algorithms such as SVD (Singular Value Decomposition) and ALS (Alternating Least Squares). These algorithms fall under the subset of matrix factorization algorithms, which work by decomposing the user-item interaction matrix into the product of two lower-dimensional rectangular matrices. This has the added advantage of discovering hidden relationships that would not be possible with standard content-based filters [19]. Section 2 describes the background research. The proposed algorithm, along with the categories involved, is discussed in Section 3. Section 4 covers the parameters used in the proposed method and evaluates the proposed strategy in comparison with existing methods. Section 5 contains a summary of the research. B.
BACKGROUND STUDY
This section reviews multi-objective recommendation systems. Evolutionary algorithms are classified as single-objective or multi-objective optimization algorithms based on the number of objective functions. MOEA/D, NSGA-II, and SPEA2 are three well-known multi-objective evolutionary algorithms. Multi-objective optimization is often used to solve problems that involve two or more conflicting objective functions. We note that the accuracy and non-accuracy metrics of a recommendation system conflict with each other. As a result, some researchers have begun to investigate multi-objective evolutionary algorithms for recommendation systems [1][2]. A multi-objective optimization problem (MOP) is one where the goal is to find a set of solutions for several competing objectives. A typical MOP has n decision variables, k objective functions, and m constraints, where the objective and constraint functions are determined by the decision variables. It is rare for all objectives in a multi-objective problem to be optimal simultaneously; rather, the objectives often clash. Various studies have provided evolutionary algorithm solutions to the multi-objective problem because there is not always a single answer and the problem is known to be NP-complete [3]. Multi-objective optimization problems (MOPs) arise in many domains, such as engineering design and route planning, where a set of conflicting objectives must be optimized simultaneously. Solving a MOP yields a set of non-dominated solutions, each of which involves a trade-off between several targets. According to the literature, MOPs for RSs are modeled over several competing objectives (i.e., accuracy, diversity, and novelty). Many evolutionary approaches have recently been introduced [6] to address MOP models of RSs.
For example, personalized recommendation was first formulated as a dual-objective MOP (accuracy and diversity), and then a MOEA was implemented using the ProbS method. Using the MOEA/D and NSGA-II frameworks, two novel MOEAs for recommending long-tail items were developed as a result of this line of work. In these works, both the accuracy and the diversity of the recommendations were measured. These MOEAs, however, are limited to trading off two conflicting objectives (accuracy and diversity), and their designs do not account for algorithm-specific information [7][9]. A personalized RS can be very accurate when predicting which items a user might like. An RS based purely on accuracy, however, recommends only popular products, resulting in the same suggestions for all users. On non-accuracy metrics such as diversity and novelty, such systems cannot produce good results. As a result, an accuracy-based system fails to recommend novel content to consumers, making it less useful. Therefore, when recommending unpopular goods, both diversity and accuracy must be considered [10]. Pushing the accuracy of an RS very high tends to reduce the diversity of its recommendations. Similarly, strictly preserving the diversity of items offered to users may reduce the accuracy of the recommendations, because an RS has two incompatible operating criteria: accuracy and diversity. As a result, one of the most difficult challenges in RS design is to maximize both accuracy and diversity at the same time. A few types of recommendation strategies have been developed to build a reliable and diverse list of suggestions [12][22]. Traditional CF algorithms offer only one solution at a time, while the multi-objective RS approach produces multiple solutions at once, each representing a different set of proposed items. This allows
decision makers or end users to select the recommendation result that is most appropriate from the set of alternatives. Only a few studies on multi-objective RSs have been conducted, according to previous research on related topics. This has prompted us to use multi-objective optimization to resolve the conflicting objectives of accuracy, novelty, and diversity in RSs [11][14][28]. C.
SYSTEM METHODOLOGY
Collaborative filtering (CF) uses information filtering to analyze existing data and aid the target user in locating products. Memory-based CF algorithms estimate the target user's preferences by analyzing the user's whole item-rating history and determining the resemblance between the target user and other users, while model-based CF algorithms learn a predictive model from the ratings. For forecasts, these systems use similarity measures and find the nearest neighbors [5][23]. Memory-based approaches come in two flavors: user-based and item-based. The information of adjacent users with similar interests is used to construct user-based CF algorithms. The user-based CF algorithm starts by calculating how similar the target user is to all other users. Finally, the item ratings are predicted using the average ratings of the neighbors [24].
Fig. 1. Architecture Diagram
The item-based CF method is identical to the user-based CF algorithm, except that the similarity value is calculated between items in the former and between persons in the latter. A recommender system is built using collaborative filtering algorithms such as SVD (Singular Value Decomposition) and ALS (Alternating Least Squares). These algorithms also come under the subset of matrix factorization algorithms, which work by decomposing the user-item interaction matrix into the product of two lower-dimensional rectangular matrices. The suggested recommender system technique is depicted in Figure 1 as an architecture diagram. The ALS and SVD methods of collaborative filtering
are used with the MovieLens dataset to generate the recommendations. The movie recommendations are displayed in a web application. A.
Alternating Least Squares
The basic concept of Alternating Least Squares is to take a large matrix of user-item interactions, figure out which latent features relate the users and items, and reduce it to a much smaller representation. Alternating Least Squares takes a matrix, say R, and splits it into two smaller matrices U and P which, when multiplied with each other, produce an approximation of R. The smaller matrices U and P are filled with random values and the error term is calculated according to the formula below:

Error = Σ_{i,j} w_ij (R_ij − u_i · p_j^T)² + λ(||U||² + ||P||²)
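A minimal numpy sketch of the alternating updates on a dense toy matrix; for simplicity every entry is treated as observed, whereas a real implementation would apply the weights w_ij so that unobserved entries are down-weighted:

```python
import numpy as np

# Toy dense ratings matrix R (users x items)
R = np.array([[5.0, 3.0, 0.0, 1.0],
              [4.0, 0.0, 0.0, 1.0],
              [1.0, 1.0, 0.0, 5.0],
              [1.0, 0.0, 0.0, 4.0]])
k, lam = 2, 0.1                       # latent dimension and regularization
rng = np.random.default_rng(0)
U = rng.random((R.shape[0], k))       # user factors, random initialization
P = rng.random((R.shape[1], k))       # item factors, random initialization
err0 = np.mean((R - U @ P.T) ** 2)    # reconstruction error at initialization

for _ in range(20):
    # Fix P, solve the regularized least-squares problem for U
    U = np.linalg.solve(P.T @ P + lam * np.eye(k), P.T @ R.T).T
    # Fix U, solve the regularized least-squares problem for P
    P = np.linalg.solve(U.T @ U + lam * np.eye(k), U.T @ R).T

approx = U @ P.T                      # reconstructed rating matrix
print(np.round(approx, 2))
```

Each half-step solves its least-squares subproblem exactly, so the regularized objective decreases monotonically across iterations.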
After the error term is calculated, the ALS algorithm alternately fixes the values of matrices U and P, and the error is recalculated until it is minimized. In this project we use k-fold cross-validation as opposed to a random split; in k-fold, the entire dataset is considered for training and validation purposes. The basic steps in the k-fold process are: 1) Divide the dataset into k equal subsets A1, A2, A3, ..., Ak, where each subset is called a fold. 2) For i = 1 to k, consider Ai as the validation set and all remaining k-1 folds as the cross-validation training set. 3) Using the cross-validation training set, train the ML model and measure the accuracy. 4) Evaluate the accuracy over all k cases of cross-validation. B.
Singular Value Decomposition
Singular Value Decomposition is very similar to the aforementioned ALS approach, with some key differences: 1) It models the user and item biases (baselines) of the users and items. 2) It uses stochastic gradient descent for optimization, unlike the alternating least squares used in the ALS model. Latent factors are implicit characteristics of users and items. The final output reduces the dimensions of matrix A by extracting the latent features. C.
Item Rating Prediction
This section covers the basics of item similarity computation, rating prediction, and the top-N recommendation system. As mentioned earlier, traditional CF algorithms are divided into two categories: model-based and memory-based methods. Each strategy has its own specific characteristics. The sparsity problem can be solved using model-based strategies; however, model development is expensive. Memory-based algorithms are simpler and easier to improve than model-based algorithms. In RSs, item-based CF is a popular strategy, also known as the nearest-neighbor recommendation algorithm. The ratings in this study are predicted using an item-based CF algorithm, which predicts an unknown item rating for each user based on the user's rating history in the matrix. The steps for item-based CF are:
Step 1: Item similarity calculation. Similarity computation is a crucial stage in item-based CF. Cosine and correlation-based similarity measures are the most commonly used. The cosine similarity of items i and j is calculated as follows:

S_{i,j} = (v_i · v_j) / (|v_i| |v_j|)    (1)

where S_{i,j} is the similarity and v_i and v_j are the rating vectors of items i and j.
Step 2: Prediction. The rating predicted for user u on item i is calculated as follows:

P_{u,i} = Σ_{j∈N} (S_{i,j} · V_{u,j}) / Σ_{j∈N} |S_{i,j}|    (2)

where N is the set of neighbors, P_{u,i} is the prediction, S_{i,j} denotes the similarity, and V_{u,j} is the rating given by user u to item j.
Step 3: Rating prediction of the top-N recommender system.
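The similarity and prediction steps above can be sketched with numpy on a toy rating matrix (all values are illustrative only):

```python
import numpy as np

# Toy user-item rating matrix (rows: users, columns: items); 0 = unrated
R = np.array([[5.0, 3.0, 4.0, 0.0],
              [4.0, 2.0, 5.0, 1.0],
              [1.0, 5.0, 2.0, 4.0]])

def cosine_sim(vi, vj):
    # Eq. (1): S_ij = (vi . vj) / (|vi| |vj|)
    return vi @ vj / (np.linalg.norm(vi) * np.linalg.norm(vj))

# Eq. (2): predict user 0's rating for item 3 from neighbor items 0-2
target_user, target_item = 0, 3
neighbors = [0, 1, 2]
sims = np.array([cosine_sim(R[:, target_item], R[:, j]) for j in neighbors])
ratings = R[target_user, neighbors]          # user 0's ratings of the neighbors
prediction = (sims * ratings).sum() / np.abs(sims).sum()
print("predicted rating:", round(float(prediction), 2))
```

The prediction is a similarity-weighted average of the user's ratings for the neighboring items, so it always falls within the range of those ratings.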
It forecasts how the target user would rank unrated products. The target user is supplied with the top-N recommendations based on these prediction ratings. The user should find the recommended results to be the most appealing. The items in the top-N recommendations do not require explicit rating values. D.
RESULTS AND DISCUSSION
A.
Dataset

TABLE I. MovieLens Dataset

PARAMETERS         VALUE
Dataset            MovieLens
Users in Numbers   943
Items in Numbers   1682
Ratings            100,000
Rating Scale       1-5
Table 1 describes the dataset. In our experiments, we divide the data into two parts, the training set and the test set, at random. The training set contains 75% of the data, whereas the test set contains the remaining 25%. Existing methodologies are used to assess the performance of the proposed system. After this, k-fold cross-validation is also performed in order to obtain a better performance assessment. B. Comparison Analysis Based on the evaluation of each system using RMSE and MAE metrics, we generated heatmaps and identified the lowest scores depending on the parameter values.
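The two metrics behind the heatmaps can be computed as follows (toy rating values; in the experiments they are computed over the held-out 25% test split):

```python
import numpy as np

# Toy true vs. predicted ratings (illustrative values only)
y_true = np.array([4.0, 3.0, 5.0, 2.0, 1.0])
y_pred = np.array([3.8, 3.4, 4.5, 2.5, 1.2])

# RMSE penalizes large errors more heavily; MAE weights all errors equally
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
mae = np.mean(np.abs(y_true - y_pred))
print(f"RMSE={rmse:.3f}  MAE={mae:.3f}")
```

By the quadratic-mean inequality, RMSE is always at least as large as MAE on the same errors.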
Fig. 2. ALS RMSE Heatmap
Fig. 3. ALS MAE Heatmap
Fig. 4. SVD RMSE Heatmap
Fig. 5. SVD MAE Heatmap
CONCLUSION
To conclude, metrics such as RMSE and MAE are great for evaluating the prediction of a user's rating of a new movie, but a recommender system is more than just a rating prediction system. While the ALS technique gets better scores than SVD on these metrics, SVD was able to implicitly relate similar genres between movies, which is not reflected in the numerical accuracy. For the target user, this algorithm can produce a variety of recommendations, and the decision maker can select a recommendation that meets their needs. The suggested method was tested on the MovieLens dataset, and the results showed that it is capable of recommending a wide range of items. In the future, we can improve our framework by increasing the diversity of the algorithm and integrating numerous objectives in recommendations.
REFERENCES [1] K. Aravindhan,S.K.B.Sangeetha,K.Periyakaruppan, Sivani R and Ajithkumar S.(2021).Smart Charging Navigation for VANET based Electric Vehicles" 7th International Conference on Advanced Computing and Communication Systems (ICACCS), 2021, pp. 1588 1591, https://doi.org/10.1109/ICACCS51430.2021.9441842 [2] K.Aravindhan,S.K.B.Sangeetha,K.Periyakaruppan,K.P.Keerthana, V.Sanjay Giridhar and V.Shyamala Devi,(2021).Design of Attendance Monitoring System using RFID " 7th International Conference on Advanced Computing and Communication Systems (ICACCS), 2021, pp. 1628-1631, https://doi.org/10.1109/ICACCS51430.2021.9441704 [3] R. Chen, Q. Hua, Y.-S. Chang, B. Wang, L. Zhang, and X. Kong, “A survey of collaborative filtering-based recommender systems: from traditional methods to hybrid methods based on social networks,” IEEE Access, vol. 6, pp. 64301–64320, 2018. https://doi.org/10.1109/ACCESS.2018.2877208 [4] L. Cui, P. Ou, X. Fu, Z. Wen, and N. Lu, “A novel multi-objective evolutionary algorithm for recommendation systems,” Journal of Parallel and Distributed Computing, vol. 103, pp. 53–63, 2017. https://doi.org/10.1016/j.jpdc.2016.10.014 [5] Dr.R.Dhaya, S.K.B.Sangeetha, Ashish Sharma, and Jagadeesh.(2017).Improved Performance of Two Server Architecture in Multiple Client Environment, IEEE International Conference on Advanced Computing and Communication Systems, ISBN XPlore No: 978-1-5090-4559-4, Shri Eshwar College of Engineering, Coimbatore, 6th and 7th January. https://doi.org/10.1109/ICACCS.2017.8014560. [6] Dhaya Kanthavel, S.K.B.Sangeetha and K.P. Keerthana, (2021).An empirical study of vehicle to infrastructure communications - An intense learning of smart infrastructure for safety and mobility.International Journal of Intelligent Networks,Volume 2,Pages 77-82,ISSN 2666-6030, https://doi.org/10.1016/j.ijin.2021.06.003 [7] T. Horváth and A. de Carvalho, “Evolutionary computing in recommender systems: a review of recent research,” Natural Computing, vol. 
16, pp. 441–462, 2017. https://doi.org/10.1007/s11047016-9540-y [8] J. Jooa, S. Bangb, and G. Parka, “Implementation of a recommendation system using association rules and collaborative filtering,” Procedia Computer Science, vol. 91, pp. 944–952, 2016. https://doi.org/ 10.1016/j.procs.2016.07.115 [9] R. Kanthavel,S.K.B.Sangeetha,and K.P. Keerthana, (2021).Design of smart public transport assist system for metropolitan city Chennai",ScienceDirect International Journal of Intelligent Networks,Volume 2,2021.Pages 57-63,ISSN 2666-6030, https://doi.org/10.1016/j.ijin.2021.06.004 [10] X. Liu, Z. Zhan, Y. Gao, J. Zhang, S. Kwong, and J. Zhang, “Coevolutionary Particle Swarm Optimization with Bottleneck Objective Learning Strategy for Many-Objective Optimization,” IEEE Transactions on Evolutionary Computation, 2018. https://doi.org/10.1109/TEVC.2018.2875430 [11] Marco Tulio Ribeiro, Anisio Lacerda, Adriano Veloso, and Nivio Ziviani, “Pareto-efficient hybridization for multi-objective recommender systems”, In Proceedings of the sixth ACM conference on Recommender systems (RecSys '12). Association for Computing Machinery, New York, NY, USA, 19–26.2012. [12] Marco Tulio Ribeiro, Nivio Ziviani, Edleno Silva De Moura, Itamar Hata, Anisio Lacerda, and Adriano Veloso, “Multiobjective Pareto-Efficient Approaches for Recommender Systems”,ACM Trans. Intell. Syst. Technol. 5, 4, Article 53 (January 2015), 20 pages. 2015. DOI: https: //doi.org/10.1145/2629350 [13] Osamah Ibrahim Khalaf, Kingsley A. Ogudo, S. K. B. Sangeetha, "Design of Graph-Based Layered Learning-Driven Model for Anomaly Detection in Distributed Cloud IoT Network", Mobile Information Systems, vol. 2022, Article ID 6750757, 9 pages, 2022. https://doi.org/10.1155/ 2022/6750757
[14] Qiuzhen Lin, Xiaozhou Wang, Bishan Hu, Lijia Ma, Fei Chen, Jianqiang Li, Carlos A. Coello Coello, "Multiobjective Personalized Recommendation Algorithm Using Extreme Point Guided Evolutionary Computation", Complexity, vol. 2018, Article ID 1716352, 18 pages, 2018. https://doi.org/10.1155/2018/1716352 [15] V. Ranjani and S. K. B. Sangeetha.(2014).Wireless data transmission in ZigBee using indegree and throughput optimization.International Conference on Information Communication and Embedded Systems (ICICES2014), pp. 1-5, https://doi.org/10.1109/ICICES.2014.7033901 [16] M. T. Ribeiro, A. Lacerda, A. Veloso, and N. Ziviani, “Pareto-efficient hybridization for multi-objective recommender systems,” ACM Transactions on Intelligent Systems and Technology, vol. 9, no. 1, pp. 1–20, 2013. https://doi.org/10.1145/2365952.2365962 [17] Sangeetha, S. K. B., Dhaya, R., & Kanthavel, R. (2019). Improving performance of cooperative communication in heterogeneous manet environments. Cluster Computing, 22(5), 1238912395.https://doi.org/10.1007/s10586-017-1637-2 [18] Sangeetha, S.K.B.Dhaya, R.(2022).Deep Learning Era for Future 6G Wireless Communications—Theory, Applications, and Challenges. Artificial Intelligent Techniques for Wireless Communication and Networking. pp. 105-119. https://doi.org/10.1002/9781119821809.ch8 [19] S.K.B.Sangeetha, R.Dhaya, Dhruv T Shah,R.Dharanidharan,and K. Praneeth Sai Reddy.(2021).An Empirical Analysis of Machine Learning Frameworks Digital Pathology in Medical Science,Journal of Physics: Conference Series,1767, 012031, https://doi.org/10.1088/17426596/1767/1/012031,2021. [20] Sangeetha, S. K. B., Kumar, M. S., Rajadurai, H., Maheshwari, V., & Dalu, G. T. (2022). An empirical analysis of an optimized pretrained deep learning model for COVID-19 diagnosis. Computational and Mathematical Methods in Medicine, 2022. [21] Y. Yoo, J. Kim, and B. 
Sohn, “Evaluation of collaborative filtering methods for developing online music contents recommendation systems,” Transactions of the Korean Institute of Electrical Engineers, vol. 66, no. 7, pp. 1083–1091, 2017. https://doi.org/10.5370/KIEE.2017.66.7.1083 [22] K. Soni, R. Goyal, B. Vadera, and S. More, “A three way hybrid movie recommendation system,” International Journal of Computer Applications, vol. 160, no. 9, pp. 29–32, 2017. https://doi.org/10.5120/ ijca2017913026 [23] B. Song, Y. Gao, and X.-M. Li, “Research on collaborative filtering recommendation algorithm based on mahout and user model,” Journal of Physics: Conference Series, vol. 1437, no. 1, pp. 012095–012101, 2020. https://doi.org/10.1155/2019/7070487 [24] S. Wang, M. Gong, H. Li, and J. Yang, “Multi-objective optimization for long tail recommendation,” Knowledge-Based Systems, vol. 104, pp. 145–155, 2016. https://doi.org/10.1016/ j.knosys.2016.04.018 [25] Y. Yoo, J. Kim, and B. Sohn, “Evaluation of collaborative filtering methods for developing online music contents recommendation systems,” Transactions of the Korean Institute of Electrical Engineers, vol. 66, no. 7, pp. 1083–1091, 2017. https://doi.org/10.5370/KIEE.2017.66.7.1083 [26] Q. Zhang and H. Li, “MOEA/D: a multiobjective evolutionary algorithm based on decomposition,” IEEE Transactions on Evolutionary Computation, vol. 11, no. 6, pp. 712–731, 2007. https://doi.org/ 10.1155/2014/906147 [27] Y. Zuo, M. Gong, J. Zeng, L. Ma, and L. Jiao, “Personalized recommendation based on evolutionary multi-objective optimization,” IEEE Computational Intelligence Magazine, vol. 10, no. 1, pp. 52–62, 2015. https://doi.org/10.1109/MCI.2014.2369894 [28] L. Zuping, “Collaborative filtering recommendation algorithm based on user interests,” International Journal of U- & E-Service, vol. 8, no. 4, pp. 311–319, 2015. https://doi.org/10.1109/ ICCSNT.2015.7490744
Advances in Science and Technology ISSN: 1662-0356, Vol. 124, pp 486-495 doi:10.4028/p-stigt6 © 2023 Trans Tech Publications Ltd, Switzerland
Submitted: 2022-08-28 Accepted: 2022-09-16 Online: 2023-02-27
Development of LSTM Model for Fall Prediction Using IMU
Chandramouleesvar V.1,a, Swetha M.E.2,b,*, P. Visalakshi3,c
1,2,3Department of Networking and Communications, School of Computing, SRM Institute of Science and Technology, Kattankulathur, Tamil Nadu, India
[email protected], [email protected], [email protected]
Keywords: RNN, Neural Network, Fall Prediction, Fall Detection, LSTM
Abstract. After motor vehicle crashes and poisoning, falls are one of the leading causes of unintentional injury. Existing fall prediction algorithms predict falls in older or disabled people by analyzing their fall history, capturing their movements through visual sensors (cameras, thermal imaging, etc.) in a fixed environment, or using inertial sensors to identify movement patterns. These algorithms are tailored to a single person: they learn from that person's history and predict falls specific only to them. The algorithm proposed in this paper aims to predict falls for any user from kinematic data such as accelerometer, magnetometer, and gyroscope values. To achieve this, we develop a prediction algorithm based on Long Short-Term Memory (LSTM) networks. The benefit of such an algorithm is to prevent trauma to the body, or at least to reduce the impact of a fall and the fatalities it causes. In the future, this algorithm could be used to design a device that predicts falls in real time and can be used by anyone irrespective of gender, age, and health.

Introduction
After motor vehicle crashes and poisoning, falls are the leading cause of unintentional injuries; around 684,000 people die from falls each year. Falling accidents are prominent on construction sites and in hospitals, old age homes, extreme sports, and uneven terrain. Existing fall detection and prediction systems focus on physiological factors such as gait, vision, cognition, and the medical history of individuals. One out of five falls results in a serious injury such as broken bones or a head injury. Every year, more than 3 million elderly people are admitted to hospitals due to fall injuries, and most traumatic brain injuries are caused by falls [1]. Fall injury treatment and recovery are extremely expensive and painful.
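Since LSTM is the core technique of the proposed model, the computation it performs at each timestep can be sketched in plain NumPy. This is a generic illustration of the standard LSTM gate equations, not the paper's trained model; the weights below are random placeholders and the dimensions (9 IMU channels, 16 hidden units) are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM timestep. W: (4H, D), U: (4H, H), b: (4H,).
    Gate pre-activations are stacked [input, forget, cell, output]."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])          # input gate
    f = sigmoid(z[H:2 * H])      # forget gate
    g = np.tanh(z[2 * H:3 * H])  # candidate cell state
    o = sigmoid(z[3 * H:4 * H])  # output gate
    c = f * c_prev + i * g       # new cell state
    h = o * np.tanh(c)           # new hidden state
    return h, c

rng = np.random.default_rng(0)
D, H = 9, 16                     # e.g. 9 IMU channels, 16 hidden units (illustrative)
x = rng.standard_normal(D)
h = np.zeros(H)
c = np.zeros(H)
W = rng.standard_normal((4 * H, D)) * 0.1
U = rng.standard_normal((4 * H, H)) * 0.1
b = np.zeros(4 * H)
h, c = lstm_step(x, h, c, W, U, b)
print(h.shape)  # (16,)
```

The forget gate is what lets the network retain or discard motion history over long windows, which is why LSTM suits sequential kinematic data better than a plain feed-forward network.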
[2] Inpatient falls are a major problem in hospital settings, many of them leading to severe trauma or even death. Current fall-risk assessment methods, such as the Morse Fall Scale (MFS), give a risk grade based on a set of physiological factors over the long term, but do not identify the factors that matter most for predicting falls in the short term. AI (Artificial Intelligence) techniques provide a chance to improve prediction performance while identifying the most important risk factors for medically caused falls, such as falls due to replaced hips or neurological disorders like Parkinson's and Huntington's. The same approach can be applied to predicting falls in general. Inpatient falls occur in controlled and limited environments, but this project aims to predict falls in any setting. The movements of inpatients are confined to a particular location and are comparatively slow, and their health conditions are a major predictive factor, whereas this model predicts falls using only kinematic data and is usable by all. Such fall prediction systems can be applied wherever the model must predict falls for any person or object without first learning the movement patterns specific to them. The purpose is to design a fall prediction system that uses factors such as acceleration, orientation, and gyroscope values to alert the person to the possibility of falling in an instant.
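Before kinematic data can be fed to a sequence model, the continuous IMU stream is typically segmented into fixed-length windows. A minimal NumPy sketch of this common preprocessing step follows; the window length and stride are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def make_windows(samples, window_len=50, stride=25):
    """Segment a (T, 9) stream of IMU readings (3-axis accelerometer,
    gyroscope, and magnetometer) into overlapping windows shaped
    (num_windows, window_len, 9), the layout sequence models expect."""
    T = samples.shape[0]
    starts = range(0, T - window_len + 1, stride)
    return np.stack([samples[s:s + window_len] for s in starts])

# Example: 200 timesteps of 9-channel IMU data
stream = np.random.randn(200, 9)
windows = make_windows(stream)
print(windows.shape)  # (7, 50, 9)
```

Overlapping windows (stride smaller than window length) reduce the chance that a fall event straddles a window boundary and is missed.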
Literature Review
Much research has been carried out using several methods to understand the balancing characteristics of humans with a view to predicting falls. Scientists have analyzed different types of equipment and analytical methods, gaining a broad understanding of falls that has helped in successfully implementing fall prediction techniques. Several types of fall detection and prediction methods have been devised, classified by the sensors they use: wearable accelerometer-based fall detection systems such as those of [3, 4, 5, 6], IoT-based fall detection systems [7, 8, 9, 10, 11, 12, 13], and camera-based fall detection systems [14, 15, 16, 17]. Fall risk assessment involves studying the medical records and fall history of individuals who are typically old, unhealthy, or inpatients recovering from surgery [18, 19, 20, 21]. Risk assessment systems for people with neurological disorders such as Parkinson's [22, 23] have also been developed to protect them and prevent further damage. Fall prediction methods are similar to fall risk assessment, but the difference is that fall prediction uses real-time data and predicts the fall moments before it occurs. [14] uses motion detection cameras attached to a building structure and analyzes the environment and the normal movements of construction workers; when an anomalous movement is detected, an alert tells the workers to be aware and react to a possible fall more efficiently. To develop reliable fall prediction systems, researchers have used various deep learning algorithms such as CNN, RNN, and LSTM, and machine learning algorithms such as KNN, SVM, and logistic regression. The deep learning methods of [4, 24, 25, 26, 27] showed accuracy greater than 95%.
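As a point of contrast with the learned models surveyed here, classic wearable-accelerometer detectors flag a fall when a near-free-fall dip in acceleration magnitude is followed by an impact spike. The following sketch illustrates that general idea only; the function, thresholds, and window size are hypothetical and not taken from any of the cited systems.

```python
import numpy as np

def detect_fall(acc, g=9.81, free_fall_th=0.4, impact_th=2.5, window=20):
    """Flag a fall when acceleration magnitude (in g) drops near free
    fall and an impact spike follows within `window` samples.
    `acc` is (T, 3) accelerometer data in m/s^2; thresholds are illustrative."""
    mag = np.linalg.norm(acc, axis=1) / g
    dips = np.flatnonzero(mag < free_fall_th)
    for d in dips:
        if np.any(mag[d:d + window] > impact_th):
            return True
    return False

# Stationary data (magnitude ~1 g) should not trigger
still = np.tile([0.0, 0.0, 9.81], (100, 1))
print(detect_fall(still))  # False
```

Such fixed-threshold rules are cheap but brittle (e.g. jumping can mimic the dip-then-spike pattern), which motivates the learned sequence models this paper pursues.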
Algorithms such as CNN, RNN, LSTM, and the One-One-One neural network (a combination of the three) were used by [27]; among these, the One-One-One network gave a high accuracy of 99.9%, but the LSTM model was unstable as the number of epochs increased, and CNN required a minimum of 50 epochs to stabilize and perform well. [24] developed a methodology that not only detects falls but also identifies people. It is a dataset-independent model with a fall detection accuracy greater than 98%, subject identification accuracy of 79.6%, and a false-positive rate below 1.6%. The model uses accelerometer readings for real-time fall detection, achieved with CNN, RNN, and LSTM. A model using both deep learning and machine learning algorithms was introduced by [7]. Apart from inertial sensors, various image-capturing sensors were also used, with SVM, KNN, and ENN as the algorithms. Accuracy was greater than 95% on a small dataset and greater than 98.47% on a large dataset, with the highest accuracy obtained when k was in the range of 5 to 7 for both ENN and KNN. Between the two, ENN was better in both performance and processing time on both datasets. The major drawbacks are that the model is confined to closed environments and that the performance of KNN and ENN would worsen if feature normalization were applied.

Summary of Existing Models
Many researchers have developed fall prediction and detection systems for physically challenged persons, robots, two-wheelers, etc., and they are summarized below. Although a few models outperform most others, several have shortcomings. [26] developed a method with many benefits and an accuracy of 98.75%, but the model's memory occupancy was very high due to the use of fog devices and the GRU technique.
A logistic regression model with low computational time was developed by [28], but the overall accuracy of its fall prediction was very low compared to other techniques. Many authors achieved high accuracy on datasets acquired in indoor settings, but these systems lost performance when tested with outdoor data, as with [14]. As noted above, among the CNN, RNN, LSTM, and One-One-One networks used by [27], the One-One-One network gave a high accuracy of 99.9%, but the LSTM model was unstable as the number of epochs increased and CNN required a minimum of 50 epochs to stabilize and perform well. Table 1 gives details of selected papers.
Table 1: A literature review of fall prediction papers

| Article | Year of Publication | Technique | Performance/Benefits |
| [7] | 2021 | SVM, ML, ANN, FSM | Accuracy greater than 95% for a small dataset and greater than 98.47% for a large dataset; high accuracy with k in the range of 5 to 7; performance and processing time of ENN better than KNN on both datasets. |
| [9] | 2020 | Optimized AlexNet, convolutional neural network | Their own KHANCN method was better than other methods in many aspects; KHANCN showed an accuracy of 99.45%; recognized falls with a minimal deviation error rate. |
| [24] | 2019 | CNN, RNN, LSTM | Fine-tuning of model parameters was not needed; dataset independent; fall detection accuracy >98%; subject identification accuracy = 79.6%; false-positive rate <1.6%. |
| [26] | 2020 | DL (LSTM and GRU) | Fall detection accuracy of 98.75%; lightweight virtualization architecture supporting multiple applications and services; using a smart gateway as a fog device is advantageous as it requires no battery and elderly subjects need not carry the device. |
| [27] | 2021 | Deep neural networks, CNN, RNN | One-One-One model performed with an accuracy of 99.9%, precision of 100%, and sensitivity of 100%; RNN and LSTM showed high classification rates of 0.969 and 0.983 respectively; One-One-One and LSTM showed high accuracy with a smaller number of epochs. |