Advances in Intelligent Systems and Computing Volume 1311
Series Editor Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland Advisory Editors Nikhil R. Pal, Indian Statistical Institute, Kolkata, India Rafael Bello Perez, Faculty of Mathematics, Physics and Computing Universidad Central de Las Villas, Santa Clara, Cuba Emilio S. Corchado, University of Salamanca, Salamanca, Spain Hani Hagras, School of Computer Science and Electronic Engineering University of Essex, Colchester, UK László T. Kóczy, Department of Automation, Széchenyi István University, Gyor, Hungary Vladik Kreinovich, Department of Computer Science, University of Texas at El Paso, El Paso, TX, USA Chin-Teng Lin, Department of Electrical Engineering, National Chiao Tung University, Hsinchu, Taiwan Jie Lu, Faculty of Engineering and Information Technology University of Technology Sydney, Sydney, NSW, Australia Patricia Melin, Graduate Program of Computer Science, Tijuana Institute of Technology, Tijuana, Mexico Nadia Nedjah, Department of Electronics Engineering, University of Rio de Janeiro, Rio de Janeiro, Brazil Ngoc Thanh Nguyen , Faculty of Computer Science and Management Wrocław University of Technology, Wrocław, Poland Jun Wang, Department of Mechanical and Automation Engineering The Chinese University of Hong Kong, Shatin, Hong Kong
The series “Advances in Intelligent Systems and Computing” contains publications on theory, applications, and design methods of Intelligent Systems and Intelligent Computing. Virtually all disciplines such as engineering, natural sciences, computer and information science, ICT, economics, business, e-commerce, environment, healthcare, life science are covered. The list of topics spans all the areas of modern intelligent systems and computing such as: computational intelligence, soft computing including neural networks, fuzzy systems, evolutionary computing and the fusion of these paradigms, social intelligence, ambient intelligence, computational neuroscience, artificial life, virtual worlds and society, cognitive science and systems, Perception and Vision, DNA and immune based systems, self-organizing and adaptive systems, e-Learning and teaching, human-centered and human-centric computing, recommender systems, intelligent control, robotics and mechatronics including human-machine teaming, knowledge-based paradigms, learning paradigms, machine ethics, intelligent data analysis, knowledge management, intelligent agents, intelligent decision making and support, intelligent network security, trust management, interactive entertainment, Web intelligence and multimedia. The publications within “Advances in Intelligent Systems and Computing” are primarily proceedings of important conferences, symposia and congresses. They cover significant recent developments in the field, both of a foundational and applicable character. An important characteristic feature of the series is the short publication time and world-wide distribution. This permits a rapid and broad dissemination of research results. Indexed by DBLP, EI Compendex, INSPEC, WTI Frankfurt eG, zbMATH, Japanese Science and Technology Agency (JST), SCImago. All books published in the series are submitted for consideration in Web of Science.
More information about this series at http://www.springer.com/series/11156
Debabala Swain · Prasant Kumar Pattnaik · Tushar Athawale Editors
Machine Learning and Information Processing Proceedings of ICMLIP 2020
Editors Debabala Swain Department of Computer Science Rama Devi Women’s University Bhubaneswar, Odisha, India
Prasant Kumar Pattnaik School of Computer Engineering Kalinga Institute of Industrial Technology Deemed University Bhubaneswar, Odisha, India
Tushar Athawale Oak Ridge National Laboratory Tennessee, TN, USA
ISSN 2194-5357 ISSN 2194-5365 (electronic) Advances in Intelligent Systems and Computing ISBN 978-981-33-4858-5 ISBN 978-981-33-4859-2 (eBook) https://doi.org/10.1007/978-981-33-4859-2 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
Conference Committee
Chief Patrons Dr. T. Vijender Reddy, Chairman, Vardhaman College of Engineering, Hyderabad Sri. T. Upender Reddy, Secretary, Vardhaman College of Engineering, Hyderabad Sri. M. Rajasekhar Reddy, Vice Chairman, Vardhaman College of Engineering, Hyderabad Sri. E. Prabhakar Reddy, Treasurer, Vardhaman College of Engineering, Hyderabad
Patron Dr. J. V. R. Ravindra, Principal, Vardhaman College of Engineering, Hyderabad
Program Committee Dr. RajaniKanth Aluvalu, HOD, CSE, Vardhaman College of Engineering, Hyderabad Dr. H. Venkateswara Reddy, COE, Vardhaman College of Engineering, Hyderabad
General Chair Dr. Raman Dugyala, Vardhaman College of Engineering, Hyderabad
General Co-chair Dr. Ashwani Kumar, Vardhaman College of Engineering, Hyderabad
Technical Program Committee Chair Dr. P. K. Gupta, Jaypee University of Information Technology, Solan, India
Special Session Chairs Dr. D. Raman, VCE, Hyderabad Dr. Premanand Ghadekar, VIT, Pune Dr. Debabrata Swain, Christ University, Pune Dr. Saurabh Bilgaiyan, KIIT, Odisha Dr. Banchhanidhi Dash, KIIT, Odisha
Organizing Chairs Dr. Muni Sekhar Velpuru, VCE, Hyderabad Dr. S. Nageswara Rao, VCE, Hyderabad Dr. S. Shitharth, VCE, Hyderabad Dr. S. Venu Gopal, VCE, Hyderabad Dr. Gouse Baig Mohammad, VCE, Hyderabad
Organizing Committee Prof. Vivek Kulkarni, VCE, Hyderabad Mr. Shrawan Kumar, VCE, Hyderabad Mr. S. K. Prashanth, VCE, Hyderabad Prof. A. Bhanu Prasad, VCE, Hyderabad Prof. Mr. A. Ramesh, VCE, Hyderabad Prof. C. Satya Kumar, VCE, Hyderabad Prof. V. Uma Maheswari, VCE, Hyderabad Mrs. S. Shoba Rani, VCE, Hyderabad Mr. Ganesh Deshmukh, VCE, Hyderabad
Finance Chair Bijay Ku Paikaray, Centurion University of Technology and Management, Odisha
Technical Program Committee Wei Xing, University of Utha, USA Subhashis Hazarika, University of Utha, USA Elham Sakhaee, Samsung Semiconductor Inc., USA Mk Siddiqui, Tecnologico de Monterrey, Mexico Rachakonda Laavanya, University of North Texas, USA Sabyasachi Chakborty, Inje University, South Korea Tinku Acharya, Intellectual Ventures, USA Gaurav Sharma, University of Rochester, USA Brian Canada, University of South Carolina Beaufort, USA Mahendra Swain, Quantum University, Roorkee Sharmistha Roy, Usha Martin University, Ranchi, India Premanand Ghadekar, VIT, Pune, India Saurabh Bilgaiyan, KIIT University, India Banchhanidhi Dash, KIIT University, India Ramakrishna Gandi, AITS, Tirupati, India Monalisa Jena, Rama Devi Women’s University, India Ramu Kuchipudi, Vardhaman College of Engineering, Hyderabad Jayanta Mondal, KIIT University, India Yunho Kim, University of California Irvine, USA Sumeet Dua, Lousiana Tech University, USA R. C. Jain, JIIT, Noida, India T. S. B. Sudarshan, Amrita Vishwa Vidyapeetham, India Sandeep Kumar, Banaras Hindu University, India Florian Luisier, Harvard University, USA Chintan Modi, Indian Institute of Technology, IIT Mumbai, India Sucharita Mitra, University of Kolkata, India Leonard Brown, Computer Science, University of Texas, USA Nigel Gwee, Computer Science, Southern University, USA Alan Chiu, Biomedical Engineering, Louisiana Tech University, USA Michael Dessauer, Computer Science, Louisiana Tech University, USA Yue Feng, University of Glasgow, UK Huiyu Zhou, Queen’s University Belfast, UK Ramesh Rayudu, University of Massey, New Zealand Sharif Naik, Philips Research, Asia––Bangalore Syed Shamsul Islam, The University of Western Australia, Australia Deepak Dahiya, Jaypee University of Information Technology, India
Donald Adjeroh, West Virginia University Kunal Narayan Chaudhury, Princeton University, USA Yi Fang, Purdue University, USA Xu Han, University of Rochester, USA Sridhar Hariharaputran, Bielefeld University, Germany S. Ali Etemad, Carleton University, Canada Rohit Verma, University of Florida, USA Marc Cheong, Monash University, Australia Kedar A. Patwardhan, GE Global Research, USA Arvind R. Singh, SGGSIET, NANDED, India Dimitrios A. Karras, Chalkis Institute of Technology, Athens, Greece Annannaidu Paidi, Centurion University, India Sony Snigdha Sahoo, Utkal University, India Santwana Sagnika, KIIT University, India Debabrata Swain, Christ University, Pune, India Shanmuk Srinivas Amiripalli, GITAM University, India Abhaya Kumar Sahoo, KIIT University, India Jayanta Mondal, KIIT University, India Suvendu Kumar Nayak, Centurion University of Engineering and Technology, India Ananta Charan Ojha, Centurion University of Engineering and Technology, India Sourav Bhoi, Parala Maharaja Engineering College, Berhampur, India Kalyan Jena, Parala Maharaja Engineering College, Berhampur, India Sai Satyanarayana Reddy, Vardhaman College of Engineering, Hyderabad Priyadarshan Dhabe, Vishwakarma Institute of Technology, Pune Ranjana Jadhav, VIT, Pune Deepali J. Joshi, VIT, Pune Shital Dongre, VIT, Pune Prabhakar Kandukuri, Vardhaman College of Engineering, Hyderabad Gouse Baig Mohammad, Vardhaman College of Engineering, Hyderabad Roshni Pradhan, KIIT University, India Gholamreza Akbarizadeh, Shahid Chamran University, Iran
Preface
In the current era of computing, technologies such as machine learning, information processing, the Internet of Things, and data analytics have their own significance. Numerous studies and research works are continuously being proposed and implemented using these techniques. This volume contains the papers presented at the Second International Conference on Machine Learning and Information Processing (ICMLIP-2020), held virtually during November 28–29, 2020, and organized by the Department of Computer Science and Engineering, Vardhaman College of Engineering, Hyderabad, India. The main objective of organizing this conference was to bring the innovative research work of students, researchers, academics, scientists, and industry professionals of the next generation onto a common platform for mutual benefit and knowledge sharing. The Program Committee of ICMLIP-2020 is grateful to the authors, who showed immense interest in the form of paper submissions from across India and abroad. A total of 204 papers were received, out of which 55 papers were accepted for presentation and publication in the Springer AISC series after a rigorous peer-review process. We are thankful to our reviewers for their sincere and timely efforts in selecting high-quality papers. We are very grateful to all the keynote speakers, Prof. Daniel Dazig Jr, Prof. Siba Kumar Udgata, Dr. Subasish Hazarika, and Prof. Maheshkumar H Kolekar, for making the event memorable. Thanks are due to the Program and Technical Committee members for their guidance related to the conference. We would also like to thank the Chief Patrons, Patron, General Chairs, Organizing Chairs, and Finance Chair, who have made invaluable contributions to the conference. We acknowledge the contribution of EasyChair in enabling the efficient and effective management of paper submissions, reviews, and preparation of the proceedings.
We are very much thankful to the entire team of Springer Nature for timely support and help. We sincerely hope that you find the book to be of value in the pursuit of academic and professional excellence. Bhubaneswar, India Bhubaneswar, India Tennessee, USA
Debabala Swain Prasant Kumar Pattnaik Tushar Athawale
Contents
Smart Queue Shopping Using RFID System . . . . . . . . . . 1
Debabrata Swain, Himanshu Pandey, Bhargav Pawar, Nishant Bhat, and Abhijit Gawai
Prediction and Classification of Biased and Fake News Using NLP and Machine Learning Models . . . . . . . . . . 13
Premanand Ghadekar, Mohit Tilokchandani, Anuj Jevrani, Sanjana Dumpala, Sanchit Dass, and Nikhil Shinde
Smart Leaf Disease Detection Using Image Processing . . . . . . . . . . 21
Amit Thakur, Nachiket K. Kulkarni, Tanay Rajwal, and Vivek Deshpande
Unsupervised Image Generation and Manipulation Using Deep Convolutional Adversarial Networks . . . . . . . . . . 33
Premanand Ghadekar, Shaunak Joshi, Yogini Kokate, and Harshada Kude
A Suicide Prediction System Based on Twitter Tweets Using Sentiment Analysis and Machine Learning . . . . . . . . . . 45
Debabrata Swain, Aneesh Khandelwal, Chaitanya Joshi, Abhijeet Gawas, Prateek Roy, and Vishwesh Zad
Video Categorization Based on Sentiment Analysis of YouTube Comments . . . . . . . . . . 59
Debabrata Swain, Monika Verma, Sayali Phadke, Shraddha Mantri, and Anirudha Kulkarni
Credit Score Prediction Using Machine Learning . . . . . . . . . . 69
Debabrata Swain, Raunak Agrawal, Ayush Chandak, Vedant Lapshetwar, Naman Chandak, and Ashish Vaswani
Stock Market Prediction Using Long Short-Term Memory Model . . . . . . . . . . 83
Debabrata Swain, Vijeta, Soham Manjare, Sachin Kulawade, and Tanuj Sharma
Efficient Management of Web Personalization Through Entropy and Similarity Analysis . . . . . . . . . . 91
Sujata H. Asabe, Ashish Suryawanshi, Vinit Joshi, Deepesh Abhichandan, and Gourav Jain
Artistic Media Stylization and Identification Using Convolution Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103 Premanand Ghadekar, Shaunak Joshi, Yogini Kokate, and Harshada Kude Performance Analysis of Different Models for Twitter Sentiment . . . . . . 117 Deepali J. Joshi, Tasmiya Kankurti, Akshada Padalkar, Rutvik Deshmukh, Shailesh Kadam, and Tanay Vartak Electricity Forecasting Using Machine Learning: A Review . . . . . . . . . . . . 127 Shital Pawar, Prajakta Mole, Shweta Phadtare, Dhanashri Aghor, and Pranali Vadtile End to End Learning Human Pose Detection Using Convolutional Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135 Nilesh D. Navghare and L. Mary Gladence Conference Paper Acceptance Prediction: Using Machine Learning . . . . 143 Deepali J. Joshi, Ajinkya Kulkarni, Riya Pande, Ishwari Kulkarni, Siddharth Patil, and Nikhil Saini Object Identification and Tracking Using YOLO Model: A CNN-Based Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153 Shweta Singh, Ajay Suri, J. N. Singh, Muskan Singh, Nikita, and Dileep Kumar Yadav Real-Time Hands-Free Mouse Control for Disabled . . . . . . . . . . . . . . . . . . . 161 Premanand Ghadekar, Pragya Korpal, Pooja Chendake, Raksha Bansal, Apurva Pawar, and Siddhi Bhor Accounting Fraud Detection Using K-Means Clustering Technique . . . . 171 Giridhari Sahoo and Sony Snigdha Sahoo Transforming the Lives of Socially Dependent to Self-dependent Using IoT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181 Debabala Swain and Sony Snigdha Sahoo Enforcement an Evidence and Quality of Query Services in the Cost-Effective Cloud . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189 G. Vijendar Reddy, Shaik Arshia Zainab, Sathish Vuyyala, and Nagubandi Naga Lakshmi A Machine Learning Approach Towards Increased Crop Yield in Agriculture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197 Shikha Ujjainia, Pratima Gautam, and S. Veenadhari
Resume Screening Using Natural Language Processing and Machine Learning: A Systematic Review . . . . . . . . . . . . . . . . . . . . . . . . 207 Arvind Kumar Sinha, Md. Amir Khusru Akhtar, and Ashwani Kumar Assessment of Osteogenic Sarcoma with Histology Images Using Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215 Himani Bansal, Bhartendu Dubey, Parikha Goyanka, and Shreyansh Varshney SMDSB: Efficient Off-Chain Storage Model for Data Sharing in Blockchain Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225 Randhir Kumar, Ningrinla Marchang, and Rakesh Tripathi Ray Tracing Algorithm for Scene Generation in Simulation of Photonic Mixer Device Sensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241 Sangita Lade, Milind Kulkarni, and Aniket Patil A Study of Quality Metrics in Agile Software Development . . . . . . . . . . . . 255 Krishna Chakravarty and Jagannath Singh Optimization of Ray-Tracing Algorithm for Simulation of PMD Sensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267 Sangita Lade, Purva Kulkarni, Prasad Saraf, Purva Nartam, and Aniket Patil Real-Time Object Detection for Visually Challenged . . . . . . . . . . . . . . . . . . 281 Ranjana Jadhav, Divsehaj Singh Anand, Aryan Kumar Gupta, Shreyas Khare, Dheeraj Sharma, and Prachi Tapadiya Reinforcement Learning: A Survey . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297 Deepali J. Joshi, Ishaan Kale, Sadanand Gandewar, Omkar Korate, Divya Patwari, and Shivkumar Patil Data Encryption on Cloud Database Using Quantum Computing for Key Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 309 Krishna Keerthi Chennam, Rajanikanth Aluvalu, and V. Uma Maheswari Prediction and Prevention of Addiction to Social Media Using Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319 Maheep Mahat Analysis of Block Matching Algorithms for Motion Estimation in Video Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331 Awanish Kumar Mishra and Narendra Kohli Information Retrieval Based on Telugu Cross-Language Transliteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343 Swapna Narla, Vijaya Kumar Koppula, and G. SuryaNarayana Predicting the Risk of Patients from Corona Virus in India Using Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351 Ayush Jha, M. Venkatesh, Tanushree Agarwal, and Saurabh Bilgaiyan
Parallel Implementation of Marathi Text News Categorization Using GPU . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365 Sangita Lade, Gayatri Bhosale, Aishwarya Sonavane, and Tanvi Gaikwad Real-Time Emotion Detection and Song Recommendation Using CNN Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 373 Adarsh Kumar Singh, Rajsonal Kaur, Devraj Sahu, and Saurabh Bilgaiyan Drowsiness Detection System Using KNN and OpenCV . . . . . . . . . . . . . . . 383 Archit Mohanty and Saurabh Bilgaiyan Optimized Dynamic Load Balancing in Cloud Environment Using B+ Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 391 S. K. Prashanth and D. Raman Computer Vision-Based Wheat Grading and Breed Classification System: A Design Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403 Atharva Karwande, Pranesh Kulkarni, Pradyumna Marathe, Tejas Kolhe, Medha Wyawahare, and Pooja Kulkarni An Approach to Securely Store Electronic Health Record(EHR) Using Blockchain with Proxy Re-Encryption and Behavioral Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 415 Kajal Kiran Dash, Biswojit Nayak, and Bhabendu Kumar Mohanta Automated Glaucoma Detection Using Cup to Disk Ratio and Grey Level Co-occurrence Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 425 V. Priyanka and V. Uma Maheswari Efficient Machine Learning Model for Intrusion Detection—A Comparative Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 435 D. Raman, G. Vijendar Reddy, Ashwani Kumar, and Sathish Vuyyala Anomaly Detection in HTTP Requests Using Machine Learning . . . . . . . 445 Ayush Gupta and Avani Modak Best Fit Radial Kernel Support Vector Machine for Intelligent Crop Yield Prediction Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 457 Vijay Hanuman, Krishna Vamsi Pinnamaneni, and Tripty Singh Design of Cryptographic Algorithm Based on a Pair of Reversible Cellular Automata Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 469 Surendra Kumar Nanda, Suneeta Mohanty, and Prasant Kumar Pattnaik Computational Model Simulation of a Self-Driving Car by the MADRaS Simulator Using Keras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 479 Aseem Patil Success of H1-B VISA Using ANN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 491 Priyadarshini Chatterjee, Muni Sekhar Velpuru, and T. Jagadeeswari
ETL and Business Analytics Correlation Mapping with Software Engineering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 501 Bijay Ku Paikaray, Mahesh R. Dube, and Debabrata Swain A Novel Multilevel RDH Approach for Medical Image Authentication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 513 Jayanta Mondal and Madhusmita Das Copy-Move Forgery Detection Using Scale Invariant Feature Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 521 Bandita Das, Debabala Swain, Bunil Kumar Balabantaray, Raimoni Hansda, and Vishal Shukla SP-EnCu: A Novel Security and Privacy-Preserving Scheme with Enhanced Cuckoo Filter for Vehicular Networks . . . . . . . . . . . . . . . . . 533 Righa Tandon and P. K. Gupta Reversible Region-Based Embedding in Images for Secured Telemedicine Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 545 Prachee Dewangan, Bijay Ku Paikaray, Debabala Swain, and Sujata Chakravarty A Spatial Domain Technique for Digital Image Authentication and Tamper Recovery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 555 Monalisa Swain and Debabala Swain Empowering the Visually Impaired Learners with Text-to-Speech-Based Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 565 Debabala Swain and Sony Snigdha Sahoo Seismic Data Analytics for Estimating Seismic Landslide Hazard Using Artificial Accelerograms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 575 Aadityan Sridharan and Sundararaman Gopalan Impact of Presence of Obstacles in Terrain on Performance of Some Reactive Protocols in MANET . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 585 Banoj Kumar Panda, Prasant Kumar Pattnaik, and Urmila Bhanja Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 599
About the Editors
Debabala Swain is working as Associate Professor in the Department of Computer Science, Rama Devi Women’s University, Bhubaneswar, India. She has more than a decade of teaching and research experience. Dr. Swain has published number of research papers in peer-reviewed international journals, conferences, and book chapters. She has edited books of Springer, IEEE. Her area of research interest includes high-performance computing, information security, machine learning, and IoT. Prasant Kumar Pattnaik, Ph.D. (Computer Science), Fellow of IETE, Senior Member of IEEE, is Professor at the School of Computer Engineering, KIIT Deemed University, Bhubaneswar. He has more than a decade of teaching and research experience. Dr. Pattnaik has published numbers of research papers in peer-reviewed international journals and conferences. He also published many edited book volumes in Springer and IGI Global Publication. His areas of interest include mobile computing, cloud computing, cyber security, intelligent systems, and brain–computer interface. He is one of the Associate Editors of Journal of Intelligent and Fuzzy Systems, IOS Press, and Intelligent Systems Book Series Editor of CRC Press, Taylor Francis Group. Tushar Athawale is currently working as Computer Scientist at Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA. He is in the domain of scientific visualization for analysis of large-scale data using tools, such as VisIt and ParaView and software development for multi-threaded visualization toolkit (VTK-m). He was Postdoctoral Fellow at the University of Utah’s Scientific Computing and Imaging (SCI) Institute with Prof. Chris R. Johnson as his advisor since October 2016. He received Ph.D. in Computer Science from the University of Florida in May 2015, and he worked with Prof. Alireza Entezari while pursuing his Ph.D. After his graduation, he worked as an application support engineer under the supervision of Robijn Hage in MathWorks, Inc., the developer of the leading computing software MATLAB. His primary research interests are in uncertainty quantification and statistical analysis.
Smart Queue Shopping Using RFID System Debabrata Swain, Himanshu Pandey, Bhargav Pawar, Nishant Bhat, and Abhijit Gawai
Abstract It is generally observed that billing at shopping stores takes a lot of time, especially during holidays and weekends. A person in India visits a grocery store 1.6 times every 15 days and spends nearly 43 min there. Therefore, it is necessary to have an efficient mechanism, which would help people to shop in smarter way. With this system, we have come up with a solution to the problem of long queues while shopping in a mall or a store. So, our solution uses the concept of Radio Frequency Identification. It is necessary to have RFID tags (transponder) embedded on all the products. So, when the purchased product comes in the range of antenna, the passive tag’s circuit gets activated and sends the prestored data. On the other side, reader (transceiver) will receive these data and send it to the computer. This helps in establishing a very efficient way of billing significantly helping customers save a lot of time. The aim of this solution is to put an end to the long queues we encounter at the shopping centers. The customer has to just drop in his purchases at the RFID zone and head to the billing counters for payment thus establishing a very unique and efficient shopping practice, which is much needed in these modern times. Keywords Radio frequency identification (RFID) reader · RFID tag · Transceiver · Antenna · Transponder D. Swain Computer Engineering Department, Pandit Deendayal Petroleum University, Gandhinagar, Gujarat, India e-mail: [email protected] H. Pandey (B) · B. Pawar · N. Bhat · A. Gawai IT&MCA, Vishwakarma Institute of Technology, Pune 411037, Maharashtra, India e-mail: [email protected] B. Pawar e-mail: [email protected] N. Bhat e-mail: [email protected] A. Gawai e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 D. Swain et al. (eds.), Machine Learning and Information Processing, Advances in Intelligent Systems and Computing 1311, https://doi.org/10.1007/978-981-33-4859-2_1
1 Introduction Generally, when you go to a grocery store or a supermarket, the one thing that you encounter is long queues. The traditional way of billing is the bar code system, which takes a lot of time because every product is scanned one by one. Using an RFID system, we can scan all the products in one go. RFID was first put to use to identify and authenticate aircraft in flight, allowing allied planes to be identified. The concept of electromagnetic fields is used to automatically identify the tags and track them; the remote exchange of electromagnetic waves is the basis of this technology. The label is initially fed by the electromagnetic waves and then the chip is activated. The tags are attached directly to the objects, and the information is stored electronically in the tag. Passive tags do not have their own power supply, so they rely on the nearby RFID reader's antenna. Line of sight is not required for the RFID reader to detect these tags, whereas it is required to read a barcode. This creates the advantage that tags can be embedded in the product, so they cannot be removed or destroyed. RFID is a form of automatic identification and data capture (AIDC). An RFID reader sends an encoded radio signal to interrogate the tag. The tag receives the message and responds with its unique ID along with other information, such as a unique tag serial number, product information like the batch number or the product cost, or any other data. As the serial numbers of the RFID tags are unique, the RFID reader can differentiate between tags if more than one tag comes into its range, and it can read all of them at once. The chip uses amplitude modulation to transmit the recorded information; the reader receives this and transforms it into binary code. This operation is symmetrical and works in both directions. We have linked the output with Google Sheets, where the data can be received and accessed in real time. The product tag ID, product name and total cost of all the products are written to the sheet, and all the necessary microcontrollers can access and edit it in real time.
2 Literature Review Ravindranath et al. [1] fitted a radio frequency identification (RFID) reader in the trolley to save the customer's time by reducing the time taken to bill the total purchase. They attached tags, operating at a frequency of 125 kHz, to all the products available for sale. When a particular product is brought and kept in the cart, its RFID tag is scanned by the RFID reader, and the product name, ID, cost and total bill amount are shown on a 16 × 4 LCD display. Data are stored by a PIC 18F4550 microcontroller. The cost gets added to the total bill, so the bill is calculated in the trolley itself. The bill is sent wirelessly from the
proposed system to the PC using the RF transceiver CC2500. When the cart reaches the billing counter, the customer has to pay that bill. Karmouche et al. [2] focused on building a low-cost system, since installing an RFID reader on each cart, as in the previous case, was very costly. Amine Karmouche designed an RFID system that is able to scan moving and static products in the shopping space using an RFID reader antenna. Instead of conducting the RFID observations at the level of individual carts, aisle-level scanning is performed. They used UHF tags working at a 900 MHz frequency. Customers can conduct their purchases by adding products to their carts and display the total of their bill on a touchscreen embedded on the cart itself. They can also get directions to specific aisles or even their current location. This approach aims at a lower implementation cost, ease of integration and greater convenience in terms of maintenance. Amazon [3] has recently launched "Amazon Go", which uses image processing, neural networks, deep learning algorithms and sensor fusion to predict the item a customer picks and add it to a virtual basket; the customer can then walk out with the shopped products and the money is deducted from his or her Amazon wallet. Vardhan et al. [4] have introduced an RFID smart shelf design in which a UHF RFID tagged object can be located precisely among neighboring objects throughout the shop. They created an RFID system that combines the technology of RFID and QR codes. The designed tag has the added advantages of readability and of being readable by both a QR code reader and an RFID reader. Thus, locating desired products in the shop becomes easier. Panasonic [5], a Japanese company, has introduced a "cashier-free convenience store concept" using UHF RFID tagged products, scanned while being put in the trolley. They put RFID tags on all products and an RFID reader on every trolley. Using UHF, high-precision and faster scanning is possible.
3 Methodology/Experimental 3.1 Components 1. RFID Reader—It is a device that interacts with the RFID tags which are in the vicinity of the antenna. It receives the data transmitted by the RFID tags (Fig. 1). 2. RFID Antenna—It is a device that emits RF waves that can be picked by RFID tags (Fig. 1). The wave is propagated in both the horizontal and vertical directions. The antenna provides an energizing RF signal, communication with a remotely placed devices is established. This overcomes the limitations of an absent external power source or battery. 3. Passive RFID tag (Fig. 2)—It consists of coil and microchip made of silicon. Coil is the source of power for the silicon chip. Electricity gets induced in the coil when it comes in the vicinity of antenna. Silicon chip contains the data (mainly
Fig. 1 RFID reader and antenna
Fig. 2 RFID tag
EPC id), rules and protocols through which the data should be transmitted to the reader. 4. Switched-Mode Power Supply (SMPS)—SMPS (Fig. 3) serves the purpose of power and voltage regulation. It is used to convert 220 V AC current supply to 12 V DC current supply, on which ours circuit works. 5. Processor—It is the computer connected through LAN, which serves as microcontroller for the reader. Here the collected data are further processed.
3.2 Algorithm Workflow of the whole system can be explained in the following steps: A. Socket is created first to establish the connection between the computer and the reader. B. Unique EPC global ID of every RFID tag (present in the vicinity of the antenna) is received and stored in a list, as per the command of the user.
Fig. 3 Switched-mode power supply
C. Then every tag ID is searched in the dataset to get the other product details. D. Total cost is calculated using those details. E. These details are listed, bill is prepared and uploaded to Google sheets in real time. F. The process from steps 2 to step 5 is repeated. General workflow of the RFID system can be seen in Fig. 4.
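A minimal sketch of steps A–F is given below. It assumes a reader that accepts TCP socket connections and returns one EPC ID per line, and a CSV product catalog with columns epc_id, name and price; the host address, trigger command and file name are illustrative assumptions, not details taken from the actual deployment.

```python
import csv
import socket

READER_ADDR = ("192.168.1.50", 4001)   # assumed reader IP and port (step A)

def read_epc_ids(timeout=2.0):
    """Step B: collect the EPC IDs currently reported by the reader."""
    with socket.create_connection(READER_ADDR, timeout=timeout) as sock:
        sock.sendall(b"READ\n")          # assumed trigger command
        data = sock.recv(4096).decode()
    return [line.strip() for line in data.splitlines() if line.strip()]

def load_catalog(path="products.csv"):
    """Load the product dataset keyed by EPC ID."""
    with open(path, newline="") as f:
        return {row["epc_id"]: row for row in csv.DictReader(f)}

def prepare_bill(epc_ids, catalog):
    """Steps C-E: look up each tag and compute the total cost."""
    items = [catalog[epc] for epc in epc_ids if epc in catalog]
    total = sum(float(item["price"]) for item in items)
    return items, total

if __name__ == "__main__":
    catalog = load_catalog()
    items, total = prepare_bill(read_epc_ids(), catalog)
    for item in items:
        print(item["epc_id"], item["name"], item["price"])
    print("Total:", total)
```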
3.3 Protocol EPC global provides the standards and guidelines for RFID tags and barcodes. They also give the unique electronic product code. There is a particular format for EPC (Electronic Product Code) number, which is combination of unique EPC identifier, a header, and a filter value. Standards for Class 1 Gen 2 tags are also decided by this organization. ISO approved standards for Class 1 Gen 2 tags to become ISO 18000-6C.
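The EPC layout mentioned above (a header, a filter value and a unique identifier) can be illustrated with a small parser. The field widths below follow the common 96-bit SGTIN-96 encoding (8-bit header, 3-bit filter, 3-bit partition); this is an assumption made for illustration, since the paper does not state which EPC scheme is written to the tags.

```python
def parse_epc96(epc_hex):
    """Split a 96-bit EPC (hex string) into its leading fields.

    Only the header, filter and partition are extracted here; how the
    remaining bits divide into company prefix, item reference and serial
    depends on the partition value defined in the GS1 tag data standard.
    """
    bits = bin(int(epc_hex, 16))[2:].zfill(96)
    return {
        "header": int(bits[0:8], 2),       # 0x30 identifies SGTIN-96
        "filter": int(bits[8:11], 2),      # e.g. a point-of-sale item
        "partition": int(bits[11:14], 2),  # controls the later field widths
        "remaining_bits": bits[14:],
    }

print(parse_epc96("30395DDA1064B92C00000001"))  # hypothetical tag ID
```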
Fig. 4 General RFID system workflow: in a passive RFID system the tags draw power from the reader through inductive coupling; when a tag is placed in the vicinity of the reader, an emf is induced in it, which activates the tag's circuit and makes it transmit its stored information to the reader; the reader intercepts the resulting change in current as the load information, and the reader's input is sent to the computer/microprocessor, which processes the data.
3.3.1
ISO 18000-6c
Standards for UHF Class-1 Gen-2 tags are described by ISO 18000-6C. It comes under ITF (Interrogator-Talks-First) RFID readers and tags. The main characteristic of ITF RFID system is that the tag sends its information through the modulated radio waves to the RFID-reader only after the tag receives permission to send data from the reader first. There are three parts of communication between reader and tag, which is explained under ISO 18000-6C. These are: A. Information coding B. Modulation of signal C. Anti-collision protocol Anticollision protocol is used to prevent collision of data while reading multiple tags. UHF Passive systems are ITF type tags under ISO 18000-6C, and they use PIE (Pulse Interval Encoding), ASK (Amplitude-Shift Keying), and Anti-collision algorithm like Q-algorithm. A. Information Coding: i. From interrogator to RFID tags: • Class 1 Gen 2 (ISO-18000-6C) uses PIE (Pulse Interval Encoding) for the communication from the reader to the tag. It describes the way in which a piece of information is encoded and prepared to send from reader to tag. • The length of a binary “0” is defined as Tari (Type A Reference Interval), and is used as a reference for several other times in this standard. • First the message is converted into its binary equivalent. A binary “0” is a short high pulse followed by low pulse of equal length. A binary “1” is a
Fig. 5 Representation of bits (interrogator to RFID)
longer high pulse followed by the same low pulse width (PW), as shown in Fig. 5. ii. From RFID tags to interrogator: • In this technique, if there is change in phase of the signal emitted by the RFID tags then that is treated as a binary “0” by the interrogator, and if there is no change in phase of the signal then that is treated as a binary “1” by the interrogator. • “0” is represented by amplitude change from high to low or low to high, and “1” is represented by constant amplitude either high or low. This representation is depicted in Fig. 6. • This technique decreases the number of signal collision. • This technique also gives interrogator an advantage while detecting the signal collision. B. Modulation of Signal In ISO-18000-6C, PIE and Amplitude Shift Keying (ASK) works in conjunction. ASK describes how the signal is modulated. The wave amplitude is changed according to the data signal, which is in form of “0” and “1”, which can be seen in Fig. 7. The conjunction of PIE and ASK is used to modulate the signal for communication. Graphical representation for data encoding using PIE and signal modulation using ASK is shown in Fig. 8. C. Anticollision protocol Fig. 6 Representation of bits (RFID to interrogator)
Fig. 7 Amplitude shift keying (ASK)
Fig. 8 Information encoding PIE + ASK
During an inventory count, there is always a possibility that multiple tags come in the vicinity of the reader, this creates the collision of data, the reader gets confused which data to receive. So, to tackle this problem, different anticollision algorithms are used. In ISO 18000-6C, standard anticollision algorithm used is called Q algorithm. If this algorithm is not present then it will become impossible to read two or more tags present in the vicinity of the reader.
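The slot-based idea behind such anti-collision schemes can be illustrated with a toy inventory-round simulation. This is only a sketch of the slotting concept; the actual Q algorithm of the Gen-2 standard additionally adjusts Q dynamically from the observed collisions and empty slots, which is not modelled here.

```python
import random
from collections import Counter

def inventory_round(num_tags, q):
    """Each tag draws a random slot in [0, 2**q - 1]; a slot holding a single
    tag is read successfully, while slots holding several tags collide."""
    slots = Counter(random.randrange(2 ** q) for _ in range(num_tags))
    read = sum(1 for count in slots.values() if count == 1)
    collided = sum(1 for count in slots.values() if count > 1)
    return read, collided

random.seed(0)
for q in (2, 4, 6):
    # More slots (larger q) means fewer collisions but more empty slots.
    print("Q =", q, "->", inventory_round(num_tags=20, q=q))
```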
Fig. 9 Product details and bill display
4 Results and Discussion 4.1 Bill Display Procedure As the customer reaches the counter, the cashier issues the read command. The antenna of the reader is then activated, which in turn activates the RFID tags in the customer's basket, and communication between the reader and the tags takes place. The RFID reader receives the unique EPC number from each tag, and these data are sent to the host computer through the Ethernet cable (Fig. 9). In the database, the EPC number of every tag is mapped to its product details. Each EPC number fetched from the customer's products is then searched in the database for its details, and once all the items have been looked up, the final bill is prepared for the customer (Fig. 10).
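A sketch of this lookup-and-billing step is shown below. It assumes the product database is a CSV file loaded into a pandas DataFrame indexed by EPC number, and that the Google Sheets upload (Fig. 10) uses a gspread service account; the sheet name, credential file and EPC values are placeholders, not the ones used by the authors.

```python
import pandas as pd
import gspread

def build_bill(scanned_epcs, db_path="product_db.csv"):
    """Match the scanned EPC numbers against the product database (Fig. 9)."""
    db = pd.read_csv(db_path).set_index("epc_id")
    found = [epc for epc in scanned_epcs if epc in db.index]
    bill = db.loc[found, ["name", "price"]].reset_index()
    return bill, bill["price"].sum()

def upload_bill(bill, total, sheet_name="SmartQueueBills",
                creds="service_account.json"):
    """Append the bill rows to a Google Sheet so they can be read in real time."""
    gc = gspread.service_account(filename=creds)
    ws = gc.open(sheet_name).sheet1
    ws.append_rows(bill.astype(str).values.tolist())
    ws.append_row(["", "Total", str(total)])

if __name__ == "__main__":
    bill, total = build_bill(["E2000017221101441890A1B2",
                              "E2000017221101441890A1B3"])
    print(bill, "\nTotal:", total)
```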
4.2 Inventory Product Entry to the Database In the warehouse of a store, products come in bulk. So, to do entry for products into the database is a tedious task. RFID systems can make it very simple. When a new product comes into the warehouse, it will be passed through a checkpoint where all the products will be scanned by RFID reader in bulk, and after that all the scanned products are given their details in one go.
Fig. 10 Relevant information are stored on Google sheets
5 Limitations A. We cannot detect if there is any error in reading the tags within the range of the reader. B. If the RFID tags are not embedded in the product then they can be easily damaged and thus cannot be detected by the RFID scanner. C. Bill is shown directly at the billing counter and not at the time of putting item in the cart. D. The initial installation cost is higher than the currently available solution. E. A potential power failure can affect the functioning.
6 Conclusion The traditional method of billing, which scans the bar code of every product in the cart, takes a lot of time; as a result, longer queues are created and a lot of time is consumed in shopping. Addressing this problem, the main aim of our project, reducing the time consumed at the billing counter, has been fulfilled. Our system allows the customer to bill all the shopped items at once without worrying about time constraints. The generated bill is displayed on the computer screen at the counter and the customer just has to pay it. RFID sensors mounted at the billing counter ensure that no product can be taken away without being scanned. Thus, our solution provides an efficient and foolproof way to shop without worrying about long lines.
Acknowledgements Our team would like to take this opportunity to express our gratitude to the following professionals: First, we would like to thank the Honorable Director of Vishwakarma Institute of Technology, Prof. (Dr.) R.M. Jalnekar Sir for including the concept of Engineering Design and Innovation(EDI) in our syllabus, and our HOD Premanand Ghadekar for giving us this opportunity to work on this innovative project. Finally, we would specially like to thank our guide Mr. Debabrata Swain and Kent ITS for their valuable support and guidance.
References 1. K.R. Ravindranath, A.I. Sanjay, C.M. Prashant, RFID based supermarket shopping system, in 2017 International Conference on Big Data, IoT and Data Science (BID), Pune (2017), pp. 143–147. https://doi.org/10.1109/BID.2017.8336588 2. A. Karmouche, Y. Salih-Alj, J. Abrache, Distributed aisle-level scanning approach for RFID shopping systems, in 2014 International Conference on Logistics Operations Management, Rabat (2014), pp. 1–7. https://doi.org/10.1109/GOL.2014.6887428 3. K. Wankhede, B. Wukkadada, V. Nadar, Just walk-out technology and its challenges: a case of Amazon Go, in 2018 International Conference on Inventive Research in Computing Applications (ICIRCA), Coimbatore (2018), pp. 254–257. https://doi.org/10.1109/ICIRCA.2018.859 7403 4. G. S. Vardhan, N. Sivadasan, A. Dutta, “QR-code based chipless RFID system for unique identification, in 2016 IEEE International Conference on RFID Technology and Applications (RFID-TA), Foshan (2016), pp. 35–39. https://doi.org/10.1109/RFID-TA.2016.7750744 5. https://www.youtube.com/watch?v=TF8HAhUN_p4 6. https://www.atlasrfidstore.com/rfid-insider/uhf-rfid-tag-communications-protocols-standards 7. https://www.cse.wustl.edu/~jain/cse574-06/ftp/rfid/index.html 8. https://www.youtube.com/watch?v=NrmMk1Myrxc 9. https://www.keysight.com/upload/cmc_upload/All/4144-RFID_WEBCAST_GOLIVE_HM_ FC.pdf 10. https://www.elprocus.com/rfid-basic-introduction-simple-application/
Prediction and Classification of Biased and Fake News Using NLP and Machine Learning Models Premanand Ghadekar, Mohit Tilokchandani, Anuj Jevrani, Sanjana Dumpala, Sanchit Dass, and Nikhil Shinde
Abstract Fake news and biased news play a very important role in spreading misinformation, thereby manipulating people's perceptions and distorting their awareness and decision-making. In the proposed model, sentiment analysis is used for finding the bias and a passive-aggressive classifier is used for fake news classification. The accuracy achieved for fake news classification using the passive-aggressive classifier is 95.9%. The bias of an article is calculated from its sentiment score together with the scores assigned to the author and the publisher based on their average bias. Keywords Sentiment analysis · Clustering · Bias
1 Introduction Media coverage has a responsibility to cover events and thus has a great impact on the public, as it provides steadfast information about several issues such as the environment, politics, technology, etc. [1]. Bias is an inclination toward or against a person or a group of persons, usually in a way believed to be unfair or, in layman's terms, prejudiced against someone. Political schism and news bias have recently been taken up as topics of interest [2]. Media bias is a worldwide phenomenon, not limited to one category of economy or a certain political system. People should be able to recognize bias and take it into account, and only then can they form and put forward their views or opinions neutrally, without falling into the ruse that is the bias [3–5]. Machine Learning (ML) is a branch of Artificial Intelligence (AI) that allows machines to learn without explicit programming [6]. Natural Language Processing is the technology used to help computers understand human natural language. The proposed model uses machine learning to classify and detect fake and biased news, and it uses NLP techniques to obtain the results [7].
P. Ghadekar (B) · M. Tilokchandani · A. Jevrani · S. Dumpala · S. Dass · N. Shinde Department of Information Technology Engineering, Vishwakarma Institute of Technology, Pune, India e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 D. Swain et al. (eds.), Machine Learning and Information Processing, Advances in Intelligent Systems and Computing 1311, https://doi.org/10.1007/978-981-33-4859-2_2
2 Literature Survey Erick Elejaide has proposed to find biasing within papers which do not explicitly show it. The news snippets are taken from Twitter. It first plots these news snippets on a cartesian plane that tells more about their orientation based on a Fritz’ quiz. In turn, they put it through another quiz called PolQuiz, which deepens the investigation of the nature of found bias. To feed in the news tweets into the quiz, they first prepared a table of seed queries, which contained a set of preselected words. The tweet’s hashtags were analyzed and those containing the news outlet’s name or those that read original were removed. They used randomized trees for this. Then they used the rank difference method to classify the scores. Last, they conducted the survey [1]. Anish Anil Patankar proposed to find out the bias of each article. They used the Pew research report to collect news sources that readers with varying political inclinations prefer to read. Then they scrape a variety of news articles on varied topics from these varied news sources. After that, they perform clustering to find similar topics of the articles, as well as calculating a biased score for each article. For a news article, they display the bias score, and all the related articles, out of the previously collected articles, from different news sources [6]. Prabhjot Kaur proposed a model where the KNN algorithm is used to classify the news articles, KNN is the k nearest neighbor algorithm, which calculates the nearest neighbor values in the input dataset. The values are calculated using the Euclidean distance formula. The k value is selected from the network and based on that the data can be classified into certain classes. The number of hyperplanes depends upon the number of classes into which data need to be classified. The accuracy came out to be 91% [8]. Rowan proposed an architecture where fake news generation and detection is portrayed as an argumentative game, with two players. Fake stories were generated that matched stated attributes: generally viral or persuasive. Stories must come out as realistic as possible to both human users as well as the verifier. The goal is the classification of news stories as real or fake. The verifier has access to any number of real news stories, but only some fake news stories from a specific opponent. The accuracy came out to be 80.18% [9].
3 Dataset Description The dataset used for the detection of bias in news consists of data belonging to the All News Dataset from Kaggle. It consists of a total of 682 records and 10 columns. The features that are present in the dataset are Title, Publication, Author and Content. The dataset for classification of fake news consists of 7796 rows and 4 columns. The features include Title, Text and label.
4 Proposed Model 4.1 Detection of Bias in a News Article In the proposed model, the techniques of machine learning have been used. First, the data have been preprocessed. Feature Selection is where one automatically or manually selects features, which contributes the maximum to the interested output. Irrelevant or partially relevant features can contribute nothing or even negatively impact the performance of the model. Title, publication, author and content have been chosen to work on because of relevancy and other features have been dropped. There are different types of methods to select features but for this model, the features have been selected manually. Then for calculating the bias, sentiment has been calculated. This is done using TextBlob. Then clustering on a different basis is done. Clustering is the division of the collection of data points into a number of groups called clusters so that data points in the same groups are more alike to other data points in the same group than those in other groups. In clustering, the aim is to separate out groups with similar traits and assign them into clusters. Algorithm of the proposed model as shown in Fig. 1: Tokenization. The method of breaking up running text of paragraphs into words and sentences is called tokenization. RegexpTokenizer splits a string into substrings using a regular expression. Stop Word Removal. The next step of data preprocessing is stop word removal. Words that are filtered out before the processing of text are called stop words. Words like a, the, an, in, at, that, which, and on are called stop words. All the words that matched with the words in the stop list were removed to reduce complexity. Stemming and Lemmatization. Stemming is the simple heuristic process that cuts off the ends of words in the hope of attaining the goal of removing multiple tenses of the same word. Lemmatization basically truncates the ending of words and returns the base form of a word. For this, NLTK has been used. Specifically, Porter stemmer has been used for stemming. Porter’s stemmer advantage is its simplicity and speed. Fig. 1 Flow of the proposed model phase-I
The stages shown in Fig. 1 are: tokenization, stop word removal, stemming and lemmatization, calculating sentiment, clustering, and calculating the final bias.
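The text-preprocessing stages above can be sketched as follows. It assumes the article data are in a CSV file whose column names mirror the dataset description (title, publication, author, content); the file name is a placeholder, and the exact code used by the authors is not given in the paper.

```python
import pandas as pd
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import RegexpTokenizer

nltk.download("stopwords")                    # needed once for the stop word list

tokenizer = RegexpTokenizer(r"\w+")           # tokenization
stop_words = set(stopwords.words("english"))  # stop word removal
stemmer = PorterStemmer()                     # Porter stemmer

def preprocess(text):
    tokens = tokenizer.tokenize(text.lower())
    tokens = [t for t in tokens if t not in stop_words]
    return [stemmer.stem(t) for t in tokens]

# Manual feature selection: keep only the relevant columns.
df = pd.read_csv("all_news.csv")[["title", "publication", "author", "content"]]
df["tokens"] = df["content"].astype(str).apply(preprocess)
```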
Sentiment Analysis. In order to calculate subjectivity and polarity, the TextBlob library has been used, which returns these two properties for a piece of text. Polarity is a float in the range [−1, 1], where +1 means a positive statement and −1 means a negative statement. Subjectivity is also a float, lying in the range [0, 1]; subjective sentences usually refer to a personal view, emotion or judgment, whereas objective sentences refer to factual information. Giving the Publisher a Score and Clustering on the Same Basis. Each publisher has been given a score based on the bias of the articles it has published: the publisher score is the average of the polarity scores of those articles, and clustering has been done on the same basis. Giving the Author a Score and Clustering on the Same Basis. Similarly, each author has been given a score equal to the average polarity of the articles that author has published, and clustering has been done on the same basis.
Sentiment score = bias score of author + bias score of publisher + polarity of content (1)
Calculating Final Bias. The final bias is based on the article's polarity, the author's score and the publisher's score combined; it is the average of the three.
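A minimal sketch of this scoring scheme is given below; the file name is an assumption, TextBlob supplies the polarity, the three components are averaged as described, and KMeans stands in for the clustering step because the paper does not name the clustering algorithm used.

```python
import pandas as pd
from textblob import TextBlob
from sklearn.cluster import KMeans

df = pd.read_csv("all_news.csv")  # assumed file; columns include author, publication, content

# Polarity of each article's content.
df["polarity"] = df["content"].astype(str).apply(
    lambda t: TextBlob(t).sentiment.polarity)

# Average polarity per author and per publisher becomes their bias score.
df["author_score"] = df.groupby("author")["polarity"].transform("mean")
df["publisher_score"] = df.groupby("publication")["polarity"].transform("mean")

# Final bias: the average of article polarity, author score and publisher score.
df["final_bias"] = df[["polarity", "author_score", "publisher_score"]].mean(axis=1)

# Cluster articles on these bias-related scores (KMeans is an assumed choice).
df["cluster"] = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(
    df[["polarity", "author_score", "publisher_score"]])
```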
4.2 Classification of Fake News Articles The proposed model phase II is shown in Fig. 2: Data collection. Data collection is a collection of data from various sources in a raw form. There are multiple techniques to gather data. Some of the common data collection methods are journals, repositories, interviews.
Fig. 2 Flow of the proposed model phase II: data collection, data preprocessing (count vectorizer and TF-IDF vectorizer), data splitting, data training, data testing, and accuracy calculation
Data preprocessing. Data preprocessing involves reforming raw data into a particular format. Data in the real world are usually incomplete, inconsistent and error-prone, and preprocessing is a recognized data mining technique for handling such problems: the raw data are processed so that they can be used by the machine learning algorithms, and after preprocessing the data are in a clean form. • Cleaning: The data can have unrelated and missing parts. In this model, the CountVectorizer and the TF-IDF vectorizer have been used to convert the text into vector form. • Feature Selection: Feature selection is the selection of only those attributes in the dataset that contribute the most to the output of interest. In this model, no features have been dropped, because all three columns, i.e., title, text and label, were required. There are different methods for feature selection, but here the features have been selected manually. Data splitting. In data mining, data are partitioned into two portions: a training set and a testing set. The data have been split in the ratio 90:10, i.e., 90% for training purposes and 10% for testing purposes. Model training. The model has been trained using the passive-aggressive classifier technique and the multinomial Naïve Bayes technique on 90% of the data so that the output is tuned correctly. Model testing. After the training is over, the test data are given to the model for prediction; during model testing, the remaining 10% of the data are used. Calculation of Accuracy. Using the passive-aggressive classifier, the accuracy came out to be 95.9%; using the Naïve Bayes classifier, the accuracy came out to be 88.45%.
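The phase-II pipeline can be sketched with scikit-learn as follows. The CSV file name is an assumption, the column names follow the dataset description (title, text, label), and the hyperparameter values are chosen for illustration rather than taken from the paper.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

news = pd.read_csv("fake_news.csv")            # columns: title, text, label
X_train, X_test, y_train, y_test = train_test_split(
    news["text"], news["label"], test_size=0.10, random_state=7)  # 90:10 split

# TF-IDF features with a passive-aggressive classifier.
tfidf = TfidfVectorizer(stop_words="english", max_df=0.7)
pac = PassiveAggressiveClassifier(max_iter=50)
pac.fit(tfidf.fit_transform(X_train), y_train)
print("PAC accuracy:", accuracy_score(y_test, pac.predict(tfidf.transform(X_test))))

# Bag-of-words features with multinomial Naive Bayes.
bow = CountVectorizer(stop_words="english")
nb = MultinomialNB(alpha=1.0)                  # Laplace smoothing constant α = 1
nb.fit(bow.fit_transform(X_train), y_train)
print("NB accuracy:", accuracy_score(y_test, nb.predict(bow.transform(X_test))))
```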
5 Model Details 5.1 Confusion Matrix True Positive (TP): the observation is positive and is predicted to be positive. False Negative (FN): the observation is positive but is predicted negative. True Negative (TN): the observation is negative and is predicted to be negative. False Positive (FP): the observation is negative but is predicted positive.
Accuracy = (TP + TN)/(TP + TN + FP + FN) (2)
5.2 Naïve Bayes for Fake News Classification In Naïve Bayes, the probability of whether an article is fake or not is calculated using Eq. 3:
P(c|x) = (P(x|c) ∗ P(c))/P(x)    (3)
This model deals with the conditional probability of the news articles. Equation 4 is the Laplace smoothing equation: if any of the probabilities comes out to be zero, it is first smoothed with this equation and the result is substituted into the original equation. θi = (xi + α)/(N + αd)
(4)
where θi is the probability estimate for the fake-news class, xi is the word count, α is a constant equal to 1, N is the total number of words, and d is the number of distinct words.
The difference between Naïve Bayes and the multinomial Naïve Bayes classifier is that "Naïve Bayes" refers to the strong independence assumption in the model rather than to the particular distribution of each feature, whereas multinomial Naïve Bayes assumes a multinomial distribution for the features. This works well for data that can easily be turned into counts, such as word counts in text.
5.3 Passive-Aggressive Classifier for Fake News Passive: if the classification is correct, keep the model unchanged. Aggressive: if the classification is incorrect, update the model to adjust to the misclassified example. The classifier gets an example, learns from it and then throws it away.
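A minimal sketch of training and scoring the two classifiers is given below; it reuses the vectorized splits from the earlier snippet (an assumption), and the max_iter value is illustrative.

```python
# Illustrative training/evaluation of the two classifiers described above.
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

pac = PassiveAggressiveClassifier(max_iter=50)
pac.fit(X_train_tfidf, y_train)
print("Passive-aggressive accuracy:",
      accuracy_score(y_test, pac.predict(X_test_tfidf)))

nb = MultinomialNB(alpha=1.0)  # alpha=1 corresponds to the Laplace smoothing of Eq. (4)
nb.fit(X_train_counts, y_train)
print("Multinomial NB accuracy:",
      accuracy_score(y_test, nb.predict(X_test_counts)))
```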
6 Experimentation For fake news classification, the bag-of-words features gave an accuracy of 89%, the TF-IDF vectorizer gave an accuracy of 83.8%, and Naïve Bayes gave an accuracy of 88.45%. The passive-aggressive classifier gave the highest accuracy, 95.9%. A comparison of the existing models and the proposed models is given in Table 1 and Fig. 3. Accuracy at different splits for fake news is shown in Table 2.
Table 1 Performance analysis table
Author | Method | Accuracy (%)
Prabjhot Kaur | KNN | 91
Rowan Zellers | Neural network | 80.18
Kai-Chou Yang | NLI models | 88.9
Shivam B. Parikh | Content cues based method | 70
Proposed Model | Naïve Bayes | 88.45
Proposed Model | Passive aggressive classifier | 93.8
Fig. 3 Performance analysis graph
Table 2 Accuracy at different splits for fake news
Splits (%) | TFIDF vectorizer (%) | Count vectorizer (%) | Passive aggressive (%)
90–10 | 83.8 | 89 | 95.9
85–15 | 84 | 89.4 | 94.7
80–20 | 81.9 | 89.2 | 94.7
75–25 | 81.8 | 88.5 | 94
70–30 | 80.5 | 88.1 | 94
7 Conclusion The proposed model has been successfully implemented, and its accuracy is better than that of most of the other models in the literature; the best accuracy came out to be 95.9%. The comparisons are given in Tables 1 and 2. As shown in Table 2 and Fig. 3, the performance of the proposed model reaches 95.9%, and the training model is tuned so that it predicts with maximum accuracy. Using the passive-aggressive classification
model instead of the Naïve Bayes model has improved the accuracy of the proposed model from 89 to 95.9%.
References
1. E. Elejalde, L. Ferres, E. Herder, On the Nature of Real and Perceived Bias in the Mainstream Media (2018)
2. Pew Research Center, in Sharing the News in a Polarized Congress. [Online]. Available: https://www.people-press.org/2017/12/18/sharing-the-news-in-a-polarizedcongress/, 18 Dec 2017
3. F. Ming, F. Wong, C. Tan, S. Sen, M. Chiang, Quantifying political leaning from tweets and retweets, in Proceedings of the 7th International AAAI Conference on Web and Social Media (AAAI Press, Boston, MA, USA, 2013)
4. D. Saez-Trumper, C. Castillo, M. Lalmas, Social media news communities: gatekeeping, coverage, and statement bias, in Proceedings of the 22nd ACM International Conference on Information & Knowledge Management (CIKM’13) (ACM, 2013), pp. 1679–1684
5. A. Dallmann, F. Lemmerich, D. Zoller, A. Hotho, Media bias in German online newspapers, in Proceedings of the 26th ACM Conference on Hypertext and Social Media (HT’15) (ACM, 2015), pp. 133–137
6. A. Anil Patankar, J. Bose, H. Khanna, A bias aware news recommendation system
7. H. Isahara, Resource-based natural language processing (2017)
8. P. Kaur, R.S. Boparai, D. Singh, Hybrid Text Classification Method for Fake News Detection (2019)
9. R. Zellers, A. Holtzman, H. Rashkin, Y.B. Ali Farhadi, F. Roesner, Y. Choi, Defending Against Neural Fake News (2019)
Smart Leaf Disease Detection Using Image Processing Amit Thakur, Nachiket K. Kulkarni, Tanay Rajwal, and Vivek Deshpande
Abstract In the present agricultural scenario, farmers suffer from one major problem: low crop productivity and crop losses. Crop losses are caused not just by weeds; insect damage and plant diseases also contribute a significant amount. This paper presents methods and efficient ways of detecting plant illness using image processing and of alerting the farmer to the illness by email and SMS and by displaying the infection name on the user's screen. We have also introduced a distinguishing feature: we do not only show the infection detected on the plant but also provide effective remedies to the farmer so that they can act on it quickly. Keywords Image processing · Algorithms · Python · GUI · CNN · Drone
1 Introduction India is well known all over the world for its horticulture and agriculture. Most of the population depends upon agriculture, and farmers have different types of cultivation options in their fields. However, lack of care and current global environmental issues such as climate change and pollution cause many problems for crops and make them catch infections and diseases [1]. Crop infection affects farmers in a very negative way, as it reduces crop production and farmers have to spend money on buying the required treatment to cure the infection. To solve this problem, we have come up with a solution that integrates a drone with image processing techniques in a mobile application, allowing farmers to quickly detect an infection and obtain immediate and effective remedies. When a plant has a disease, the primary markers usually appear on the
leaves of the plant. Generally, spots are observed on leaves due to illness; when the plant catches many infections, the whole leaf is covered by dark spots, which makes the illness easier to detect. Plant diseases can be broadly classified according to the nature of their primary causal agent as either infectious or noninfectious. Infectious plant diseases are caused by pathogenic organisms such as fungi, viruses and bacteria, and in nature plants may be affected by more than one disease-causing agent at a time. Quick and easy detection of plant illness is the utmost need of farmers and agricultural experts. In most plants, the illness markers are normally found on the leaves, and quick and efficient diagnosis can be made with the newest algorithms such as machine learning and image processing [2, 3]. The aim of the system is to detect plant diseases using image processing, with the help of a drone collecting plant images from different parts of the farm or from a specific area and feeding these to our algorithm for effective infection detection. Hence, in the proposed work, we consider the detection of plant diseases present on leaves. Image acquisition, processing of images, extraction of features, and recognition and classification of plant infection are the important pillars of disease discovery using image processing and algorithms.
2 Materials and Methods 2.1 Dataset Our primary dataset is a collection of images of plant leaves infected with different types of infection/disease. We collected over 3100 images of different leaves from the Agriculture College (Pune), the Kaggle website, and by physically visiting farmlands in different geographical areas. The images were shot with different image extensions, but all of them were converted to the standard ".jpg" extension. Test data: 2100 images in .jpg format (67.742% of the total); train data: 1000 images in .jpg format (32.258% of the total), for 3100 images in all. The dataset consists of images classified into four different types: 1. Healthy Leaf 2. Bacterial Infection 3. Virus Infection
4. Late Blight Infection (Figs. 1, 2 and 3).
Fig. 1 Plant leaf with a late blight infection; one of the images in the training dataset used to train the algorithm
Fig. 2 Plant leaf with a virus infection; one of the images in the training dataset used to train the algorithm
2.2 Methodology This section describes the flow, summarized in the block diagram, that is used to process the image and detect the infection of the plant. As our primary users are farmers, we have developed an application designed accordingly: the app supports up to 12 different Indian languages, and the farmer can choose the language he is comfortable with. The images the farmer wants to analyze can either be taken manually by the farmer or via a drone connected to the application; after the image is fed to the app, the app analyzes the image
Fig. 3 Plant leaf with a bacterial infection; one of the images in the training dataset used to train the algorithm
and detects the infection using the different classifier algorithms, and then provides the remedy for that infection. The app classifies the image of the leaf into one of four categories. A matrix is used for labeling the categories of infection: [1,0,0,0] for a healthy plant, [0,1,0,0] for a bacterial infection, [0,0,1,0] for a virus infection, and [0,0,0,1] for late blight (Fig. 4).
Fig. 4 Flow diagram of the plant infection detection system (register the farmer → update the database by selecting an image, either from the drone or from clicked images → algorithm → result); when a user starts the application, he has to follow the path defined in the flow chart
Fig. 5 DJI Mavic Air drone used by the user to collect images of plants from different places
2.3 Drone We have used a DJI Mavic Air drone to capture images from different trees and plants. The images are captured from the drone manually under human control, and the received HD images are then fed into the database of the GUI, from which the user is able to select the image he wants to analyze. The drone images were taken with a 12-MP CMOS sensor and an f/2.8 lens with a 35 mm-equivalent focal length of 24 mm (Fig. 5).
2.4 Image Preprocessing Image collection and data management, or preprocessing of images, is among the most difficult and most important parts of digital image processing. The purpose of image preprocessing is to remove noise in the picture and to adjust pixel features such as pixel values, which enhances the features of the picture. We also use image enhancement for that purpose [4, 5], and image resizing, as part of the preprocessing. Images of different extensions, including those captured from the drone, were converted into the standard ".jpg" extension. We have used libraries such as cv2 and NumPy for this project. First, two directories were made: (1) training data and (2) test data, each holding images of size 50 × 50 × 3 and labeled as follows. We have used a matrix for labeling the categories of infection: [1,0,0,0] for a healthy plant, [0,1,0,0] for a bacterial infection, [0,0,1,0] for a virus infection, and [0,0,0,1] for late blight. We then train and test on the dataset from the respective directories; each .jpg has an average size of around 9311 bytes, and the total images
which are shuffled amount to around 2100 images, divided into 210 × 10 matrices inside the directories.
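A hedged sketch of this preprocessing step is shown below: reading images with cv2, resizing them to 50 × 50 × 3, and attaching one-hot labels. The directory layout and the inference of labels from file names are assumptions, since the paper does not specify them.

```python
# Illustrative image loading, resizing and one-hot labeling with cv2 and NumPy.
import os
import cv2
import numpy as np

IMG_SIZE = 50
LABELS = {"healthy": [1, 0, 0, 0], "bacterial": [0, 1, 0, 0],
          "virus": [0, 0, 1, 0], "lateblight": [0, 0, 0, 1]}

def load_directory(path):
    data = []
    for name in os.listdir(path):
        img = cv2.imread(os.path.join(path, name))          # BGR image
        if img is None:
            continue
        img = cv2.resize(img, (IMG_SIZE, IMG_SIZE))          # 50 x 50 x 3
        label = next((v for k, v in LABELS.items() if k in name.lower()), None)
        if label is not None:
            data.append([np.array(img), np.array(label)])
    np.random.shuffle(data)
    return data

train_data = load_directory("train_data")
test_data = load_directory("test_data")
```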
2.5 Classification We use a convolutional neural network for this project. The ReLU activation function is, mathematically,
Y = max(0, x)    (1)
This activation takes little time to compute, and there is no complicated math associated with it; we have implemented the ReLU activation function for our CNN model. The number of epochs is set to eight. For the regression step, we use the Adam optimizer with a learning rate of 1 × 10−3 to update the network weights over the epochs on the training data. Adam is an adaptive gradient algorithm that maintains a per-parameter learning rate, which improves performance on problems with sparse gradients, for the detection of the four classes:
• Virus
• Bacteria
• Late blight
• Healthy
The loss used here is the categorical cross-entropy. The GUI was built with the Python framework Tkinter, with each window sized about 650 × 510; it accepts an image from the user, analyzes it, and classifies it with the pretrained model into one of the categories, and the app also gives optimal remedies for the particular disease detected. In the project, the first convolution layer uses 32 filters, each with a kernel size of 3 × 3 and a stride of 3. The second layer uses 32 filters with 3 × 3 kernels, again with stride 3. The third layer uses 64 filters with 3 × 3 kernels and stride 3. The fourth layer uses 128 filters with 3 × 3 kernels and stride 3. The fifth layer uses 32 filters with 3 × 3 kernels and stride 3. The sixth layer is a fully connected layer with 1024 units and a dropout keep rate of 0.8. The ReLU activation is used inside each layer.
Padding and biases are nullified in all of these layers. In all, we have used five CNN layers, and the sixth layer is a fully connected layer.
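The following is a minimal Keras sketch of the six-layer stack described above (32-32-64-128-32 filters, 3 × 3 kernels, stride 3, a 1024-unit fully connected layer with a 0.8 keep rate, ReLU activations, Adam at 1e-3, categorical cross-entropy). The "same" padding and the placement of the max-pooling layers are assumptions made so that the 50 × 50 input remains valid; it is a sketch, not the authors' exact network.

```python
# Hypothetical Keras approximation of the described CNN.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(num_classes=4):
    model = models.Sequential([
        layers.Input(shape=(50, 50, 3)),
        layers.Conv2D(32, 3, strides=3, padding="same", activation="relu", use_bias=False),
        layers.MaxPooling2D(3, padding="same"),
        layers.Conv2D(32, 3, strides=3, padding="same", activation="relu", use_bias=False),
        layers.MaxPooling2D(3, padding="same"),
        layers.Conv2D(64, 3, strides=3, padding="same", activation="relu", use_bias=False),
        layers.MaxPooling2D(3, padding="same"),
        layers.Conv2D(128, 3, strides=3, padding="same", activation="relu", use_bias=False),
        layers.MaxPooling2D(3, padding="same"),
        layers.Conv2D(32, 3, strides=3, padding="same", activation="relu", use_bias=False),
        layers.MaxPooling2D(3, padding="same"),
        layers.Flatten(),
        layers.Dense(1024, activation="relu"),
        layers.Dropout(0.2),  # keep rate 0.8
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model
```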
3 Model of Internal Working The input data, given as a series of images taken either as snapshots on a phone or shot by the drone, have a considerable size, so the size is reduced while the required features are maintained: the input images are chopped down to 50 × 50 × 3, where the dimensions are IMAGE_SIZE × IMAGE_SIZE × RGB_CHANNELS. We make use of a convolutional neural network, which is based on the multilayer perceptron, with the ConvNet mechanism. Whenever an image is given as input to the CNN model, the ConvNet is able to capture the spatial dependencies with the help of the required filters. The ConvNet reuses its weights, which makes it fit the images in the dataset better. The image size is 50 × 50 × 3 and the kernel size is 3 × 3, applied as an RGB filter. Max-pooling is performed after each layer to reduce the size of the image; we use max-pooling with a 3 × 3 window, taking the maximum value from the applied convolution. The same scheme is applied in the next layer, where 64 filters are used and the stride is again 3. Similarly, all five layers are connected through weights and biases, and the last, fully connected layer is connected with 1024 units having a keep probability of 0.8. Next, we use the Adam optimizer with a learning rate of 0.001 over 8 epochs; the loss function is the categorical cross-entropy. With categorical cross-entropy, as the predicted output diverges from the actual output, the loss increases. The input image is large, so we chop it down to 50 × 50 without losing the features and edges. This model does not require the traditional method of feature extraction; with the CNN, the traditional method can ultimately be eliminated, and we get accurate output at the cost of higher computational resources. An email and a text message are also sent to the farmer and to government officials with the analysis of the plant, including information about the disease affecting the leaf and the required remedies.
-\sum_{c=1}^{M} y_{o,c} \log(p_{o,c})    (2)
The above formula is the expression for the categorical cross-entropy (Figs. 6 and 7).
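As a small worked example of Eq. (2) for a single observation, assuming a one-hot target over the four classes and an illustrative softmax output:

```python
# Tiny numeric illustration of the categorical cross-entropy in Eq. (2).
import numpy as np

y = np.array([0, 0, 0, 1])                 # true class: late blight (one-hot)
p = np.array([0.05, 0.10, 0.15, 0.70])     # predicted class probabilities
loss = -np.sum(y * np.log(p))              # = -log(0.70) ≈ 0.357
print(loss)
```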
Fig. 6 The flow diagram for the model of the internal working of the application after it has received the image to process (plant dataset → data acquisition → data preprocessing → classification)
Fig. 7 Loss-function showing the real-time behavior of the classifier
F1 Function: Mean F1 score. The F1 score is computed separately for each class using F1 = 2PR/(P + R), with P = Tp/(Tp + Fp) and R = Tp/(Tp + Fn), where:
(1) "P" refers to the precision
(2) "R" refers to the recall
(3) "Tp" refers to the number of true positives
(4) "Fp" refers to the number of false positives
(5) "Fn" refers to the number of false negatives
The best performing model achieves a mean F1 score of 0.993. Accuracy is 99.34%. Random guessing would only achieve an overall accuracy of 2.63% on average.
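A minimal sketch of computing the per-class and mean F1 scores with scikit-learn is shown below; the integer-coded label lists are illustrative placeholders, not the paper's data.

```python
# Illustrative per-class and macro-averaged (mean) F1 computation.
from sklearn.metrics import f1_score, accuracy_score

y_true = [0, 1, 2, 3, 3, 2, 1, 0]
y_pred = [0, 1, 2, 3, 3, 2, 0, 0]
print(f1_score(y_true, y_pred, average=None))    # F1 per class
print(f1_score(y_true, y_pred, average="macro")) # mean F1 across classes
print(accuracy_score(y_true, y_pred))
```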
4 Results We were able to run the application successfully, and the desired results are obtained (Figs. 8, 9 and 10). We are also providing an email notification feature for our GUI, which enables the user to receive a complete report of the analysis with the solution on his mail.
Fig. 8 Second slide of the GUI, where the user is asked to select the photos he wants to analyze to get the type of infection; after the user clicks the "Get pictures" button, he can select the pictures from his files
Fig. 9 Result/output the user got after he selected the image of the leaf he wants to analyze
5 Related Work There are many works and inventions in this field of leaf infection that help farmers find methods to keep plants from getting infected. A review paper, Plant Leaves Disease Detection Using Image Processing Techniques by Gavhale and Gawande (2014), presented image processing techniques that have been used for recognizing and detecting plant diseases in several plant species. The most efficient methods and systems for identification of plant illness and disease are Support Vector Machines (SVM), K-nearest neighbors (KNN), and Spatial Gray-level Dependence Matrices (SGDM); these strategies are also used to analyze healthy plant leaves [6]. A comparative study on the usage of RGB and grayscale images in plant leaf disease
Fig. 10 Remedies the GUI provided after showing the infection it detected for the plant
detection by Padmavathi and Thangadurai (2016) gave comparative results of RGB and grayscale images for leaf disease and spot detection. In the detection of infected and spotted leaves, color becomes a vital component for determining disease intensity. Grayscale and RGB images are considered, a median filter is used for image enhancement, and segmentation is used to extract the diseased portion, which is used to recognize the disease level. A plant disease recognition model, based on leaf image classification using deep convolutional networks, has also been developed [7].
References
1. G.S. Dhaliwal, V. Jindal, B. Mohindru, Crop losses due to insect and pest: global and Indian scenario. Int. J. Entomol. 77, 165 (2015)
2. S. Raut, A. Fulsunge, Plant disease detection in image processing using Matlab. Int. J. Innov. Res. Sci. Eng. Technol. 6(6) (2017). https://doi.org/10.15680/IJIRSET.2017.0606034
3. S. Naikwadi, N. Amoda, Advances in image processing for detection of plant diseases. Int. J. Appl. Innov. Eng. Manage. (IJAIEM) 3(11) (2015)
4. K. Thangadurai, K. Padmavathi, Computer vision image enhancement for plant leaves disease detection, in 2014 World Congress on Computing and Communication Technologies
5. Image enhancement process used by Matlab: https://in.mathworks.com/discovery/image-enhancement.html
6. N. Ramanathan, T. Schoellhammer, E. Kohler, K. Whitehouse, T. Harmon, D. Estrin, Suelo: human-assisted sensing for exploratory soil monitoring studies (2009)
7. P.R. Harshani, T. Umamaheswari, Effective crop productivity and nutrient level monitoring in agriculture soil using IoT, in 2018 International Conference on Soft-computing and Network Security (ICSNS)
Unsupervised Image Generation and Manipulation Using Deep Convolutional Adversarial Networks Premanand Ghadekar, Shaunak Joshi, Yogini Kokate, and Harshada Kude
Abstract In recent years, there has been an outburst in the field of Computer Vision due to the introduction of Convolutional Neural Networks. However, Convolutional Neural Networks have been sparsely used for unsupervised learning. The advancement of computational power and large datasets provides large opportunities to apply deep learning to image processing. This paper proposes a new architecture based on the Deep Convolutional Generative Adversarial Network (DCGAN) for unsupervised image generation and its usage for image manipulation tasks such as denoising, super-resolution, and deconvolution. The proposed model demonstrates that the learned features can be used for image processing tasks, demonstrating their applicability for general use, as the DCGAN learns from large datasets and adds high-level image details and features where traditional methods cannot be used. While the image results from the proposed network architecture and the original DCGAN architecture are similar in terms of measured performance, they are visually better when viewed by humans. Keywords Convolutional neural networks · Generative adversarial networks · Unsupervised learning · Image processing
1 Introduction In the field of Computer Vision, models can be trained to represent images as representations from virtually limitless media samples present in the wild. Generative Adversarial Networks [1] can be used to obtain good image representations, wherein features are extracted using the submodels, which are the generator and discriminator, respectively. Furthermore, it has also been observed that the learning process of GANs is an interesting approach to representation-based learning. However, it has been seen that GANs are highly unstable to train, often resulting in nonsensical
outputs from generators, and therefore care must be taken with the hyperparameters. DCGAN addresses these problems by introducing concepts used in Convolutional Neural Networks.
1.1 Literature Survey 1.1.1 Image Denoising
For this task, various learning-based algorithms have been used, the sparse denoising auto-encoder architecture being the most commonly used one to carry out the said task [2, 3].
1.1.2 Image Super-Resolution
Recently, deep learning algorithms have usually been used to achieve super-resolution (SR). For the creation of high-resolution images, feature maps constructed using deep CNNs are used [4–6]. In the work of Ledig et al., Generative Adversarial Networks were proposed for the task of super-resolution, the primary idea being the min–max game through which the GAN model learns to generate authentic-looking images.
1.2 Problem Statement and Applicability Until the advent of Generative Adversarial Networks (GANs), most breakthroughs in deep learning used discriminative machine learning models, usually those which mapped a latent high-dimensional input to a class label [7, 8]. These breakthroughs were based on nonlinear activation functions [9, 10] and the backpropagation algorithm, finally using gradient descent to reduce the loss. The proposed model, based on the DCGAN architecture, is used to generate images in an unsupervised fashion.
2 Approach and Methodology 2.1 Proposed Model Based on DCGAN This paper is based on the idea of using a deep learning-based model to generate new data as proposed by Goodfellow et al. [11]. The main tenet of DCGAN being the use of the generator (G) and discriminator (D) models simultaneously, which are trained
to generate images from downsampled or noisy input and to differentiate, when given an original image and a generated image, which one is real and which one is generated. The D and G models are adversarial in nature and are trained such that the generator generates more and more authentic images while the discriminator gets better at differentiating between generated and real images, so that the images generated finally are virtually indistinguishable from real ones. This is formalized in the minimax value function:
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]    (1)
where G is the generator and D is the discriminator. D(x) represents the probability that x came from the training data rather than from the generator. The discriminator D, as its name states, is trained to maximize the probability of differentiating between training examples and samples from the generator G, assigning the correct label to each, while G is trained to minimize log(1 − D(G(z))), i.e., to make the generated images pass as dataset images. Simply put, the two models engage in a minimax game with respect to the value function, and the minimax algorithm is implemented iteratively. In practice, fully optimizing the discriminator in the inner loop is computationally heavy and can result in overfitting; to compensate, the optimization was done in alternating steps, k steps for D and one step for G. This keeps the D model near its optimal loss while G reduces its loss accordingly. Keeping the above simple model as the base, the D and G loss functions were augmented, and in some parts modified, inspired by [12] and built on the foundation provided by [13, 14] (Fig. 1).
Fig. 1 The proposed architecture
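The paper does not specify a framework, so the following is only a sketch of the alternating min–max updates described above, written in TensorFlow/Keras; the tiny fully connected G and D, the latent and image dimensions, and the use of the non-saturating generator loss are all illustrative assumptions.

```python
# Hypothetical alternating GAN updates (Eq. 1), with toy fully connected models.
import tensorflow as tf
from tensorflow.keras import layers

latent_dim, image_dim = 100, 64 * 64 * 3

G = tf.keras.Sequential([layers.Dense(256, activation="relu", input_shape=(latent_dim,)),
                         layers.Dense(image_dim, activation="tanh")])
D = tf.keras.Sequential([layers.Dense(256, input_shape=(image_dim,)),
                         layers.LeakyReLU(0.2),
                         layers.Dense(1, activation="sigmoid")])

bce = tf.keras.losses.BinaryCrossentropy()
g_opt = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)
d_opt = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)

@tf.function
def train_step(real_images):
    batch = tf.shape(real_images)[0]
    z = tf.random.normal([batch, latent_dim])
    # Discriminator step: maximize log D(x) + log(1 - D(G(z)))
    with tf.GradientTape() as tape:
        fake = G(z, training=True)
        d_real, d_fake = D(real_images, training=True), D(fake, training=True)
        d_loss = bce(tf.ones_like(d_real), d_real) + bce(tf.zeros_like(d_fake), d_fake)
    d_opt.apply_gradients(zip(tape.gradient(d_loss, D.trainable_variables),
                              D.trainable_variables))
    # Generator step (non-saturating form): maximize log D(G(z))
    with tf.GradientTape() as tape:
        d_fake = D(G(z, training=True), training=True)
        g_loss = bce(tf.ones_like(d_fake), d_fake)
    g_opt.apply_gradients(zip(tape.gradient(g_loss, G.trainable_variables),
                              G.trainable_variables))
    return d_loss, g_loss
```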
2.1.1 The Generator and Discriminator Models
Here, deep convolutional neural networks (CNNs) and residual networks [15, 16] are used. The proposed architecture is based on Ledig et al.'s work [12] and is made generic so that it can perform image processing tasks such as denoising, super-resolution, etc. The hyperparameters and layer stacks were found empirically. An upscale layer is used between the ResNet blocks in the generator network; this layer upscales the base image to 2× the original resolution. The proposed models learn from the change in inputs to perform different tasks.
2.1.2 Cost Function
The loss function proposed by Goodfellow et al. is modified to better suit the proposed model. The generator loss function is
l_G = 0.08 \cdot l_{content} + 0.2 \cdot l_{G,adv}    (2)
where l_G is the generator loss and l_{content} (content loss) is the l1-norm difference between the generated image and the original image:
l_{content} = \| I_{generated} - I_{original} \|_1    (3)
where I denotes the image, and l_{G,adv} is the adversarial loss:
l_{G,adv} = \sum_{n=1}^{N} -\log D(G(I_{input}))    (4)
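A minimal sketch of the combined generator objective in Eqs. (2)–(4) is given below, assuming TensorFlow tensors `generated` and `original` and a discriminator model `D`; only the 0.08 and 0.2 weights come from the text, the rest is illustrative.

```python
# Illustrative combined generator loss (content + adversarial terms).
import tensorflow as tf

def generator_loss(D, generated, original):
    # Content loss: l1 difference between generated and original images (Eq. 3)
    l_content = tf.reduce_mean(tf.abs(generated - original))
    # Adversarial loss: -log D(G(I_input)), averaged over the batch (Eq. 4)
    l_adv = tf.reduce_mean(-tf.math.log(D(generated, training=True) + 1e-8))
    # Weighted combination (Eq. 2)
    return 0.08 * l_content + 0.2 * l_adv
```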
2.1.3 Adversarial Training and Hyperparameter Tuning
The proposed DCGAN-inspired model was trained on four datasets: the MNIST handwriting dataset, Large-Scale CelebFaces Attributes (CelebA), Street View House Numbers, and the MIT Places database. Except for resizing the training images, no preprocessing was applied. Stochastic Gradient Descent (SGD) [17, 18] was used with a batch size of 128 to train the models. Weights were initialized from a zero-centered Gaussian distribution with a 0.02 standard deviation. In the Leaky ReLU activation function, the slope of the leak was set to 0.2 in both the generator and the discriminator. While previous GAN works used momentum, this paper uses the Adam optimizer [19]. It was found that the suggested learning rate of 1e-3 was too high, so 2e-4 was used instead. Also, when the momentum term β1 was reduced to 0.5, the model stabilized well.
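The training configuration reported above can be expressed as the short sketch below (batch size 128, zero-centered Gaussian weight initialization with std 0.02, LeakyReLU slope 0.2, Adam with learning rate 2e-4 and β1 = 0.5); the convolution shape used in the example block is a placeholder.

```python
# Sketch of the reported training hyperparameters in Keras terms.
import tensorflow as tf
from tensorflow.keras import layers, initializers

BATCH_SIZE = 128
init = initializers.RandomNormal(mean=0.0, stddev=0.02)   # zero-centred Gaussian, std 0.02

example_block = tf.keras.Sequential([
    layers.Conv2D(64, 4, strides=2, padding="same",
                  kernel_initializer=init, input_shape=(64, 64, 3)),  # placeholder shape
    layers.LeakyReLU(0.2),
])

optimizer = tf.keras.optimizers.Adam(learning_rate=2e-4, beta_1=0.5)
```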
3 Experiments 3.1 Datasets and Evaluation Measurements
• Modified National Institute of Standards & Technology (MNIST) [20]: MNIST is a database of handwritten digits derived from the NIST dataset, with 60,000 training and 10,000 test data points. The digits are size-normalized and centered. MNIST is like the "Hello World" of machine learning, and it served as an initial basis for this project [21].
• Large-Scale CelebFaces Attributes (Celeb-A) [22]: the Celeb-A dataset contains facial images of celebrities scraped from internet searches; the celebrities' names are taken from DBpedia, with a condition that they were born in the modern era. It encompasses 10,000 individuals in 3 × 10^6 images. As a constraint, only images with a dense pixel count were kept; these images are used for training, and no data augmentation was applied.
• Street View House Numbers [23]: an image dataset consisting of house numbers in Google Street View images.
• MIT Places database [24]: this dataset comprises 200+ scene categories and 2 million+ images. Since a computational constraint was present, the entire dataset was not used; instead, only the test set, consisting of 41,000 images, was used for training.
PSNR was used to measure the similarity between the model output and the original image.
4 DCGAN Model See Fig. 2.
5 Experimental Results 5.1 Super Resolution This DCGAN inspired model was used for single frame image super-resolution (SR). It was seen that the proposed model achieves a slightly lesser PSNR than the conventional bicubic method. On careful examination, it was observed that the finer details are better reconstructed on the SR images generated by the proposed model, which is better suited for human recognition. Also, the proposed model is much more suited to image processing tasks as for traditional image processing techniques
Fig. 2 Generative Adversarial Network model (dataset → sample image; latent vector space → generator model → generated sample; sample image and generated sample → discriminator model → loss)
such as super-resolution, denoising, etc. need images with specific characteristics. Thus, these algorithms need to be fed various priors, which boost their performance; however, these are generally difficult to find and impossible to extract blindly. The DCGAN inspired model sufficiently minimizes the loss function uniformly, thus getting better results (Fig. 3; Table 1).
5.2 Denoising It has been observed that the PSNR obtained from the NLM method is similar to that of the proposed model-based image denoising, and median filtering is outperformed by both of these methods. As can be seen in the resulting images, NLM preserves fewer features than the DCGAN-based denoising algorithm (Fig. 4; Table 2).
5.3 Deconvolution Here, the image was blurred with a Gaussian kernel and corrupted with white Gaussian noise of variance 0.002. It was observed that the DCGAN-inspired model-based deconvolution was outmatched by the Wiener filter and the ADMM algorithm, which gave similar results to each other in terms of PSNR. On examining the outputs for the natural scene dataset, a large visual difference between the original image and the deblurred model output is seen. The proposed model, with a few changes, shows that DCGAN can be easily generalized (Fig. 5; Table 3).
Fig. 3 Super-resolution results: a CelebA results, b natural scenes test result, c single-focused CelebA sample result, d single-focused natural scenes (each panel compares Original, Bicubic, Low-Res and DCGAN outputs)
6 Conclusion In this paper, the proposed model inspired by DCGAN is used as a generic architecture to perform image processing tasks. This model gives competitive
Table 1 SR results in PSNR
Method | Mean (dB) | STD (dB)
Celeb-A DCGAN | 20.0743 | 2.1000
Celeb-A Face bicubic | 24.7463 | 1.4633
MIT places bicubic | 26.1542 | 2.0485
MIT places DCGAN | 22.4309 | 3.2068
Fig. 4 MIT places dataset image denoising outputs (Original, Noisy, NLM, Median filter, DCGAN)
results, and in some cases more visually appealing results, for tasks such as super-resolution and denoising when compared to conventional methods. It achieves only average results in tasks like deconvolution, where a good result was achieved on the CelebA dataset but the model failed on complex inputs such as natural scenes.
Table 2 PSNR image denoising result
Method | Mean (dB) | STD (dB)
Celeb-A NLM | 26.4248 | 0.8291
Celeb-A Median | 25.6422 | 0.7371
MIT Places NLM | 24.8563 | 1.3845
MIT Places median | 24.6524 | 1.5553
MIT places DCGAN | 20.9344 | 1.7745
Celeb-A DCGAN | 22.4134 | 0.6423
Fig. 5 Image deconvolution results on the MIT places dataset (Original, Blurry, Wiener, ADMM, DCGAN)
Table 3 Image deconvolution result PSNR
Method | Mean (dB) | STD (dB)
Celeb-A Wiener | 24.2986 | 1.9717
Celeb-A ADMM | 23.1612 | 1.6210
Celeb-A DCGAN | 19.6295 | 1.3390
MIT Places Wiener | 20.7702 | 1.5653
MIT Places ADMM | 18.9301 | 1.2736
MIT Places DCGAN | 17.4473 | 1.3457
References
1. J. Kim, J. Kwon Lee, K. Mu Lee, Deeply-recursive convolutional network for image super-resolution, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016), pp. 1637–1645
2. J. Kim, J. Kwon Lee, K. Mu Lee, Accurate image super-resolution using very deep convolutional networks, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016), pp. 1646–1654
3. P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, P.A. Manzagol, Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J. Mach. Learn. Res. 11, 3371–3408 (2010)
4. C. Dong, C.C. Loy, K. He, X. Tang, Image super-resolution using deep convolutional networks. IEEE Trans. Pattern Anal. Mach. Intell. 38(2), 295–307 (2016)
5. E. Denton, S. Chintala, A. Szlam, R. Fergus, Deep generative image models using a Laplacian pyramid of adversarial networks (2015). arXiv preprint arXiv:1506.05751
6. K. Gregor, I. Danihelka, A. Graves, D. Wierstra, Draw: a recurrent neural network for image generation (2015). arXiv preprint arXiv:1502.04623
7. Y.A.H. Bastien, C. Martin-Vide, R. Mitkov, B. Truthe, in Deep Learning of Representations: Looking Forward, vol. 7978 (Springer, Berlin, 2013), pp. 1–37
8. Y. Bengio, Learning Deep Architectures for AI (Now Publishers, Delft, 2009)
9. Y. Bengio, G. Mesnil, Y. Dauphin, S. Rifai, Better Mixing via Deep Representation, vol. 28(1) (2013), pp. 552–560
10. Y. Bengio, E. Thibodeau-Laufer, J. Yosinski, Deep generative stochastic networks trainable by backprop, in Proceedings of the 31st International Conference on Machine Learning (ICML’14), vol. 32 (2014), pp. II-226–II-234
11. I.J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial networks, in Advances in Neural Information Processing Systems, vol. 27 (2014)
12. C. Ledig, L. Theis, F. Huszar, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, et al., Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network (2016). arXiv preprint arXiv:1609.04802
13. Y. Bengio, Learning deep architectures of AI. Found. Trends Mach. Learn. 2(1), 1–127 (2009)
14. G.E. Hinton, S. Osindero, Y.W. Teh, A fast learning algorithm for deep belief nets. Neural Comput. 18, 1527–1554 (2006)
15. K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016), pp. 770–778
16. K. He, X. Zhang, S. Ren, J. Sun, Identity mappings in deep residual networks, in European Conference on Computer Vision (Springer, Berlin, 2016), pp. 630–645
17. O. Breuleux, Y. Bengio, P. Vincent, Quickly generating representative samples from an RBM-derived process. Neural Comput. 23(8), 2053–2073 (2011)
18. M. Hardt, B. Recht, Y. Singer, Train Faster, Generalize Better: Stability of Stochastic Gradient Descent (2015). arXiv preprint arXiv:1509.01240
19. J. Bergstra, Y. Bengio, Random search for hyper-parameter optimization, in JMLR (2012)
20. https://yann.lecun.com/exdb/mnist/
21. https://github.com/sunsided/mnist
22. https://mmlab.ie.cuhk.edu.hk/projects/CelebA.html
23. https://ufldl.stanford.edu/housenumbers/
24. https://www.csail.mit.edu/research/places-database-scene-recognition
A Suicide Prediction System Based on Twitter Tweets Using Sentiment Analysis and Machine Learning Debabrata Swain, Aneesh Khandelwal, Chaitanya Joshi, Abhijeet Gawas, Prateek Roy, and Vishwesh Zad
Abstract Sentiment analysis, or text mining, is done to find and analyze the opinions of users. It is an approach to analyzing data and retrieving sentiment from text. Social media has become a medium for its users to express feelings and emotions as well as to receive information from a diverse group of people. Online communication has also become a new way to express negative thoughts such as depression and anxiety that lead to suicide ideation. In this paper, a sentiment analysis method for detecting suicide ideation from tweets via supervised learning is proposed. This can be beneficial to society as a timely detection and alert system for suicidal tendencies. The proposed method uses various Python language modules and machine learning models for opinion mining. Keywords Sentiment analysis · Machine learning · Supervised learning · Suicide · Tweets
1 Introduction In today's world, the increasing number of suicides is a serious social issue. Several factors can lead to suicide, for example depression, severe anxiety, complex personal issues, hopelessness, negative thoughts, traumatic events, and alcoholism. Thousands of people die by suicide every year, making suicide prevention and awareness one of the leading social causes for the welfare of people. Suicide is a critical public health problem; the World Health Organization (WHO) states that suicide is among the leading causes of death worldwide [1]. Suicidal tendencies or suicidal thoughts are the thoughts of people who are prone to committing suicide, and they may be considered an early indicator of suicide. Suicidal tendencies occur when a person is unable to find a direction and feels trapped in a situation where suicide seems to be the only way out. Such a person may not express these feelings verbally but may share them through social media. Identifying the suicidal tendencies of a person can be considered the first step in suicide prevention. Today, in the twenty-first century, social media is essential for communicating: users express their feelings and emotions via tweets or posts. Social media has become an information resource for support rather than professional assistance such as psychiatrists or doctors; people refer to forums and microblogging websites (such as Twitter) for help rather than facing someone directly, as indicated by research [2, 3]. The enormous amount of data on individuals' emotions can be used as a timely warning of behavioral changes in individuals who are at risk, and it may help to counter suicides. Analyzing the sentiment of these tweets can be very useful for identifying suicidal tendencies, and real-time social media data allow potential early detection. Sentiment analysis is a part of natural language processing that performs text analysis and identifies the emotion in a text. Machine learning is an application of artificial intelligence (AI) that gives programs the ability to automatically learn and improve from experience without being explicitly programmed; it centers on creating computer programs that can access data and use it to train themselves. At present, recent advancements in machine learning have played an important role in automating tasks related to natural language processing (NLP), health care, sentiment analysis and many more [4].
1.1 Goal This paper suggests a suicidal ideation prevention paradigm for the study of microblogs. Present suicide prevention programs are highly labor intensive and inefficient as they depend heavily on basic keyword matches and questionnaires [5]. Doctors suggest “early identification is the strongest treatment” so that, with the aid
of Natural Language Processing (NLP) methodology and text processing tools, an effective suicidal-propensity prediction method is proposed to assess the latent emotions of people. The proposed system will help to identify suicidal users based on their tweets and will help doctors and psychologists in the treatment of their patients. Various machine learning algorithms for binary classification are used here, namely Logistic Regression, Random Forest, Support Vector Machine, and Naive Bayes. Each model predicts the given input as positive (i.e., suicidal), negative, or neutral. In particular, the score of a particular word is generated based on a data dictionary, which is built after performing data preprocessing. In the end, a conclusion is given as to which algorithm is best suited for this sentiment analysis, based on the accuracy of each algorithm in terms of percentage.
2 Related Works Suicide detection and ideation have garnered increasing attention from psychologists since the late twentieth century. The reasons for suicide are complicated and consist of many factors [6]. It is estimated that, due to these diverse causes, over eight lakh individuals die by suicide every year [7]. The majority of people do not strategize or plan their suicide; according to CDC 2015 data, researchers still have to study and recognize suicidal patterns and tendencies, while the suicide rate keeps increasing annually [8]. Suicide is acknowledged as a disease, and according to the report of the WHO (World Health Organization), 17% of global suicide victims belong to India [7]. For these reasons, it is very important to study suicidal tendencies, because early detection is the best cure. Some of the earlier work in the area of text analysis used the Bag of Words (BOW) model, in which the text to be checked is verified against a particular set of words and identified as positive, neutral, or negative based on that set. Machine learning is commonly introduced with capabilities including N-gram features, knowledge-based features, syntactic features, background features, and class-specific features [9]. In N-gram analysis, sentiment analysis is done at various levels, such as word level, sentence level, and document level, and various analysis methods are applied, such as tokenization, word frequency analysis, and polarity/subjectivity analysis, with the help of the Python TextBlob library [10]. Machine learning and Natural Language Processing (NLP) methods are applied mostly by researchers. Another researcher has proposed a two-step approach: the first, a descriptive and statistical approach, analyses statistical patterns for detection; the second, a predictive approach, uses the data present in the current information to produce a model to foresee future causes of suicide [11].
Research has also been conducted to break down the network and communications of a suicidal user, in which analysis is performed on the graph characteristics of several Twitter users who have shared content that human annotators believe provides evidence of suicidal thinking. For the examination, these users are referred to as "suicidal users," and N-gram analysis is used to discover the graph characteristics [12]. Semantic analysis of tweets is also suggested by researchers who use the "Weka" machine learning platform to automatically classify tweets; on the classified results they applied semantic analysis using "WordNet" and obtained the intended results. In this analysis, the Weka tool, one of the data mining tools supporting machine-learning-based algorithms, was used to isolate the relevant details from the data generated by the Twitter platform [13]. O'Dea et al. established a structured suicide identification on Twitter by applying logistic regression and SVM to TF-IDF features [14]. Huang et al. introduced a psychological lexicon dictionary and used an SVM classifier to detect suicidal intentions [5]. Chattopadhyay et al. proposed an approach utilizing Beck's suicidal intent scale and a multilayer feed-forward neural network to identify suicidal tendencies [15]. However, hardly any investigations have mined the risk factors from social media. The authors in [16] tracked suicidal factors on Twitter using keyword-based approaches. Some ongoing attempts by specialists recognize tweets by levels of concern [14, 17], distress [18], or types of suicidal texts [19] using machine learning or rule-based techniques.
3 Research Methodology This section shows how the experiment is done, including data collection and dataset creation, data preprocessing, feature extraction, building the machine learning models, and finally testing those models.
3.1 Defining the Data and Dataset Creation Positive tweets, i.e., tweets containing negative thoughts that may indicate suicidal tendencies, are gathered from a GitHub repository [20]. Negative tweets, i.e., normal tweets that do not contain any negative thoughts, are gathered from a Kaggle repository [21]. By merging these two datasets, one balanced dataset of 10,000 tweets is formed, i.e., 5000 positive tweets and 5000 negative tweets (Fig. 1).
Fig. 1 Proposed System architecture
3.2 Data Preprocessing Data preprocessing is the most essential step for obtaining the intended performance and accurate results: redundancy is removed from the data and noisy data are cleaned. Real-world data are raw data that often have missing values and incomplete fields, which may lead to unwanted and poor results, so data preprocessing is considered an important step.
Preprocessing of the data consists of various steps that yield consistent data and lead to good results; the steps are given below. Because supervised learning algorithms are used in this research, labeling the dataset is the foremost step: positive tweets are labeled as 1 and negative tweets as 0. Further, URLs and punctuation marks are removed from the dataset. After this basic cleaning, tokenization is done, in which the data are chopped into pieces called tokens. After that, stemming and lemmatization are done using the Porter stemmer and the WordNet lemmatizer, in which suffixes are removed from a word to obtain the root/base word, called the "lemma." Unique word identification is done after stemming and lemmatization: 17,863 unique words were found after stemming and 22,399 after lemmatization. Then stop words are removed from the unique words, leaving 17,752 unique words. Finally, words with low frequencies are removed from the dataset, and the final 3293 unique words are identified.
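A hedged sketch of this cleaning pipeline (URL and punctuation removal, tokenization, Porter stemming, WordNet lemmatization, stop-word removal) is shown below; the exact cleaning rules and their order in the paper are not specified, so the regular expressions are illustrative.

```python
# Illustrative tweet-cleaning pipeline with NLTK.
import re
import nltk
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.corpus import stopwords

nltk.download("punkt"); nltk.download("wordnet"); nltk.download("stopwords")

stemmer, lemmatizer = PorterStemmer(), WordNetLemmatizer()
stop_words = set(stopwords.words("english"))

def preprocess(tweet):
    tweet = re.sub(r"http\S+", "", tweet)               # drop URLs
    tweet = re.sub(r"[^a-zA-Z\s]", "", tweet.lower())   # drop punctuation/digits
    tokens = word_tokenize(tweet)
    tokens = [lemmatizer.lemmatize(stemmer.stem(t)) for t in tokens]
    return [t for t in tokens if t not in stop_words]

print(preprocess("Feeling hopeless tonight... https://example.com"))
```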
3.3 Model Training Training a model means learning (determining) good values for all the weights and biases from labeled examples. In supervised learning, the machine learning algorithm builds a model by analyzing multiple instances and searching for a solution that minimizes loss; this approach is called empirical risk minimization.
3.4 Model Testing Testing model performance involves running the model on a test data set (new data) and comparing the results in terms of criteria such as accuracy and recall against the predetermined accuracy of the model already developed during training. It is important that no observations from the training set are contained in the testing set: if the test set includes examples from the training set, it is difficult to determine whether the algorithm has learned to generalize from the training set or has simply memorized it.
3.5 Feature Extraction Feature extraction is a technique that transforms arbitrary data, such as text and images, into numerical features. For feature extraction two methods were followed, namely the Count Vectorizer (CV) and Term Frequency Inverse Document Frequency
(TFIDF). The CV is simple compared with TFIDF: it converts the collection of text documents into a matrix of token counts. TFIDF is a comparatively more advanced method that down-weights words with low importance like "a", "an" and "the" in a text corpus, and yields better results than CV. In TFIDF, TF stands for term frequency and IDF for inverse document frequency: TF is a simple count of word instances, and IDF measures how common or rare a word is across the entire document set. Using CV and TFIDF, the extraction of features is done as
tfidf(t, d, D) = tf(t, d) \cdot idf(t, D)    (1)
where tf(t, d) = \log(1 + freq(t, d)) and idf(t, D) = \log(N / count(d \in D : t \in d)), with t a word (term), d a document, D the document set, and N the number of documents.
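The two feature-extraction options described above can be sketched as follows, assuming `tweets` is a list of preprocessed tweet strings; the example strings are placeholders.

```python
# Illustrative count and TF-IDF feature extraction for tweets.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

tweets = ["feel hopeless tonight", "great day with friends"]
X_counts = CountVectorizer().fit_transform(tweets)   # token-count matrix (CV)
X_tfidf = TfidfVectorizer().fit_transform(tweets)    # TF-IDF weighted matrix
print(X_counts.shape, X_tfidf.shape)
```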
4 Data Analysis and Results As already mentioned, after the cleaning and preprocessing of the data, four different supervised machine learning algorithms are applied: Logistic Regression and the Random Forest classifier are used with the CountVectorizer (CV), and SVM and Naive Bayes are used with the TFIDFVectorizer. A dataset of over 10,000 tweets is divided into two parts, a training set and a testing set: 80% of the tweets are used for training the models and 20% for testing. This follows the Pareto principle, which says that, in most instances, 80% of the results arise from 20% of the sources. Comparison is done on the basis of the accuracy of each model, i.e., which model is best suited for the prediction of suicidal tendencies. The different algorithms used are: (1) Logistic Regression (LR): Logistic regression is a supervised learning classification algorithm used to predict the probability of a target variable. The nature of the target (dependent) variable is dichotomous, which means there are just two possible classes; in simple words, the dependent variable is binary, with data coded as either 1 (representing success/yes) or 0 (representing failure/no). (2) Random Forest (RF): Random forest is a supervised learning algorithm that is used for both classification and regression; however, it is mainly used for classification problems. As we all know, a forest is formed of
trees and more trees mean more robust forests. Similarly, a random forest algorithm creates decision trees on data samples so gets the prediction from each of them and eventually selects the most effective solution through voting. It is an ensemble method that is better than one decision tree because it reduces the over-fitting by averaging the result. (3) Support Vector Machines (SVM): The SVM model is a mirrored image of varied groups in an exceedingly multidimensional space hyperplane. The hyperplane is iteratively created by SVM so the error may be minimized. SVM aims to separate the datasets into groups to search out the maximal marginal hyperplane (MMH). 4) Naïve Bayes: The Naïve Bayes algorithm is a classification methodology focused on the interpretation of Bayes’ theorem, with the clear presumption that all predictors are independent of each other. Simply stated, the presumption is that the existence of a function in a class is independent of the appearance of some other function in the same class. The following table and charts present the results of each model in terms of precision (Tables 1, 2 and 3): Table 1 Tweets without Risk (Labeled as 0) Algorithms
LR
RF
SVM
NB
Precision
0.95
0.97
0.98
0.98
Recall
0.97
0.94
0.95
0.77
F1-Score
0.96
0.96
0.96
0.86
Table 2 Tweets with Risk (Labeled as 1) Algorithms
LR
RF
SVM
NB
Precision
0.97
0.94
0.95
0.82
Recall
0.95
0.97
0.99
0.98
F1-Score
0.96
0.96
0.97
0.89
Table 3 Accuracy table
Algorithm
Accuracy (%)
Logistic regression
96.23
Random forest
95.93
Support vector machine
96.63
Naive Bayes
88.1
A Suicide Prediction System Based on Twitter Tweets Using Sentiment Analysis …
53
4.1 Common Terminologies Accuracy: Accuracy is the portion of forecasts our model got right. Officially, Accuracy has the accompanying definition: Accuracy is calculated with the following formula: ACCURACY =
(True Positives + True Negatives) (True Positives + True Negatives + False Positives + False Negatives)
Precision: The part of the tumors that were anticipated to be harmful that are dangerous. Precision is determined with the accompanying equation: PREC = True Positives/(True Positives + False Positive)
(3)
Recall: The portion of dangerous tumors that the system distinguished. Recall is determined with the accompanying equation: Recall = True Positive/(True Positive + False Negative)
(4)
F1 score: F1 is a general proportion of a model’s exactness that consolidates precision and recall, in that unusual way that addition and multiplication: F1 = 2 ∗ [(Precision ∗ Recall)/(Precision + Recall)] where True Positive—A true positive is a result where the model effectively predicts the positive class. True Negative—A True negative is a result where the model effectively predicts the negative class. False Positive—A False positive is a result where the model inaccurately predicts the positive class. False Negative—A False-negative is a result where the model erroneously predicts the negative class. Of each classifier, the orange color represents negative tendencies and blue color represents positive tendencies (Fig. 2). After Figuring out all 4 algorithms, Support vector machines (SVM) gives the most accuracy (96.63), because Term Frequency Inverse Document Frequency (TFIDF) Vectorizer is more efficient than CountVectorizer (CV) in this particular scenario [4].
Fig. 2 Performance of different classifiers
4.2 Accuracy of All Algorithms

See Fig. 3.

(1) Logistic Regression

Positive tweets
Measure      60–40      70–30      80–20      90–10
Precision    0.976008   0.974167   0.969398   0.969697
Recall       0.946535   0.949636   0.958984   0.965795
F1-Score     0.961045   0.961745   0.964163   0.967742
Fig. 3 Accuracy of all 4 algorithms bar plot
Negative tweets
Measure      60–40      70–30      80–20      90–10
Precision    0.947085   0.950294   0.957447   0.966337
Recall       0.976263   0.974514   0.968238   0.970179
F1-Score     0.961452   0.962252   0.962812   0.968254

Accuracy
Measure      60–40      70–30      80–20      90–10
Accuracy     0.96125    0.962      0.9635     0.968
(2) Random Forest

Positive tweets
Measure      60–40      70–30      80–20      90–10
Precision    0.942761   0.945652   0.946328   0.942085
Recall       0.970297   0.980119   0.981445   0.981891
F1-Score     0.956331   0.962577   0.963567   0.961576
Negative tweets
Measure      60–40      70–30      80–20      90–10
Precision    0.968766   0.979109   0.979744   0.981328
Recall       0.939899   0.942991   0.941598   0.940358
F1-Score     0.954114   0.962577   0.960293   0.960406

Accuracy
Measure      60–40      70–30      80–20      90–10
Accuracy     0.95525    0.96166    0.962      0.961
(3) Support Vector Machine

Positive tweets
Measure      60–40      70–30      80–20      90–10
Precision    0.942085   0.942085   0.92085    0.942085
Recall       0.981891   0.981891   0.981891   0.981891
F1-Score     0.961576   0.961576   0.961576   0.961576

Negative tweets
Measure      60–40      70–30      80–20      90–10
Precision    0.981328   0.981328   0.981328   0.981328
Recall       0.940358   0.940358   0.940358   0.940358
F1-Score     0.960406   0.960406   0.960406   0.960406

Accuracy
Measure      60–40      70–30      80–20      90–10
Accuracy     0.9632     0.9663     0.9645     0.963
(4) Naïve Bayes Classifier

Positive tweets
Measure      60–40      70–30      80–20      90–10
Precision    0.808068   0.820887   0.823245   0.815748
Recall       0.986567   0.983150   0.980769   0.979206
F1-Score     0.888441   0.894721   0.895129   0.890034
Negative tweets
Measure      60–40      70–30      80–20      90–10
Precision    0.982536   0.977431   0.973719   0.969863
Recall       0.763317   0.772821   0.771875   0.751592
F1-Score     0.859163   0.863166   0.861127   0.846890

Accuracy
Measure      60–40      70–30      80–20      90–10
Accuracy     0.8755     0.881      0.8805     0.872
5 Conclusion and Future Work The proposed approach relies on different machine learning algorithms to use Twitter as a prevention tool for the early detection of suicidal tendencies. In addition, this work can analyze Twitter data semantically with the help of WordNet. As future work, further improvement and refinement of the methods is suggested to increase the efficiency of the proposed approach. Beyond that, incorporating multilingual WordNet for tweets and applying this study in a larger data environment are proposed.
References
1. Befrienders.org, Suicide Statistics | Befrienders (2018). [Online]. Available: https://www.befrienders.org/suicide-statistics
2. M. De Choudhury, M. Gamon, S. Counts, E. Horvitz, Predicting depression via social media, in Seventh International AAAI Conference on Weblogs and Social Media (2013)
3. M.A. Moreno, L.A. Jelenchick, K.G. Egan, E. Cox, H. Young, K.E. Gannon, T. Becker, Feeling bad on Facebook: depression disclosures by college students on a social networking site. Depression Anxiety 28, 447–455 (2011)
4. D. Swain, S. Pani, D. Swain, Diagnosis of coronary artery disease using 1-D convolutional neural network. Int. J. Recent Technol. Eng. (IJRTE), vol. 8 (2019)
5. X. Huang, L. Zhang, D. Chiu, T. Liu, X. Li, T. Zhu, Detecting suicidal ideation in Chinese microblogs with psychological lexicons, in 2014 IEEE 11th International Conference on Ubiquitous Intelligence and Computing and 2014 IEEE 11th International Conference on Autonomic and Trusted Computing and 2014 IEEE 14th International Conference on Scalable Computing and Communications and Its Associated Workshops (2014)
6. R.C. O'Connor, M.K. Nock, The psychology of suicidal behaviour. Lancet Psychiatry 1, 73–85 (2014)
7. M.C. Podlogar, A.R. Gai, M. Schneider, C.R. Hagan, T.E. Joiner, Advancing the prediction and prevention of murder-suicide. J. Aggression Conflict Peace Res. (2018)
8. J.D. Ribeiro, J.C. Franklin, K.R. Fox, K.H. Bentley, E.M. Kleiman, B.P. Chang, M.K. Nock, Self-injurious thoughts and behaviors as risk factors for future suicide ideation, attempts, and death: a meta-analysis of longitudinal studies. Psychol. Med. 46, 225–236 (2016)
9. W. Wang, L. Chen, M. Tan, S. Wang, A.P. Sheth, Discovering fine-grained sentiment in suicide notes. Biomed. Inform. Insights 5, BII–S8963 (2012)
10. S. Madhu, An approach to analyze suicidal tendency in blogs and tweets using sentiment analysis. Int. J. Sci. Res. Comput. Sci. Eng. 6, 34–36 (2018)
11. I. Amin, S. Syed, Prediction of suicide causes in India using machine learning. J. Independent Stud. Res. (JISR) 15 (2017)
12. G.B. Colombo, P. Burnap, A. Hodorog, J. Scourfield, Analysing the connectivity and communication of suicidal users on twitter. Comput. Commun. 73, 291–300 (2016)
13. M. Birjali, A. Beni-Hssane, M. Erritali, Machine learning and semantic sentiment analysis based algorithms for suicide sentiment prediction in social networks. Procedia Comput. Sci. 113, 65–72 (2017)
14. B. O'dea, S. Wan, P.J. Batterham, A.L. Calear, C. Paris, H. Christensen, Detecting suicidality on Twitter. Internet Intervent. 2, 183–188 (2015)
15. S. Chattopadhyay, A mathematical model of suicidal-intent estimation in adults. Am. J. Biomed. Eng. 2, 251–262 (2012)
16. J. Jashinsky, S.H. Burton, C.L. Hanson, J. West, C. Giraud-Carrier, M.D. Barnes, T. Argyle, Tracking suicide risk factors through Twitter in the US. Crisis (2014)
17. A. Abboute, Y. Boudjeriou, G. Entringer, J. Azé, S. Bringay, P. Poncelet, Mining twitter for suicide prevention, in International Conference on Applications of Natural Language to Data Bases/Information Systems (2014)
18. C. Homan, R. Johar, T. Liu, M. Lytle, V. Silenzio, C.O. Alm, Toward macro-insights for suicide prevention: analyzing fine-grained distress at scale, in Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality (2014)
19. P. Burnap, G. Colombo, R. Amery, A. Hodorog, J. Scourfield, Multi-class machine classification of suicide-related communication on Twitter. Online Social Networks Media 2, 32–44 (2017)
20. N. C. The Institution of Engineers, Twitter Suicidal Analysis. [Online]. Available: https://github.com/IE-NITK/TwitterSuicidalAnalysis
21. M.M. Kaz Anova, Sentiment140 dataset with 1.6 million tweets. [Online]. Available: https://www.kaggle.com/kazanova/sentiment140
22. D. Swain, S.K. Pani, D. Swain, A metaphoric investigation on prediction of heart disease using machine learning, in 2018 International Conference on Advanced Computation and Telecommunication (ICACAT) (2018)
Video Categorization Based on Sentiment Analysis of YouTube Comments Debabrata Swain, Monika Verma, Sayali Phadke, Shraddha Mantri, and Anirudha Kulkarni
Abstract With recent developments in digital technologies, the amount of multimedia data is increasing every day. Abusive video constitutes a hazard to public safety, and effective detection algorithms are therefore urgently needed. In order to improve detection accuracy, sentiment analysis-based video classification is proposed here. The sentiment analysis-based video classification system is used to classify video content into two categories, i.e., abusive videos and non-abusive videos. We use the YouTube comments of a video as the source of input, which is analyzed by our sentiment analysis model, and the model determines the category to which that particular video belongs. Techniques such as Bag of Words, lemmatization, logistic regression and NLP are used. The proposed scheme obtains competitive results on abusive content detection. The empirical outcome shows that our method is simple and effective. Keywords Abusive video · Non-abusive video · NLP · Logistic regression · Normalization · Bag of words · Data preprocessing
D. Swain Computer Engineering Department, Pandit Deendayal Petroleum University, Gandhinagar, Gujarat, India e-mail: [email protected] M. Verma (B) · S. Phadke · S. Mantri · A. Kulkarni IT&MCA, Vishwakarma Institute of Technology, Pune, India e-mail: [email protected] S. Phadke e-mail: [email protected] S. Mantri e-mail: [email protected] A. Kulkarni e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 D. Swain et al. (eds.), Machine Learning and Information Processing, Advances in Intelligent Systems and Computing 1311, https://doi.org/10.1007/978-981-33-4859-2_6
1 Introduction Video content makes up a growing proportion of the world's Internet traffic. Video services, represented by short videos and live streams, have become the new trend in the development of the Internet. However, Internet video content is filled with abusive videos, which seriously harm the network ecology and the mindset of the growing kids who watch such videos. Visual information plays an essential role in everyday human communication, and social media tools allow people to create, share or exchange information. According to a recent survey, Internet users now spend an average of 2 h and 22 min per day on social networking and messaging platforms [1]. Furthermore, spotting sudden violence in time creates tremendous challenges for video monitoring. Thus, violent and abusive video detection is of vital importance. Abusive video detection generally refers to the detection of violent and abusive scenes in a video, and we achieve this by using sentiment analysis. We therefore propose a system that separates the abusive and non-abusive categories by performing sentiment analysis on the comments or reviews given on a video. Sentiment analysis is the study of textual data to predict the emotion behind that data, i.e., whether the user is showing a positive, negative or neutral emotion. Using this concept, we have developed a system that classifies the data into three categories: positive, negative and neutral. As data, we take YouTube comments; the sentiment analysis model classifies the comments and helps us categorize the video into the given categories. With the results obtained from the sentiment shown by users in the comment sections, we can differentiate abusive videos (negative) from non-abusive ones (positive and neutral). Our system can also use the model to restrict videos on the basis of the user's age: if the user is above 18, they have full access to all kinds of videos (abusive and non-abusive), but if the user is below 18, the system only allows access to the non-abusive category. Here, we have created a dataset especially for abusive activity detection, to distinguish violent/abusive incidents from normal occurrences [2]. The major concern in violence detection arises from the fact that it is very difficult to define violence in objective terms, so different approaches have been adopted for this problem. Our approach focuses on text classification. Text classification is an essential component in many applications, such as sentiment analysis, language detection and, in our domain of interest, abusive text detection. Using Natural Language Processing (NLP), the text classification technique processes the data from the dataset and categorizes the text into organized groups. One of the foundational steps in text classification is feature representation, which is done here using the bag-of-words method. In the bag-of-words approach, we create a dictionary of known words and the number of occurrences of those words, which is fed to the sentiment analysis model. The overall purpose of text classification is to train a model that can be used for classifying video-related data into the three categories. Once the features are prepared, the model is trained using logistic regression to obtain maximum accuracy. In order to show the maximum contribution of this approach,
we have performed a series of experiments with the other machine learning models as well (i.e., Naive Bayes’ algorithm, Random Forest, support vector machine). The paper is organized as follows. Section 2 presents the related work. Section 3 presents the proposed method. Section 4 contains experimental reports of the performance of framework. Finally, Sect. 5 concludes the paper and discusses possible future work.
2 Related Work In this section, related works for violent video detection is analyzed based on traditional methods which are multimodal classification strategy. The two important methods are “based on audio features” and “multimodal audio-visual features.” Methods based on audio features: Pfeiffer et al. [3] proposed an algorithm for automatic audio content analysis for violence detection using audio tracking. Cheng et al. [4] proposed a hierarchical approach based on Gaussian mixture models to detect. audio of semantic context and [5] Hidden Markov Model (HMM) for speech recognition event. Giannakopoulos et al. [6] used six segment-level audio features for violence context classification with support vector machine classifier. Clarin et al. [7] designed an automated system, which consists four modules, and this method mainly used Kohonen’s Self-Organizing Map to recognize skin and blood colors and motion intensity analysis to detect violent actions involving blood. Methods based on multimodal audio–visual features: Nam et al. [8] introduced a characterization scheme to recognize violent content of movies using audio–visual features, and this is one of the first proposals for violence recognition in video. Gong et al. [9] explained a three-stage method to detect violent scenes in movies, amalgamating low-level visual cues and auditory features and high-level audio effects related to violence. Lin and Wang [10] presented violent shot detection scheme, using a weakly supervised audio classifier and cotraining method with a motion, explosion and blood video classifier. Giannakopoulos et al. [11] proposed a violence detecting methodology in movies that combined audio features as well as video features using a k-Nearest Neighbor classifier. Most of the above-mentioned research pays attention on detecting violent content based on audio features and bloody color features. These features are very useful in the detection of violent content in movies. Although, in the real world of video monitoring, audio and bloody scenes are rarely recorded. Therefore, later most of the researches put attention on visual features. Recently, Xu et al. [12] introduce motion SIFT (moSIFT) feature and sparse coding method over bag of words to exceed the state-of-the-art techniques for violence detection in both crowded and noncrowded scenes. Datta et al. [13] used motion trajectory and orientation statistics of a person’s limbs to detect human violence in Video, such as first smacking, fighting, kicking, hitting with objects, etc. Zhang et al. [14] proposed robust violence detection (RVD) method in surveillance scenes and got worthwhile results on certain benchmark
datasets in detection accuracy and processing speed, including crowded scenes as well. Zhang et al. [15] proposed a robust motion image descriptor for violence detection which is Motion Weber Local Descriptor (MoWLD), combining the sparse coding method and the experimental results demonstrated that their commended method is effective. Later, with the revolution of deep learning, related research in video analysis domain grew, especially in human activity/motion recognition. Despite that violence video detection related work hardly gets published. But due to the importance of realistic security and the large development in deep learning methods have made on visual/video recognition researchers and developers take on deep learning methods to detect violence in video. So, some research in neural network regarding violence detection is as follows: Song et al. [16] proposed 3D ConvNet and keyframe extraction algorithm for detecting novel violent video to reduce redundancy and decrease the destruction of motion integrity. Mumtaz et al. [2] used deep CNN model for violence detection in deep network using transfer learning on surveillance videos. Mondal et al. [17] worked on descriptor in deep neural network to classify video. Chen et al. [18] purposed real-time detection on abusive user posts using labeled dataset and to detect automatic abusive video using supervised machine learning techniques. Although the above-mentioned traditional methods and deep learning methods showed excellent results on violent video detection, but they are not that discriminatory. For this reason, these papers represent logistic regression-based violence detection model. The significant contribution of this paper is adopting a new experimental and effective method, preventing secular information to a degree.
3 Proposed Method The proposed method is based on the logistic regression algorithm, which comes under machine learning. In order to detect the video type, the data first have to go through data preprocessing and feature extraction, and then the algorithm is applied. The overall flowchart of the proposed methodology is given in Fig. 1, and the steps are detailed in the following sections. Data Set Description The dataset consists of around 17 thousand entries, which are collected from various GitHub repositories. Initially, the data file, which was in text format, is converted into CSV format using Python's Pandas library. The column for all the comments is labeled "Comments." Another column for recording sentiments is then created and labeled "Sentiments." To record the sentiment for each entry in the "Comments" column, the TextBlob library is used and the polarity, which is the sentiment score of each comment, is calculated. The polarity is returned as a floating-point number and is converted into an integer label. If the polarity of the comment is greater than 0, the sentiment score is labeled as 1 in the corresponding "Sentiments" column, indicating a positive comment.
Fig. 1 The model and stage
If the polarity is 0, the sentiment is labeled as 0, indicating a neutral comment, and if it is less than 0, −1 is assigned, indicating a negative comment. Figure 2 shows the distribution of positive, negative and neutral comments in the dataset. Data Preprocessing and Feature Extraction Using the Bag of Words Model Data preprocessing is a data mining technique that transforms raw, real-world data into an understandable format. Raw data are usually incomplete and cannot be fed directly into a model without causing errors; hence, the data are processed before being passed to the model. The first phase of data preprocessing is to collect the data and make sure they are of high quality, as this directly affects the quality of the mined patterns [19]. Here, a dataset of YouTube comments is collected. The collected data and the required libraries are then loaded so that the data cleaning process can begin. Data cleaning is done in several steps. Initially, the comments present in the dataset are analyzed and all non-character
Fig. 2 Data set values
data are removed, since non-character entries would give invalid input to the model. Next, stop words are removed. Stop words are words that carry little or no meaning; commonly used words such as "the," "a," "an" and "in" are examples of stop words that a search engine is trained to ignore. After stop-word removal, the words in each sentence still need to be tokenized. Tokenization is the process of fragmenting a sentence into pieces such as words, keywords and other elements called tokens. For grammatical reasons, even after tokenization, documents can contain different forms of a word, related words, or words with similar meanings. To avoid such recurrence of words, text lemmatization and stemming are performed. The main purpose of stemming and lemmatization is to reduce inflectional forms and derivationally related forms of a word to a common base form. Although lemmatization and stemming are different, both are special cases of normalization. The output of normalization is a list of unique words; the fewer the unique words, the easier it is for a model to reach high accuracy. To increase accuracy, low-frequency words are also removed from the normalized list. At this point, the preprocessing and tokenization steps are complete and the feature extraction process begins. Feature extraction is the process of converting text into vectors of numbers: ML algorithms do not understand raw text, so the set of words is converted into numeric form. There are several ways to do this, and here the bag-of-words method is used. Bag of words is the most popular and simplest feature extraction technique, in which information about the order or structure of words is discarded. A TF-IDF weighting can additionally be applied to the count matrix to evaluate the importance of each word; TF-IDF is a statistical measure that scores a word by how often it appears in a document relative to how often it appears across documents. Model Building By this time, the feature list has been created, and the X (features) and Y (labels) inputs for the model are prepared. When the X and Y inputs are ready, the data are split into two sections, i.e., training data and testing data, to better assess model performance; in this method, the data are split in an 80–20 proportion. The next step is to use the logistic regression model, because the abusive content detection problem is framed as text classification. In machine learning, the logistic regression classification algorithm is used to predict the probability of a categorical dependent variable. Here, the dependent variable is a binary variable with binomial outcomes (y = 0 or 1). Logistic regression is an important ML algorithm because it provides probabilities and can classify new data using both continuous and discrete features. The sigmoid function gives an "S"-shaped curve bounded by the two extreme values (0 and 1); the curve indicates the likelihood of an outcome, such as whether a video has violent content or not (Fig. 3). As mentioned, the training set of X and Y is passed to the model for training. Next, the test set of features is passed into the model to obtain the predicted sentiment (y_pred). The logistic regression model for abusive video detection based on text classification achieves an accuracy of 89%. A sketch of this labeling and classification pipeline is given below.
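The following is a minimal sketch of the pipeline just described. The "Comments"/"Sentiments" column names and polarity thresholds follow the text; the file name and every other setting are assumptions, so this should be read as an illustration rather than the authors' implementation.

import pandas as pd
from textblob import TextBlob
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("youtube_comments.csv")                   # hypothetical file name

def polarity_label(text):
    p = TextBlob(str(text)).sentiment.polarity             # float in [-1, 1]
    return 1 if p > 0 else (0 if p == 0 else -1)            # 1 = positive, 0 = neutral, -1 = negative

df["Sentiments"] = df["Comments"].apply(polarity_label)

bow = CountVectorizer(stop_words="english", min_df=2)      # bag of words, drop very rare words
X = bow.fit_transform(df["Comments"].astype(str))
X_train, X_test, y_train, y_test = train_test_split(
    X, df["Sentiments"], test_size=0.2, random_state=0)    # 80-20 split

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = clf.predict(X_test)                               # predicted sentiment labels
print("Test accuracy:", clf.score(X_test, y_test))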
Fig. 3 Logistic Regression formula
4 Experiments A number of machine learning classifiers were evaluated to arrive at the final model. The Naïve Bayes, Random Forest and Support Vector Machine classifiers were examined to achieve the optimum accuracy; a sketch of this comparison is given after the list below. • For the Naïve Bayes classifier, the dataset was split into train and test sets in a proportion of 80:20, achieving an accuracy of 82.71%. When the train_test_split was changed to a proportion of 90–10, the accuracy increased to 83.35%. • For the Support Vector Classifier, with an 80–20 split, an accuracy of 85.54% was obtained (using the "linear" kernel). • The Random Forest classifier achieved an accuracy of only 63% at the 80–20 split. Finally, the Logistic Regression classifier was adopted with an accuracy of 89.15% at a 95–5 split (95% training and 5% testing). Initially, it gave an accuracy of 85.63% with an 80–20 split; the accuracy kept increasing as the training set grew and finally reached the mentioned value (Fig. 4).
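The comparison reported above can be sketched as a simple loop over classifiers and split ratios. This continues the previous sketch (it reuses the feature matrix X and the df["Sentiments"] labels defined there); the classifier settings are assumptions.

from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

classifiers = {
    "Naive Bayes": MultinomialNB(),
    "SVM (linear kernel)": SVC(kernel="linear"),
    "Random Forest": RandomForestClassifier(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}
for test_size in (0.20, 0.10, 0.05):                        # 80-20, 90-10 and 95-5 splits
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, df["Sentiments"], test_size=test_size, random_state=0)
    for name, clf in classifiers.items():
        acc = clf.fit(X_tr, y_tr).score(X_te, y_te)         # train, then report test accuracy
        split = f"{round((1 - test_size) * 100)}-{round(test_size * 100)}"
        print(f"{split} split, {name}: {acc:.4f}")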
Fig. 4 Comparison curve: accuracy of Logistic Regression, Random Forest, Support Vector Machine and Naïve Bayes
5 Conclusion In this paper, the proposed system is an effective abusive video detection approach based on logistic regression and the concept of genre-based video classification. Performance comparisons with the other available schemes further demonstrate the effectiveness of the proposed approach. We analyzed a sample of nearly 17 thousand comments. Large-scale analysis of YouTube video metadata using NLP showed the importance of user sentiments. The experimental results demonstrate the efficiency of the proposed approach, with a maximum accuracy of 89% for identifying worthwhile videos. The proposed method is competitive with state-of-the-art techniques for violence detection. Whether this comment-based feature can remain effective in other video analysis tasks is worthy of further research.
References 1. S. Salim, How much time do you spend on social media? Research says 142 minutes per day (2019). Retrieved from https://www.digitalinformationworld.com/2019/01/how-much-timedo-people-spend-social-media-infographic.html 2. A. Mumtaz, A.B. Sargano, Z. Habib, Violence detection in surveillance videos with deep network using transfer learning, in 2018 2nd European Conference on Electrical Engineering and Computer Science (EECS) (2018) 3. S. Pfeiffer, S. Fischer, W. Effelsberg, Automatic audio content analysis, in Proceedings of 4th ACM International Conference on Multimedia (1996), pp. 21–30 4. W.-H. Cheng, W.-T. Chu, J.-L. Wu, Semantic context detection based on hierarchical audio models, in Proceedings of 5th ACM SIGMM International Workshop on Multimedia Information Retrieval (2003), pp. 109–115 5. L. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 77(2):257–286 (1989) 6. T. Giannakopoulos, D. Kosmopoulos, A. Aristidou, S. Theodoridis, Violence content classification using audio features, in Proceedings of HellenicConf. Artif. Intell., Berlin, Germany, 2006, pp. 502_507. 7. C. Clarin, J. Dionisio, M. Echavez, P. Naval, DOVE: detection of movie violence using motion intensity analysis on skin and blood, in Proceedings of PCSC, vol. 6 (2005), pp. 150–156 8. J. Nam, M. Alghoniemy, and A. H. Tew_k, “Audio-visual content-based violent scene characterization,” in Proc. Int. Conf. Image Process., vol. 1,Oct. 1998, pp. 353_357. 9. Y. Gong, W. Wang, S. Jiang, Q. Huang, W. Gao, Detecting violent scenes in movies by auditory and visual cues, in Proceedings of Pacific-Rim Conference on Multimedia, Berlin, Germany (2008), pp. 317–326 10. J. Lin, W. Wang, Weakly-supervised violence detection in movies with audio and video based co-training, in Proceeding of Pacific-Rim Conference on Multimedia, Berlin, Germany (2009), pp. 930–935 11. T. Giannakopoulos, A. Makris, D. Kosmopoulos, S. Perantonis, S. Theodoridis, Audio-visual fusion for detecting violent scenes in videos, in Proceedings of Hellenic Conference on Artificial Intelligence (2010), pp. 91–100 12. L. Xu, C. Gong, J. Yang, Q. Wu, L. Yao, Violent video detection based on MoSIFTfeatureandsparsecoding, in 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2014)
13. A. Datta, M. Shah, N. Da Vitoria Lobo, Person-on-person violence detection in video data, in Proceedings of 16th IEEE International Conference on Pattern Recognition, vol. 1 (2002), pp. 433–438 14. T. Zhang, Z. Yang, W. Jia, B. Yang, J. Yang, X. He, A new method for violence detection in surveillance scenes. Multimedia Tools Appl. 75(12), 7327–7349 (2016) 15. T. Zhang, W. Jia, B. Yang, J. Yang, X. He, Z. Zheng, MoWLD: a robust motion image descriptor for violence detection. Multimedia Tools Appl. 76(1), 1419–1438 (2017) 16. W. Song, D. Zhang, X. Zhao, J. Yu, R. Zheng, A. Wang, A novel violent video detection scheme based on modified 3D convolutional neural networks. IEEE Access 7, 39172–39179 (2019) 17. S.Mondal, S.Pal, S.K. Saha, B. Chanda, Violent/Non-violent video classification based on deep neural network, in 2017 Ninth International Conference on Advances in Pattern Recognition (ICAPR) (2017) 18. H. Chen, S. Mckeever, S.J. Delany, Presenting a labelled dataset for real-time detection of abusive user posts, in Proceedings of the International Conference on Web Intelligence. ACM (2017), pp. 884–890 19. Dwivedi, S.K., Rawat, B, A review paper on data preprocessing: a critical phase in web usage mining process. in 2015 International Conference on Green Computing and Internet of Things (ICGCIoT) (2015). https://doi.org/10.1109/icgciot.2015.7380517
Credit Score Prediction Using Machine Learning Debabrata Swain, Raunak Agrawal, Ayush Chandak, Vedant Lapshetwar, Naman Chandak, and Ashish Vaswani
Abstract A strong financial and economic status of a country is necessary for the well-being of all its citizens. A big part of the financial ecosystem is the banking system. Banks help the economy grow by lending money to corporations and individuals, who use it to invest in some enterprise or business. This cash flow is essential in any healthy economy; consequently, unpaid or non-performing loans put stress on the economy. To deal with this situation, banks and credit card companies estimate a credit score. This score provides an idea of the borrower's ability to make the repayment. To facilitate and improve credit score prediction, we worked on a number of algorithms such as linear regression, logistic regression and the K-Nearest Neighbor (KNN) algorithm. Using the KNN algorithm along with some statistical work on the dataset, we were able to obtain a healthy accuracy of 89%. Keywords Credit score · Machine learning · K-nearest neighbor · Logistic regression · Credit risks
D. Swain (B) Computer Engineering Department, Pandit Deendayal Petroleum University, Gandhinagar, Gujarat, India e-mail: [email protected] R. Agrawal · A. Chandak · V. Lapshetwar · N. Chandak · A. Vaswani Department of Information and Technology, Vishwakarma Institute of Technology, Pune, India e-mail: [email protected] A. Chandak e-mail: [email protected] V. Lapshetwar e-mail: [email protected] N. Chandak e-mail: [email protected] A. Vaswani e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 D. Swain et al. (eds.), Machine Learning and Information Processing, Advances in Intelligent Systems and Computing 1311, https://doi.org/10.1007/978-981-33-4859-2_7
1 Introduction The banking sector is one of the most integral cogs of the economy. This institution holds financial assets for others and invests those financial assets as a leveraged way to create more wealth. Methods to avoid corporate failures have been an important field of study. Extensive studies in the field of finance and management have been done since it is a vital for the risk avoidance of various financial institutions [1]. These studies have had a considerable impact on loans. Predicting the possibility of default can be critical and can lead to significant savings in the long run [2]. This is quite evident since India’s Non-Performing Loans were reportedly valued at 133.520 billion USD in December 2018. Saving this amount of money can be crucial for any economy [3]. Credit score is an expression that estimates a consumer’s credit worthiness. This value is often used by banks as well as credit card companies to evaluate the risks posed by extending loans to a consumer. A person’s credit score is dependent on several factors such as debt history, health and credit history. Around 35% of a person’s credit score is constituted from the payment history [4]. Credit can be roughly classified as good credit and bad credit. Good credit is attributed to making regular payments on all of our accounts till the balance is on the positive side. Consequently, bad credit is being unable to make the required payments or holding on to the loan for a long time along with making minimal payments [5]. Credit prediction models generally use various different statistical, machine learning and artificial intelligence techniques. These techniques include the decision tree, logistic regression, discriminant analysis, k-nearest neighbor (KNN), and back propagation (BP) neural network [6]. Predictions based on ML algorithms are assumed good for classifying data which is never seen before into their different categories. The predictive models work by predicting the most suitable category to which a data point belongs to by “learning” from labeled observations [7]. In this work, K-Nearest Algorithm is implemented on Australian Credit score dataset.
2 Literature Review Hsieh et al. [8] have performed a comparison between the three ensembles methods Boosting, stacking bagging which is based on four fundamental learners which are decision tree, artificial neural network, logistic regression, and SVM. The results of experiment show that these three ensembles methods can enhance individual learner’s performance (accuracy wise). Bagging is better in comparison between the other two methods. Bagging with DT also had a good performance. Nanni et al. [9] have developed credit scoring and bankruptcy prediction system using ensemble method random subspace (RS) along with Net classifier, performed better than other ensemble methods. As far as we know their work was the earliest that compared several ensemble methods of classifiers for bankruptcy prediction
and credit scoring. They performed their work on three different financial data sets, Australian credit, Japanese credit, and German credit dataset. They used different suitable toolkits. Turkson et al. [10] have built a model based on Bagging EM with rep tree for credit risk prediction. The model is divided in three phases, the first phase includes of loading and preprocessing of the data. In the second phase, they have built the model by training the dataset. In this phase, they have chosen the finest model for classifying the data by applying various models. In they applied bagging EM with REP tree model. In the third phase, they deployed the model to use it as a tool on new unclassified data. They have achieved maximum prediction accuracy of 81%. The dataset used was Taiwan Bank Credit Card dataset. The experiments were performed on the week DM tool. The main features of tools are data preprocessing tools and clustering, classification, feature selection capabilities, regression and association rules algorithms. Devi et al. [11] have applied some ensemble method of classifiers BaggJ48, AdaboostDS, and Random forest on German dataset for improving performance. Initially, dataset accuracy was measured before applying feature selection. Data were partitioned into two parts 60:40 and 70:30 and then applied, with feature selection. Then a distinct number of iterations were applied on the ensemble classifiers. Random forest classifier accuracy increased from 72.66 to 75% when it was applied on crossclassification of 70:30, which is higher in value than other partition of 60–40. In the same way, Beggrep and Beggj48 classifiers showed improvement from 70.33 to 71.66%, 70.33 to 72%, respectively. Jay et al. [12] used the feature selection process to remove irrelevant features and applied neural network in credit scoring. At first, they applied a feature selection algorithm on a dataset and then selected features from each of the algorithm which is used to build the model for risk classification using 1-D convolutional neural network. Australian, German, and Japanese credit datasets were used from their result, it can be concluded that number of features selected by WrapperSubsetEval, Random search, and BFS methods were almost similar. It was found that applying Random search on the German data set gave more accuracy of 84.78% than other selection methods.
3 Dataset The dataset collection is one of the most important parts of the research. The dataset is collected from the UCI repository [13]. The Australian Credit Approval dataset is used to build the machine learning model, and it is well suited for classification. There are some missing values. The number of instances in the dataset is 690, each with 14 attributes plus a class attribute. Table 1 shows the details of the dataset.
Table 1 Dataset details

S. No.   Feature           Value type
1        A1                Categorical
2        A2                Continuous
3        A3                Continuous
4        A4                Categorical
5        A5                Categorical
6        A6                Categorical
7        A7                Continuous
8        A8                Categorical
9        A9                Categorical
10       A10               Continuous
11       A11               Categorical
12       A12               Categorical
13       A13               Continuous
14       A14               Continuous
15       Class attribute   1, 2

Class distribution: + : 307 (44.5%) - Class 2; - : 383 (55.5%) - Class 1.
3.1 Dataset Pre-Processing a. Missing Attributes: In the dataset, 5% (37) of the cases had missing values. These null values are replaced by the mode for categorical features and by the mean for continuous features. In Fig. 1, a heatmap of the dataset is plotted, and we can see that no null values remain in the dataset. b. Standardization Fig. 1 Missing value heat map
Standardization is a transformation that centers the data by removing the mean of each feature and then scales it by dividing the (non-constant) features by their standard deviation. After standardizing the data, the mean of each feature is zero and the variance is one. Standardization can drastically improve the performance of models: if a feature has a variance that is orders of magnitude greater than the others, it will dominate the objective function and prevent the estimator from learning from the other features correctly. The standardization (Z) formula is given in Eqs. (1), (2) and (3); a short sketch of these preprocessing steps follows.

Z = (X − μ) / σ    (1)

μ = (1/N) Σ_{i=1..N} x_i    (2)

σ = √[(1/N) Σ_{i=1..N} (x_i − μ)²]    (3)
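The preprocessing just described (mode/mean imputation followed by z-score standardization) can be sketched as follows. The file and column names are assumptions; only the steps themselves come from the text.

import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("australian_credit.csv")                  # hypothetical file name
for col in df.columns:
    if df[col].dtype == "object":                           # categorical feature -> mode
        df[col] = df[col].fillna(df[col].mode()[0])
    else:                                                    # continuous feature -> mean
        df[col] = df[col].fillna(df[col].mean())

y = df["class"]                                              # assumed name of the class attribute
X = StandardScaler().fit_transform(df.drop(columns=["class"]))  # z = (x - mu) / sigma per feature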
4 K-Nearest Neighbor (KNN) Algorithm KNN is used to solve problems related to both classification and regression [14]. In industry and research, the KNN algorithm is more often used for classification problems. The popularity of KNN can be attributed to its easy interpretation and low time complexity. KNN belongs to the supervised learning algorithms: given a training dataset of measurements (x, y), we need to find the mathematical relation between x and y. The main objective is to obtain a model h: X → Y such that, for an unknown observation x from the test dataset, the function h(x) predicts the output y as accurately as possible. The K-nearest neighbor algorithm first fixes the value of K, and then finds the K nearest neighbors of the unseen data point. After that, it assigns to the unseen data point the category that has the greatest number of data points among the K neighbors. To measure distance, we use the Euclidean metric [15], which is the distance between two points and is calculated by Eq. (4), where x_n and b_n are the coordinates of the respective points.

d(x, b) = √[(x₁ − b₁)² + · · · + (xₙ − bₙ)²]    (4)
Fig. 2 K-nearest Neighbor
The input x is assigned to the class with the highest probability, as given by Eq. (5):

P(y = j | X = x) = (1/K) Σ_{i∈A} I(y⁽ⁱ⁾ = j)    (5)
To select the K that predicts most accurately, we run the KNN algorithm multiple times with different values of K and select the value that increases the accuracy and stability of the model while reducing the error rate. A very low k value makes the model sensitive to noise and decreases its accuracy; as k increases and the error rate reaches its minimum, the prediction percentage of the model improves. However, a very large k value can blur the class boundaries, as illustrated in Fig. 2 [16].
5 Methodology 5.1 Dataset Splitting The loaded dataset is split into two parts, i.e., a training dataset and a test dataset. The model learns from the training dataset, which contains known outputs, in order to generalize to other data later on. The test dataset is used to assess the model's predictions. We use 80% (552) of the data for training the model and 20% (138) for testing it.
Fig. 3 Error versus k values
5.2 Finding the Best K Value While building the model, we need to select the number of neighbors (K), which is a hyperparameter of the model and determines its accuracy level [17]. The best K differs from dataset to dataset, depending on the training and test data. A very small k value increases the effect of noise and leads to over-fitting, i.e., low bias and high variance, and degrades the accuracy; a very large k value leads to under-fitting, i.e., high bias and low variance, and is computationally expensive. An odd value of k avoids ties between the two classes. In Fig. 3, we plot the error versus k value graph, which helps in selecting the k value.
5.3 Training and Testing Dataset After splitting the dataset and finding the correct k value, we first train our model: we pass the chosen value of k and fit the model on the training dataset. After this, we are ready to predict the values of the test dataset. The flowchart in Fig. 4 visually explains the process used to train the algorithm from start to finish, and a code sketch of this flow is given below.
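A compact sketch of this split/selection/training flow is shown below. It assumes the standardized features X and labels y from the preprocessing sketch in Sect. 3.1; the range of candidate k values and the random seed are arbitrary choices, not the authors' settings.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)                    # 80/20 split as in Sect. 5.1

errors = {}
for k in range(1, 60):                                       # scan candidate k values
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    errors[k] = np.mean(knn.predict(X_test) != y_test)       # test error for this k

best_k = min(errors, key=errors.get)                         # k with the lowest error (cf. Fig. 3)
model = KNeighborsClassifier(n_neighbors=best_k).fit(X_train, y_train)
print("best k:", best_k, "test accuracy:", model.score(X_test, y_test))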
5.4 ROC–AUC Curve The Receiver Operating Characteristic (ROC) curve and the Area Under the Curve (AUC) are important measures for analyzing model performance in classification problems [18]. The graph shows how well the model differentiates between classes; the higher the AUC, the more accurate the model. The ROC curve is plotted with the false positive rate on the X-axis and the true positive rate on the Y-axis (Fig. 6). The objective is to find a model for which the area
Fig. 4 KNN methodology flowchart
under the curve is maximum. More area under the curve means the model is better at distinguishing between class 0 and class 1 [19]. The AUC value is 92% after standardization (Fig. 6). The true positive rate and false positive rate are defined as:

TPR = TP / (TP + FN)    (6)

FPR = FP / (FP + TN)    (7)
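Continuing the previous sketch, the ROC curve and AUC for the fitted KNN model could be computed as follows; the binarization step is an assumption made so that the code works regardless of how the two class labels are encoded.

from sklearn.metrics import roc_curve, auc

proba = model.predict_proba(X_test)[:, 1]                   # predicted score for classes_[1]
y_bin = (y_test == model.classes_[1]).astype(int)           # 1 for the positive class, 0 otherwise
fpr, tpr, thresholds = roc_curve(y_bin, proba)              # FPR (Eq. 7) and TPR (Eq. 6) per threshold
print("AUC:", auc(fpr, tpr))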
Fig. 5 ROC-AUC graph before Standardization
Fig. 6 ROC-AUC graph after Standardization
Table 2 Result analysis

Testing dataset (%)   Accuracy (%)   Precision (%)   Recall (%)   F1 score (%)   k
50                    86             90.35           86           87.95          27
40                    86.23          92.16           88.95        88.26          41
35                    86.36          92.41           85.89        89.58          19
20                    89.13          96.55           87.50        91.82          47
6 Analysis Now that the model is ready, we analyze it to find the best accuracy by testing it on different splits of the dataset. As we can observe, splitting the dataset into an 80% training set and a 20% testing set gives the best accuracy (Table 2).
7 Result 7.1 Confusion Matrix The confusion matrix measures the performance of a machine learning classification model. The table is divided into four blocks, which contain the different combinations of actual and predicted values: True Positives (TP), False Positives (FP), True Negatives (TN) and False
Fig. 7 Accuracy versus k value
Fig. 8 Heat map of confusion matrix
Negatives (FN). From the confusion matrix, we can derive the accuracy of the model [20]. A heatmap plot of the confusion matrix is shown in Fig. 8 (see also Fig. 7). The confusion matrix layout is: [TP FP] [FN TN]
7.2 Precision Precision is the ratio of true positive values (TP) to the total predicted positive values. The higher the precision, the lower the false-positive rate.

Precision = TP / (TP + FP)    (8)
7.3 Recall Recall is the ratio of true positive values (TP) to all the values in the actual positive class. As recall increases, the false-negative rate decreases.

Recall = TP / (TP + FN)    (9)
7.4 F1 Score The F1 score is the harmonic mean of recall and precision; it is high only when false positives and false negatives are both comparably low.

F1 Score = 2 ∗ (Recall ∗ Precision) / (Recall + Precision)    (10)
7.5 Accuracy Accuracy is the ratio of correctly predicted values to the total number of values and is an important parameter to check model performance.

Accuracy = (TP + TN) / (TP + FP + FN + TN)    (11)
Result: When the dataset is split into a 20% test set and an 80% training set, we obtain an accuracy of 89.13%. In Fig. 7, we can see that as the k value increases the testing accuracy also increases, and at 47 neighbors we get an accuracy of 89.13%.

Confusion Matrix:
[[84  3]
 [12 39]]

              precision    recall  f1-score   support
0                  0.88      0.97      0.92        87
1                  0.93      0.76      0.84        51
accuracy                               0.89       138
macro avg          0.90      0.87      0.88       138
weighted avg       0.89      0.89      0.89       138
8 Future Work The biggest drawback of KNN is the selection of the value of K; to overcome this drawback, we plan to move to a predictive, nonparametric algorithm that does not depend on the value of K. We can use loan data from the banking industry to train this model so that even outliers and noise are categorized well; this would make a very good training dataset. Another direction for the project is to use neural networks to improve the accuracy further and to benchmark their performance against the models currently under investigation.
9 Conclusion We have created a credit score prediction model using machine learning. The prediction procedure begins with the cleaning of the dataset, and the cleaned data are then fed into the algorithm. We achieved an accuracy of 89.13%, which makes it easier to predict loan sanction. This will help banks predict the future of a loan, and the model can be used by banks and credit card companies to decrease the number of defaulted loans and resulting losses.
References 1. H.A. Abdou, J. Pointon, Credit scoring, statistical techniques and evaluation criteria: a review of the literature. Intell. Syst. Account. Finance Manage. 18(2–3) 2. A.F. Atiya, Bankruptcy prediction for credit risk using neural networks: a survey and new results. IEEE Trans. Neural Networks 12(4), 929–935 (2001) 3. India Non-Performing Loans, in CEIC report 2018. 4. https://www.investopedia.com/terms/c/credit-worthiness.asp 5. https://mygreatlakes.org/educate/knowledge-center/credit.html 6. King et al., eds., in ICONIP 2006, Part III, LNCS 4234 (2006), pp. 420–429 7. A. Motwani, G. Bajaj, S. Mohane, Predictive modelling for credit risk detection using ensemble method. Int. J. Comput. Sci. Engineering. 6. 863–867. https://doi.org/10.26438/ijcse/v6i6. 863867 8. N.-C. Hsieh, L.-P. Hung, A Data Driven Ensemble Classifier for Credit Scoring Analysis. Expert Syst. Appl. 37, 534–545 (2010). https://doi.org/10.1016/j.eswa.2009.05.059 9. L. Nanni, A. Lumini, An experimental comparison of ensemble of classifiers for bankruptcy prediction and credit scoring. Expert Syst. Applications 36, 3028–3033 (2009) 10. R.E. Turkson, E.Y. Baagyere, G.E. Wenya, A machine learning approach for predicting bank credit worthiness, in 2016 Third International Conference on Artificial Intelligence and Pattern Recognition (AIPR), Lodz (2016), pp. 1–7 11. C.R.D. Devi, R.M. Chezian, A relative evaluation of the performance of ensemble learning in credit scoring, in 2016 IEEE International Conference on Advances in Computer Applications (ICACA), Coimbatore (2016), pp. 161–165 12. J. Simha, Evaluation of feature selection methods for predictive modeling using neural networks in credits scoring (2020). 13. https://archive.ics.uci.edu/ml/datasets/Statlog+(Australian+Credit+Approval)
14. A. Moldagulova, R.B. Sulaiman, Using KNN algorithm for classification of textual documents, in 2017 8th International Conference on Information Technology (ICIT), Amman (2017), pp. 665–671 15. I. Dokmanic, R. Parhizkar, J. Ranieri, M. Vetterli, Euclidean Distance Matrices: essential theory, algorithms, and applications. IEEE Signal Process. Mag. 32(6), 12–30 (2015) 16. https://res.cloudinary.com/dyd911kmh/image/upload/f_auto,q_auto:best/v1531424125/ KNN_final1_ibdm8a.png 17. X. Yu, X. yu, The research on an adaptive k-nearest neighbors classifier, in 2006 5th IEEE International Conference on Cognitive Informatics, Beijing (2006), pp. 535–540 18. C.L. Castro, A.P. Braga, Optimization of the area under the ROC curve, in 2008 10th Brazilian Symposium on Neural Networks, Salvador (2008), pp. 141–146 19. D. Swain, S. Pani, D. Swain, Diagnosis of coronary artery disease using 1-D convolutional neural network. Int. J. Rec. Technol. Eng. 8(2), 2959–2966 (2019) 20. D. Swain, S. Pani, D. Swain, An efficient system for the prediction of coronary artery disease using dense neural network with hyper parameter tuning. Int. J. Innov. Technol. Exploring Eng. 8(6S) (2019), pp. 689–695
Stock Market Prediction Using Long Short-Term Memory Model Debabrata Swain, Vijeta, Soham Manjare, Sachin Kulawade, and Tanuj Sharma
Abstract Predicting the stock market means having very precise information about the physical, psychological, rational and irrational behavior of a company's stock. Scholars have gradually evolved various methods to predict stock market prices, such as fundamental analysis and technical analysis. Here, an intelligent forecasting system using LSTM is proposed; it involves the important factors affecting the stock market and produces an accurate prediction. In this paper, we focus on the LSTM model to predict the stock price of Infosys using a dataset from January 1996 to August 2019. The goal is to forecast the closing price of Infosys Limited based on the training data provided to the model. The accuracy is calculated on the basis of RMSE, and the model achieved an RMSE of 27.5, which is better than the other models considered. Therefore, the LSTM model works efficiently for real-time requirements. Keywords Long short-term memory (LSTM) · Prediction · Stock prices · Neural network (NN) · Artificial neural network (ANN) · Recurrent neural network (RNN)
1 Introduction Forecasting stock market indices has been the goal of every stockholder since markets came into existence. Accurate forecasting of any company's shares is always a herculean task, because the price can be affected by anything happening in the world: a political move, geopolitics, terrorist acts, a change of weather, and so on [1]. Every second, billions of dollars are traded on the exchange, and behind each dollar is a stockholder hoping to gain in one way or another. In this paper, we deal with LSTM for share
market prediction. The share market is simply a collection of markets where brokers and traders can buy and sell shares of stocks, bonds or other securities. Cowles [2] argued that no skill existed to predict the stock market, yet the successful prediction of a stock's future may result in significant profit for a trader. Accurate prediction therefore becomes highly important, given that every person in the market expects a higher profit, and it is useful to understand why our way of prediction compares favorably with the techniques [3] used in the past. Forecasting approaches fall into two wide categories, which can overlap: fundamental analysis and technical analysis [4]. Fundamental analysis is more of a long-term strategy; the principle goes along with the theory that business is all about profit and nothing else. This method is hardly applicable in the modern competitive atmosphere, as its fundamentals do not suit current market behavior. Technical analysis determines the future prices of a stock based purely on the trends of past prices and data. Today, an almost infinite number of factors affect the stock market, and no single aspect can guarantee a forecast. In deep learning, all these aspects can be considered together as a tool for prediction. This study aims to assess the appropriateness of the LSTM model for predicting the closing price of a stock at the end of the day using the technical analysis method. The closing price is an important feature in stock market prediction.
2 Literature Review Rigorous literature research was conducted to get a well comprehension of the objective of the topic with a comparison of countless research papers on stock market forecasting. They used different Machine Learning Algorithms, an ANN and a RNN [5–8], etc. Charles K. Ayo Et Al. applied ARIMA Model for forecasting Share Price [5]. In this paper, they used an extensive process of building stock prices using the ARIMA Model. They used the Eviews software tool for implementation. They determined that in all the models compared greater results were achieved through the ARIMA model from several experiments performed depending upon factors as relatively small of BIC, Relatively small standard error of the regression and relatively high of R*R, etc. Results acquired revealed that ARIMA models engage well in short-term prediction. Adebiyi et al. applied ANN to forecast the closing price of stock indices [6]. In this paper, they used technical variables, primary analysis variables, and some composite of variables as inputs to test the model. They used a hybridized approach in which they experimented with various models of NN with the configurations like 18-18-1,18-19-1,18-20-1, where 18 is the number of input variables, middle varying values are a number of hidden layers and 1 is expected number of outputs. Also, they carried an experiment using only technical analysis variables. The hybridized approach, which integrates variables that are technical and fundamental, with a configuration of the 18-24-1 backpropagation network gave the best
result and when technical variable analysis was used with the configuration of 10-17-1 best result were acquired. Mankar et al. [7] applied Machine Learning Algorithm to predict stock price using social sentiments. They collected tweets corpus from a python library called Tweepy. Then they performed training on two models one with Naïve Bayes and the other one being the Support vector machine and the study clearly showed us that the support vector machine was proven to be the best in comparison. Kumar and Murugan applied time-series data with ANN [8]. They designed the feed-forward backpropagation for a NN which can be used for prediction of a stock index for the next day and convergence of BPN is based on components such as learning algorithm, initial weights required for the model, learning rate of the model, nature of training set and the size, etc. MAE, MAPE, PMAD, MSE, and RMSE are the methods used to measure the forecasting accuracy of the model for every experiment.
3 Neural Network Architecture The term neural network refers to the arrangement of neurons, the connection patterns between layers, the activation functions, and the learning method [9]. An RNN is an extension of a feedforward neural network that has internal memory, and LSTM is a modified version of the RNN. RNNs contain loops, allowing information from the training data to be passed from one step of the network to the next. The architecture of the neural network defines how the network transforms the input into the corresponding output. A simple neural network formation is shown in Fig. 1.
Fig. 1 Simple neural network formation
3.1 LSTM Model LSTM [10] is a special kind of RNN that is capable of learning long-term dependencies. Plain RNNs can in principle handle such long-term dependencies, but in doing so they suffer from problems like vanishing and exploding gradients. LSTM overcomes these problems because, unlike a plain RNN, its repeating module has a different structure: instead of a single layer, there are several layers interacting in a special way. These are: 1. Forget Gate Layer: In LSTM [11], the initial step is to decide what data should be discarded from the cell state; this decision is made by a sigmoid layer called the forget gate layer. Its output value is between 0 and 1, where 1 means keeping the values and 0 means discarding them. 2. Input Gate Layer: The sigmoid layer that decides which new information can be stored in the cell state is known as the input gate layer. 3. Tanh Layer: The tanh layer builds a vector of new candidate values, squashing them into the range between −1 and 1 (Fig. 2).
(1)
i = σ (Mi ∗ a + Ni ∗ h − 1 + vi )
(2)
o = σ (Mo ∗ a + No ∗ h − 1 + vo )
(3)
c = f ∗ c − 1 + i ∗ tanh(Mc ∗ a + Nc ∗ h − 1 + vc )
(4)
h = o ∗ tanh(c)
(5)
Fig. 2 LSTM unit with four repeating modules
Here M_f, M_i, M_o, M_c and N_f, N_i, N_o, N_c are the weight matrices applied to the network input a and to the previous hidden state h_{t−1}, respectively; v_f, v_i, v_o and v_c are bias vectors; and i, f, o, c, h are the input gate, forget gate, output gate, cell state and hidden state, respectively.
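For illustration, a minimal NumPy sketch of one LSTM step implementing Eqs. (1)–(5) is given below; the dictionary layout of the weights and the chosen sizes are assumptions made for the example, not details of the model described here.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(a, h_prev, c_prev, W):
    # W holds the weight matrices M*, N* and bias vectors v* of Eqs. (1)-(5).
    f = sigmoid(W["Mf"] @ a + W["Nf"] @ h_prev + W["vf"])                    # forget gate, Eq. (1)
    i = sigmoid(W["Mi"] @ a + W["Ni"] @ h_prev + W["vi"])                    # input gate,  Eq. (2)
    o = sigmoid(W["Mo"] @ a + W["No"] @ h_prev + W["vo"])                    # output gate, Eq. (3)
    c = f * c_prev + i * np.tanh(W["Mc"] @ a + W["Nc"] @ h_prev + W["vc"])   # cell state,  Eq. (4)
    h = o * np.tanh(c)                                                       # hidden state, Eq. (5)
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid = 5, 8   # illustrative sizes: 5 input features, 8 hidden units
W = {k: rng.standard_normal((n_hid, n_in)) for k in ("Mf", "Mi", "Mo", "Mc")}
W.update({k: rng.standard_normal((n_hid, n_hid)) for k in ("Nf", "Ni", "No", "Nc")})
W.update({k: np.zeros(n_hid) for k in ("vf", "vi", "vo", "vc")})
h, c = lstm_step(rng.standard_normal(n_in), np.zeros(n_hid), np.zeros(n_hid), W)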
4 Methodology 4.1 Dataset For this study, we use stock price data for Infosys Limited, an Indian multinational company, obtained from the Yahoo Finance website [13]. The data contain stock prices with the parameters high, low, open, close, and volume from the year 1996 to 2019 (Table 1).
4.2 Parameter Setup Different input data can be used to predict stock prices. We have used data with five parameters or features, namely high, low, open, close, and volume. All five features are taken to predict values from the stock prices; these features are then used to forecast the closing price of the stock. The closing price of the company is predicted based on the training data provided to the model. The open and close features in the training data give the opening and closing prices of the company's stock on a specific date, while the high and low features give the highest and lowest values that the stock gained or lost on that date. Volume is the feature that states the number of shares traded. This is an important feature, as the volume of shares together with the other features of the training data on a specific date helps the model learn more accurately (Table 2).

Table 1 Description of dataset
Time interval: 01-Jan-1996 to 30-Aug-2019
Training dataset: 4500
Test dataset: 1300

Table 2 Input features
Inputs: Volume, High, Low, Open, Close
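As an illustration of how such a dataset might be prepared, the sketch below loads a CSV exported from Yahoo Finance, scales the five features, and applies the split of Table 1; the filename "infosys.csv" and the (0, 1) scaling range are assumptions for the example, not details specified in the paper.

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.read_csv("infosys.csv", parse_dates=["Date"])        # hypothetical CSV exported from Yahoo Finance
features = df[["High", "Low", "Open", "Close", "Volume"]].values

scaler = MinMaxScaler(feature_range=(0, 1))                   # a (-1, 1) scaling is also tested later
scaled = scaler.fit_transform(features)

train, test = scaled[:4500], scaled[4500:4500 + 1300]         # split sizes from Table 1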
Fig. 3 Flow chart
4.3 Root Mean Square Error RMSE is the standard deviation of the residuals. It measures how far the data points are from the regression line; in other words, RMSE determines how well the data fit the regression line. The Root Mean Square Error (RMSE) function is shown below in Eq. (6).

RMSE = √( Σ_{t=1}^{n} (g_t − f_t)² / n )  (6)

Here, g_t = actual value, f_t = predicted value, and n = number of data points.
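A direct NumPy transcription of Eq. (6) is shown below for reference.

import numpy as np

def rmse(actual, predicted):
    # Eq. (6): square root of the mean squared difference between g_t and f_t.
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return np.sqrt(np.mean((actual - predicted) ** 2))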
4.4 Training Detail This model includes a single input layer that accepts a three-dimensional array formed from past data, which is scaled before being fed in; these data are then followed by a fully connected neural network layer known as a Dense layer [14, 15] (Fig. 3).
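One plausible Keras sketch of such a network is given below; the paper only states a four-layer LSTM followed by a Dense output layer, an MSE loss, and roughly 50 epochs, so the window length, layer widths, and batch size used here are assumptions.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

window, n_features = 60, 5          # assumed sliding-window length; 5 input features
model = Sequential([
    LSTM(50, return_sequences=True, input_shape=(window, n_features)),
    LSTM(50, return_sequences=True),
    LSTM(50, return_sequences=True),
    LSTM(50),
    Dense(1, activation="linear"),  # a sigmoid output layer was also tested in the paper
])
model.compile(optimizer="adam", loss="mean_squared_error")
# X_train, y_train are the 3-D windowed arrays built from the scaled training data:
# model.fit(X_train, y_train, epochs=50, batch_size=32)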
5 Result The final outcome of the study was obtained from stock prices ranging from the year 1996 to 2019; this dataset was split into training and testing sets as described above. Two output layers were used to train the model: the first is a linear activation layer, tested with inputs scaled to the ranges (0, 1) and (−1, 1), and the second is a sigmoid activation layer, whose performance was tested with the range (0, 1). The mean absolute error for all outputs remained smooth and trouble-free
Fig. 4 Actual price in the dataset compared with the price predicted by the model for Infosys Limited
Fig. 5 Root mean square error (RMSE)
as the number of epochs was increased. Tanh was also tested, with a range of (−1, 1). The linear layer output was scaled to the range (−1, 1), and with this an average RMSE of 27.5 and a loss of approximately 0.0006 were obtained (Figs. 4 and 5).
6 Conclusion With five input variables used as features to test the model, i.e., the daily opening, closing, highest, lowest, and volume values, the LSTM model yielded fairly satisfactory and accurate predictions for all test runs. The training data consist of 4500 entries used to train the LSTM model. Using a four-layer LSTM architecture trained for over 50 epochs, with the loss starting from 0.0941, we achieved a final loss of 0.0006017 while training the LSTM model. This loss is calculated using the mean squared error function; a lower loss indicates that the model is learning better with every epoch. The root mean square error loss function, which measures the difference between the true and predicted values, was thereby minimized. It was also observed that outliers in the dataset did not affect the model. The linear activation layer that is used in
the model, with the input scaled to the range (−1, 1), yielded better accuracy than the other output layers compared in this scenario. An average RMSE of 27.5 was obtained when the output layer was scaled to the range (−1, 1). The model is capable of fast prediction, but adapting it to changes in the features requires frequent retraining on new data. This LSTM model can further be used to test data with different features and varying values for other companies, as only the data of one company from 1996 to 2019 were tested here.
References 1. G.S. Atsalakis, K.P. Valavanis, Surveying stock market forecasting techniques—part II: soft computing methods. Expert Syst. Appl. 36(3) PART 2, 5932–5941 (2009) 2. A. Cowles, Can stock market forecasters forecast? Econometrica 1, 309 (1933) 3. E.F. Fama, The behavior of stock-market prices. J. Business 38(1), 34–105 (1965). https://www.jstor.org/stable/ 4. Y.-H. Lui, D. Mole, The use of fundamental and technical analyses by foreign exchange dealers: Hong Kong evidence. J. Int. Money Finance 17(3), 535–545 (1998), ISSN 0261-5606. https://www.sciencedirect.com/science/article/pii/ 5. A.A. Ariyo, A.O. Adewumi, C.K. Ayo, Stock price prediction using the ARIMA model, in Proceedings of the 2014 UKSim-AMSS 16th International Conference on Computer Modelling and Simulation, Cambridge, UK, 26–28 Mar 2014 6. A.A. Adebiyi, A.K. Charles, A.O. Marion, O.O. Sunday, Stock price prediction using neural network with hybridized market indicators. J. Emerg. Trends Comput. Inf. Sci. 3(1), 1–9 (2012) 7. C.S. Lifna, T. Mankar, T. Hotchandani, M. Madhwani, A. Chidrawar, Stock market prediction based on social sentiments using machine learning, in 2018 International Conference on Smart City and Emerging Technology (ICSCET). https://doi.org/10.1109/ICSCET.2018 8. D.A. Kumar, S. Murugan, Performance analysis of Indian stock market index using neural network time series model, in 2013 International Conference on Pattern Recognition, Informatics and Mobile Engineering (PRIME), pp. 72–78. IEEE (2013) 9. B.M. Wilamowski, Neural network architectures and learning algorithms. IEEE Ind. Electron. Mag. 3(4), 56–63 (2009) 10. S. Hochreiter, J. Schmidhuber, Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997) 11. A. Graves, J. Schmidhuber, Framewise phoneme classification with bidirectional LSTM networks, in Proceedings of International Joint Conference on Neural Networks, vol. 4 (2005), pp. 2047–2052 12. F.A. Gers, N.N. Schraudolph, J. Schmidhuber, Learning precise timing with LSTM recurrent networks. J. Mach. Learn. Res. 3, 115–143 (2003) 13. Yahoo Finance for Historical Data. https://finance.yahoo.com/ 14. D. Swain, S. Pani, D. Swain, Diagnosis of coronary artery disease using 1D convolutional neural network. Int. J. Recent Technol. Eng. 8(2) (2019). ISSN: 2277-3878 15. D. Swain, S. Pani, D. Swain, An efficient system for the prediction of coronary artery disease using dense neural network with hyper parameter tuning. Int. J. Innov. Technol. Explor. Eng. 8(6S) (2019). ISSN: 2278-3075
Efficient Management of Web Personalization Through Entropy and Similarity Analysis Sujata H. Asabe, Ashish Suryawanshi, Vinit Joshi, Deepesh Abhichandan, and Gourav Jain
Abstract The Internet platform has been gaining increasing popularity, since it is a storehouse of knowledge that is extremely immense and varied. The size of the internet grows every day as more and more content is added to the platform, which makes it difficult to access the information that is required. Therefore, various search engines have been developed that enable searching for relevant webpages according to the query provided. The search results are mostly generalized, and the results are fetched every single time the query is fired, which increases the load on the search engine and is not personalized according to the user. There is an immense need to personalize the web platform, as it benefits the provider as well as the user by achieving a streamlined approach that can be set according to the user's preferences and needs. Therefore, to improve this situation and provide a definitive solution to the web personalization problem, this publication utilizes Entropy Estimation and Cosine Similarity to automatically personalize the content for the passed query, enhancing the personal user experience significantly. Keywords Web personalization · Cosine similarity · Entropy estimation · Shannon information gain
1 Introduction Affinity for data and knowledge has been ingrained in human beings since the dawn of time. Ever since humans evolved from primates they have been collecting and managing data, which is considered as the crux of civilization. Human beings have always been in the pursuit of innovation and knowledge as they are evolved to become better and better on this planet. Humans started as Hunters and gatherers, which did not have a lot of knowledge about the various processes that are responsible for life on earth. The existence of humans has been primarily for survival and the procurement S. H. Asabe (B) · A. Suryawanshi · V. Joshi · D. Abhichandan · G. Jain Department of Computer Engineering, G H Raisoni College of Engineering and Management, An Autonomous Institute & Affiliated To Savitribai Phule Pune University, Pune 412207, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 D. Swain et al. (eds.), Machine Learning and Information Processing, Advances in Intelligent Systems and Computing 1311, https://doi.org/10.1007/978-981-33-4859-2_9
91
92
S. H. Asabe et al.
of sustenance. During this time most of the knowledge that was gained about the various edible foods and the techniques for hunting different animals were taught to the younger generation through word of mouth, paintings, and actual training. This is evident through the discovery of stone age era cave paintings which depict various techniques used and the hunts that were performed by those people at that time. This was usually the means of proliferation knowledge and data that existed at that time as there were no other storage facilities available. The human race then transitioned from the hunter-gatherer lifestyle to an agrarian lifestyle, which led to the creation of complex language and writing methods that utilized tablets and other materials that could be used to store information. But this process was highly time consuming, therefore was used only for highly important or spiritual information. This is how we see a lot of stone tablets with information written on them that have been excavated from the ruins of Mohenjo-Daro and Harappa ruins, which are some of the earliest civilizations known to mankind. This was the technique that was used for the preservation of knowledge and information in those days. This was continued even further for a long time up until the invention of papyrus by the Egyptians which was very similar to modern-day paper and that is how the term “paper” was coined. Papyrus was highly useful as it allowed for a much portable means of transportation as the paper was light and also for storage as the papyrus was very thin and it could be stored in the form of scrolls in large quantities in storehouses. This allowed for storage of large quantities of information with relative ease, which was the precursor to books. Books were made as a collection of papers that were related to a singular topic and bound together. This was a much easier way of storing and sharing information with the upcoming generations. All the books that existed around that time had been handwritten. It was due to this fact that making copies of books was a very difficult task as it had to be rewritten manually. This was a highly tedious process that took a lot of time and created very fewer books. The efficiency of this process was very low and also due to being handwritten a lot of discrepancies used to creep in due to human error. Even then a considerable number of books were created and were distributed so that the knowledge could be dispersed. This was the case for a significant duration until the creation of the Guttenberg printing press. The printing press revolutionized the way of mass-producing copies of books. This printing press could accurately keep creating more and more copies of the same book, which could be distributed to a far larger audience. Thus, these books served as the means of distribution of knowledge and are still highly popular and used significantly all over the world. The thirst for knowledge for humans was not over yet as in late 1970, the US military wanted a solution for easy and fast communication between different parts of the USA. Therefore, a lot of researches started working on a concept that would take shape as the modern-day internet. It was originally designed to provide a means of communication and resource sharing for researchers without actually traveling from one location to another. Due to the large-scale success and the military understanding, the immense scope of this platform eventually released it for public use.
Efficient Management of Web Personalization Through Entropy …
93
The Internet has grown significantly over the years and has become a large repository of information that we know today. Nowadays, a lot of web portals and websites have been created for various specific purposes such as communication information gathering searching, etc. Due to large fragmentation in the internet platform, a user does not get a unified experience when connecting to the platform. This means that the experience of the user is very different from one site to another as it has different layouts font size themes, etc. This reduces the quality of experience for the user as well as becomes highly difficult for a person that is elderly or disabled to access the internet platform easily and coherently. Therefore, it is the need of the hour for the creation of an effective web personalization technique that would personalize the experience of the web for the user according to his or her taste efficiently. This would highly benefit people the internet has been designed by an ethic that confirms to the one-size-fits-all representation. This also improves upon the experience as most users that browse the internet, have a less personal feeling which can be significantly changed by the introduction of web personalization. Web personalization can be viewed as the next big revolution that can drastically change the experience of a user on the internet and for the better. Section 2 of this research article is written for the Literature survey that is being performed on the past research work of many renowned authors. The methodological details of the proposed model are narrated in Sect. 3. The obtained results of the proposed model are evaluated in Sect. 4 and finally Sect. 5 represents the conclusion and future scope.
2 Literature Survey It is a big challenge to retrieve the content required by the user, from various fields, out of the high-ended and limitless internet and World Wide Web, as the information accessible on the web is not appropriate for every web surfer's listings. To raise the usage of web services and make the internet productive, it is very necessary to predict the user's needs. The proposed solution therefore uses a web personalization technique to reconstruct the services provided by a particular web site for a user. Semantic technology also depends on user behavior and ontology to obtain high-end periodic web personalization [1–3]. Web data have increased many fold in the past decade, so providing well-organized web services and studying web data to collect relevant information is crucial [3]. To a certain extent, this problem can be solved by using web personalization. It is very necessary to know the user's browsing behavior and predict their areas of interest, and this is classified in a very good manner in the proposed model [4]. They calculated user interest in a webpage by analyzing timing attributes. To generate navigational patterns in the offline phase and train an SVM classifier, they used the K-means clustering algorithm. Data on the World Wide Web are growing in an exponential manner. Because of the large collection of miscellaneous documents, fetching the most appropriate information
94
S. H. Asabe et al.
from the web as per the user requirement has become difficult. For this difficulty, the proposed paper [5], came with a solution by implementing Frequent Sequential Patterns (FSPs) that are extracted from Web Usage Data (WUD) are very important for analyzing technique which will understand user requirement to improve the quality of the World Wide Web (WWW). This elaborates [2], since the starting point of early 90 s researchers came up with many technologies to support and create a recommender system. A limitless area for research and innovation is to be done on the recommender system. In the recommender systems, statistical sciences, image mining techniques and also text mining techniques all the technologies have been used. In the fields of education, medicine, academics, movies, and entertainment Recommender systems have widely used. Recommender systems are usually divided into three categories, there is a content-based recommendation, a collaborative based recommender system, hybridbased recommendation system. This paper [6] suggests improving the quality and accuracy of web search personalized web Search is a technique of searching. The main aim of personalized web search is to personalize search results that are more relevant and tailored to the user interests, collecting and aggregating user information that can be private for effective personalization. As the world is growing fast in information technology and the Internet made search engines serve as the main information to users in such situations Personalized Web Search (PWS) gains importance. Thus, it improves the quality of search services on the Internet [6]. This proposes there has been tremendous growth of information on the World Wide Web to find and access relevant information is a real challenge. As a popular solution to customize the World Wide Web environment Personalization is required. In the proposed paper generate frequent sequential patterns CloSpan, a state-of-theart algorithm for Sequential Pattern mining is applied [7]. The web Recommender system is a specialized personalization system. The quality can also be improved by the Web Usage Mining process. Thus, the results show a promising significant improvement in the quality of the recommendations. There is a large amount of unprocessed information due to advancements in technology due to this many times web users face the problem of information overload. Web Mining can be described as automatically extracting information from the web using data mining techniques. Three different parts of classified Web mining are Structure Mining (WSM), Web Content Mining (WCM) and Web Usage Mining (WUM). Extraction of useful knowledge from the content of the web pages like text, image, video, audio comes under Web content mining [8]. Web structure mining involves the analysis of out-links and in-links of web pages. Web usage mining analyzes activity logs or search logs. As millions of users are interchanged daily with the websites and visiting lots of websites, leaving back a different types of data. The prediction of the next page likely to be accessed by a user should be predicted by the extracted information on the present page. Thus, this can be done by web usage mining is to process web data for predicting and identifying accessed information [9]. The implementation of web usage mining involves three stages first is preprocessing, second is pattern discovery
Efficient Management of Web Personalization Through Entropy …
95
and the last one is pattern analysis. The proposed paper of the organization is to make decisions and to personalize the web pages. The process of semantic web mining is very much relevant in social media and networking sites which will most of the time result in overloading of the content. To perform information filtration, the personalized system needs to deal with a large information system. This semantic web mining deals with manage begin, sparsity, and loss of data issue. Large synergistic recommended frameworks can be used either to predict how much a client will like a thing. Thus the proposed paper [10, 14] focuses on the technique of Clod start and as well as Lack of information problem. Aims to solve the problem occurred while improving the performance of web personalization algorithms are the concurrent use of structural data and user browsing information. In the proposed paper, they have used the weighing criterion a new algorithm based on the graph structure between web pages to offer pages to users. Link graph structure and the collection of heavy item sets are merged to form a new algorithm and it generates new association rules based on weighted items or the so-called weighted association rules. Thus the backup value is between 5 and 20, 70 and 82% of the proposed algorithm [11]. Various researches have been taking place in the field of web data mining and personalization. Due to Exposure to the internet and extensive usage of the web, an immense amount of data, like videos, images, and web pages, have been generated. For collecting and handling relevant information that helps in boosting the business of an organization by technological or business innovations, the web mining is one of the best techniques [12]. The results are promising by using the Web personalization model using real and synthetic data. IT introduces a new technique Web tracking, it is mostly used to retrieve user information for activities such as personalization. Web tracking allows third-party or first-party websites to know the users’ browsing history and browsing configuration to these ends. Types of web tracking are Behavior A (Analytics), Behavior B (Vanilla), Behavior C (Forced), Behavior D (Referred), Behavior E (Personal). WebTracking Techniques are Stateful tracking, Stateless tracking [13]. This proposed the technique of Web tracking as well as techniques for its detection and analysis, and countermeasures to prevent web tracking. This paper elaborates as there are vast improvements in the field of education from last several years. The paper proposed the design of traditional assessment, using a technique that can both respond to the corporate requirements reflect learners’ competencies. Thus the proposed paper personalized competence assessment technique serious games, which use the semantic web to ensure interoperability and reuse of personalized rating resources. Thus the methodology depicted seeks to eliminate complexity in the generation of the assessment activity on serious games. Thus, the proposed paper focuses on a formal description of Assignment rules.
3 Proposed Methodology The overview diagram of the proposed web personalization system is depicted in Fig. 1, and the steps taken to achieve this are explained below. Step 1: URL and Keyword Storage: This is the primitive stage of the proposed model, where the admin of the system logs into the system through an interactive user interface. Afterward, the admin stores many URLs and their relative keywords, so that the relative URLs can be retrieved based on the fired user query and the personalization. Step 2: User Query Preprocessing: This is the initial step carried out by the end user, who logs into the system after signing up. The user fires a query in the form of a string to get results that are a mixture of fresh search URLs and personalized URLs. The search process starts with preprocessing of the fired query, which decreases the size of the query without losing its original semantics, using the following steps.
Fig. 1 The overview diagram for web personalization
Efficient Management of Web Personalization Through Entropy …
97
Special Symbol Removal—There are special symbols in the English language that provide grammatical structure to the sentences and allow for a smooth flow such as !,?,., etc. These are not necessary for a query and are subsequently removed in this step. Tokenization—The process of tokenization is the creation of smaller and manageable tokens or pieces of text that can be effectively utilized in further processing. The tokenization approach to segment the input string must be achieved through the implementation of the delimiters. The most common delimiter that can tokenize individual words is space. The tokenization also has a highly positive effect on the processing of the system as it can reduce the time taken for the execution. The Query needs to be processed and executed in an effective manner by the proposed system. This is very difficult to achieve in a normal string format. Therefore, the tokenization step converts the string into a well-indexed string. This serves a dual purpose as this well-indexed string can be used for further processing by conversion into an array very easily. Stopword Removal—The stopword removal process is one of the most important aspects of the preprocessing. The preprocessing ensures that the input string that is being provided to the system is clean and free of any redundancies. There are redundant words in the English language that do not provide any intrinsic meaning to the sentence, which serves only to provide a streamlined flow to the conversation. Such words are not needed in the query as it would not be necessary and increase the processing time of the system considerably. These words are subsequently eliminated in this step. The stopwords are words that are used in the English language to provide conjunction and flow to the conversation such as, in, an, the, etc. These stopwords do not provide any meaningful contribution to the sentence hence can be removed without any change in the semantics of the string. Therefore, to make the string lighter and reduce the unnecessary strain on the system, the stopwords are purged from the string effectively. The Stopword removal in the proposed system reduces the execution delay experienced. For example, the phrase “going to read” is processed using this step of the preprocessing. The stopword in this phrase is “to” which will be removed during the execution and convert it into “going read.” This example conveys that the stopwords removal process does not change the meaning of the sentence. Stemming—Stemming reduces the words utilized in the Query to their root forms. This is important as most of the words are unnecessarily long and stemming does not change the semantic meaning of the sentence. Stemming significantly reduces the resources required to process the query effectively. For example, “running” will be stemmed into “run” through the removal of the substring “ing” which is replaced with an empty character. It can be noticed that the semantic difference between “running” and “run” is not significantly large but it can have a significant impact on the time taken for the processing of the query.
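A compact sketch of these four preprocessing steps is given below, using NLTK's stopword list and Porter stemmer; the exact tools are assumptions for illustration, since the paper implements the system in Java.

import re
from nltk.corpus import stopwords        # requires nltk.download("stopwords") once
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
stop_words = set(stopwords.words("english"))

def preprocess_query(query):
    query = re.sub(r"[^A-Za-z0-9\s]", " ", query)        # special-symbol removal
    tokens = query.lower().split()                        # tokenization on the space delimiter
    tokens = [t for t in tokens if t not in stop_words]   # stopword removal
    return [stemmer.stem(t) for t in tokens]              # stemming

print(preprocess_query("going to read the latest news!"))  # stopwords dropped, remaining words stemmed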
Step 3: Capturing of Interest and Entropy Estimation—The query fired by the user is preprocessed and then tokenized into an array, and the keywords are counted for each of the URLs stored in the database. Based on this count, the URLs are sorted in descending order and displayed to the user as the result. When the user clicks on a desired URL, that URL is subjected to estimation of its distribution factor, or entropy, by calculating the information gain value. In this process, the clicked URL's presence is evaluated over all existing users' profiles and counted. The entropy is then estimated using the Shannon information gain of Eq. (1).

IG = −(P/T) log(P/T) − (N/T) log(N/T)  (1)

where
P = count of the number of users for the clicked URL,
T = total number of users,
N = T − P,
IG = information gain for the URL.
The obtained value of the information gain is in the range 0–1. A value of information gain for the clicked URL nearer to 1 represents the importance of that URL. In the proposed model, a threshold value of 0.4 is therefore set to decide the important URLs for the given query, which are then stored in the database named entropy information along with the keyword. This process of entropy information factor estimation is depicted in Algorithm 1 below.

//Input: User Profile List UPL, Unique User List UUL, URL
//Output: Gain List GL
1: Start
2: for i=0 to Size of UUL
3:   USER = UUL[i]
4:   TLST = []            [TLST = Temp List]
5:   count = 0
6:   for j=0 to Size of UPL
7:     TL = UPL[j]        [TL = Temp List]
8:     LUSER = TL[0]
9:     LURL = TL[1]
10:    if (LUSER==USER AND URL==LURL), then
11:      count++
12:    end if
13:  end for
14:  P = count, T = Size of UUL, N = T-P
15:  E = (-P/T) log(P/T) + (-N/T) log(N/T)
16:  TLST[0] = USER, TLST[1] = E
17:  GL = GL + TLST
18: end for
19: return GL
20: Stop
ALGORITHM 1: URL Entropy Estimation
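A hedged Python rendering of Eq. (1) is shown below; the function and variable names are illustrative, and a base-2 logarithm is assumed so that the value stays in the stated 0–1 range.

import math

def url_information_gain(url, user_profiles, users):
    # user_profiles: iterable of (user, url) pairs; users: list of unique users.
    T = len(users)
    clicked = set(user_profiles)
    P = sum(1 for user in users if (user, url) in clicked)   # users whose profile contains the URL
    N = T - P
    ig = 0.0
    for count in (P, N):
        if count > 0:                    # guard against log(0)
            ig -= (count / T) * math.log2(count / T)
    return ig                            # values above the 0.4 threshold mark an important URL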
Step 4: URL Personalization through Cosine Similarity—Once the important URLs are stored in the database based on their entropy, every query fired by the user is checked for its keywords in the entropy database. If the fired query keywords are found in the entropy database, then the respective URL is fetched and correlated with all existing users. In this correlation process, two arrays are created, namely P and Q, both of size equal to the number of users. The P array contains all values as 1, because of the assumption that every user is interested in the URL. On the other hand, Q contains the binary value 0 or 1 on the basis of the URL's presence in each user's profile. These two arrays are then fed to the cosine similarity estimation given in Eq. (2), which yields a value between 0 and 1.

cos(P, Q) = (P · Q)/(‖P‖ ‖Q‖)  (2)

where P is the array containing the assumed values, Q is the array containing the labeled values (0 or 1), and cos(P, Q) is the cosine similarity. A value nearer to 1 indicates more similarity for the searched URL. If the similarity is more than 0.5, the URL is selected as the desired URL for personalization, ranked at the top of the user's display list, and personalized to the user. This selected URL is stored in the database along with the last traced timing for the fired query by the user. To maintain the space complexity of the model, the proposed system drops any personalized URL whose last usage is beyond a threshold time.
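Equation (2) can be computed directly, for example as in the sketch below; the five-user vectors are illustrative values only.

import numpy as np

def cosine_similarity(p, q):
    # Eq. (2): cos(P, Q) = (P . Q) / (||P|| * ||Q||)
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    denom = np.linalg.norm(p) * np.linalg.norm(q)
    return float(p @ q) / denom if denom else 0.0

P = np.ones(5)                    # assumption: every user is interested in the URL
Q = np.array([1, 0, 1, 1, 0])     # observed presence of the URL in each user's profile
print(cosine_similarity(P, Q))    # ~0.77, above the 0.5 threshold, so the URL is personalized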
4 Results and Discussions The presented technique for achieving an effective web personalization mechanism has been developed on the NetBeans IDE by utilizing the Java programming language. For the implementation of the proposed methodology, a development machine is utilized, which is furnished with an Intel i5 processor along with 500 GB of HDD for storage and 4 GB of primary memory as RAM. The database prerequisites are satisfied by the MySQL database server. Substantial Experimentation was executed to ascertain the performance metrics of the proposed system. For the measurement, the accuracy of the proposed technique, the Precision and Recall analytical technique was used, which can accurately determine the performance metrics of the proposed system. The performance metrics were measured to determine that the web personalization mechanism based on the cosine similarity and Entropy estimation described in this paper has been implemented properly.
4.1 Performance Evaluation Based on Precision and Recall Precision and recall provide valuable information regarding the performance of the presented system, as they are accurate and practical parameters for evaluating the overall performance of the methodology. Precision measures the relative accuracy of the presented technique: it is the ratio of the number of relevant URLs personalized to the total number of URLs personalized (relevant plus irrelevant). Recall measures the absolute accuracy of the system, complementing the precision parameter: it is the ratio of the number of relevant URLs personalized to the total number of relevant URLs (personalized plus not personalized). Precision and recall are mathematically explained in the equations given below, using the following quantities:
• A = The number of relevant URLs personalized using Cosine Similarity
• B = The number of irrelevant URLs personalized using Cosine Similarity
• C = The number of relevant URLs not personalized using Cosine Similarity
So, precision and recall can be defined as
Precision = (A/(A + B)) * 100
Recall = (A/(A + C)) * 100
These equations are used for performing extensive experimentation on the proposed methodology through the evaluation of the results of the Cosine Similarity module. The experimental results are listed in Table 1 (Fig. 2).

Table 1 Precision and recall measurement table for the performance of web personalization using Cosine Similarity

No. of queries  Relevant URLs personalized (A)  Irrelevant URLs personalized (B)  Relevant URLs not personalized (C)  Precision     Recall
89              79                              4                                 6                                   95.18072289   92.94117647
59              54                              3                                 2                                   94.73684211   96.42857143
116             107                             4                                 5                                   96.3963964    95.5371429
32              29                              1                                 2                                   96.66666667   93.5483871
131             122                             4                                 5                                   96.82539683   96.06299213
Fig. 2 Comparison of precision and recall for the performance of web personalization using Cosine Similarity
The graph plotted above demonstrates that the cosine similarity in the proposed system acquires exceptional measurements of precision and recall for the implementation of a web personalization mechanism. The proposed model acquired an average precision of 95.96% and recall of 94.90%, and these precision and recall values indicate that the cosine similarity module performs with very high accuracy and efficiency in the presented technique and achieves effective and accurate web personalization.
5 Conclusion and Future Scope Web personalization is one of the upcoming revolutions in the internet industry. This is because the experience on the internet platform can vary significantly from one website to another. The internet platform is highly fragmented and cannot provide the user with a streamlined experience that can be constant and according to the various preferences and guidelines from the user. This fragmented approach leads to a loss in interest of the user and defeats various engagement techniques utilized by the web developers. It is also highly taxing for the various search platforms as the search has to be performed every single time a query is fired that increases the load on the server. The search experience also leaves a lot to be desired due to personalization not being implemented for the particular user. Therefore, the proposed methodology ameliorates these effects by introducing web personalization for the search engines that produce accurate, efficient and personalized results for the user through the use of Entropy estimation and Cosine Similarity on the fired query. The extensive experimentation reveals that the proposed methodology achieves significant improvements over the traditional techniques.
For future research, the presented technique can be deployed in a real-time model to work on social media websites and different URLs. The methodology can also be extended into different domains in the future.
References 1. Y. Raju, D. Suresh Babu, K. Anuradha, Analysis on periodic web personalization for the efficiency of web services, in IEEE Xplore Compliant—Part Number: CFP18BAC-ART (ICICCT, 2018). ISBN: 978-1-5386-1974-2 2. T. Bhattacharya, A. Jaiswal, V. Nagpal, Web Usage Mining and Text Mining in the Environment of Web Personalization for Ontology Development of Recommender Systems (IEEE 2016). ISBN: 978-1-5090-1489-7/16 3. M.R. Shinde, A. Sujata, Data mining: scope future trends and applications. Int. J. Eng. Technol. Sci. Res. (IJETSR) 5(4) (2018) 4. P. Das, G.P. Sajeev, Time-Driven Adaptive Web Personalization System for Dynamic Users (IEEE, 2017). ISBN: 978-1-5090-6621-6/17 5. B.J. Doddegowda, G.T. Raju, S.K.S. Manvi, Extraction of behavioral patterns from preprocessed web usage data for web personalization, in IEEE International Conference on Recent Trends In Electronics Information Communication Technology, 20–21 May 2016 6. A.T. Ramitha, J.S. Jayasudha, Personalization and privacy in profile-based web search, in International Conference on Research Advances in Integrated Navigation Systems (RAINS 2016), 06–07 April 2016 7. C. Ramesh, K.V. Chalapati Rao, A. Govardhan, Ontology-Based Web Usage Mining Model (IEEE, 2017). ISBN: 978-1-5090-5297-4/17 8. M. Bharti Pooja, T.J. Raval, Improving web page access prediction using web usage mining and web content mining, in IEEE Conference Record # 45616; IEEE Xplore. ISBN: 978-17281-0167-5 9. M. Dhandi, R.K. Chakrawarti, A comprehensive study of web usage mining, in Symposium on Colossal Data Analysis and Networking (CDAN) (2016) 10. R. Bhargava, A. Kumar, S. Gupta, Collaborative Methodologies for Pattern Evaluation for Web Personalization Using Semantic Web Mining (2016). IEEE Xplore Part Number: CFP19P17ART ; ISBN:978-1-7281-2119-2 11. Z. Bemani, H. Rashidi, A hybrid graph-structure-based webpage recommendation algorithm based on weighted association rules, in 4th International Conference on Web Research (ICW) (2018) 12. G.P. Sajeev, P.T. Ramya, Effective web personalization system based on time and semantic relatedness, in International Conference on Advances in Computing, Communications, and Informatics (ICACCI), Jaipur, India, 1–24 Sept 2016 13. I. Sanchez-Rola, X. Ugarte-Pedrero, I. Santos, P.G. Bringas, The web is watching you: a comprehensive review of web-tracking techniques and countermeasures. Logic J. IGPL Adv. (2016) 14. L. Cheniti-Belcadhi, G.A. El Khayat, B. Said, knowledge engineering for competence assessment on serious games based on semantic web, in IEEE Second International Conference on Artificial Intelligence and Knowledge Engineering (AIKE) (2019)
Artistic Media Stylization and Identification Using Convolution Neural Networks Premanand Ghadekar, Shaunak Joshi, Yogini Kokate, and Harshada Kude
Abstract This paper explores the usage of different convolutional neural network-based methods for artistic style transfer on images. Style transfer is an interesting domain of Computer Vision as it combines both aspects of style and content recognition in images. In this paper, methods are discussed to transfer mixed or unseen artistic styles to images in real time. Two approaches are implemented, which can achieve multiple and mixed style transfer, building on and inspired by Johnson's fast style transfer algorithm. A custom baseline CNN architecture and a fine-tuned ResNet-18 are trained using transfer learning. The problem of artist identification of fine art paintings is also explored, as this is a challenging problem handled by art historians who have vast experience and training, and a solution would be useful for digitizing large swathes of artwork. Keywords Style transfer · Super-resolution · Deep learning · Residual networks
1 Introduction Style transformation of one image into another is one of the major problems in texture transfer of images, as both content recognition and artistic style are combined. For image style transfer, the core idea is to preserve the target image's semantic content while synthesizing texture from the base image, constraining the texture synthesis process. This problem was particularly hard because texture information was earlier extracted using handcrafted features and feature engineering, which required spending large amounts of time on feature extraction and made image stylization unviable for large-scale use. P. Ghadekar · S. Joshi (B) · Y. Kokate · H. Kude Department of Information Technology, Vishwakarma Institute of Technology, Pune, India e-mail: [email protected] P. Ghadekar e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 D. Swain et al. (eds.), Machine Learning and Information Processing, Advances in Intelligent Systems and Computing 1311, https://doi.org/10.1007/978-981-33-4859-2_10 Although remarkable results are achieved by the above-mentioned methods, the fundamental weaknesses of
needing huge compute power, single style transfer, and usage of only low-level features in the stylization process hinder all of them. In this paper, the style transfer algorithms proposed by Gatys [1] and Johnson [2] for fast single style transfer were explored. However, these approaches are limiting, as a new model instance must be trained for each style. Taking inspiration from the aforementioned research and focusing on the architectures given by Yanai [3] and Dumoulin [4], while exploring various effects and their outcomes such as color preservation, spatial feature transfer, and dilated CNNs, this paper proposes two baseline architectures built on top of Johnson's work. As a further experiment, this paper also explores artist identification of paintings given no extra information. This task, if solved, could be useful for labeling any number of paintings to be digitized and categorized. It is a challenging task, as artists can paint a variety of content with multiple styles, which invariably change over the lifetime of the artist.
1.1 Artistic Media Stylization Style transfer is a method wherein the content of an image and the style of an arbitrary image are combined to create a new image. This is particularly intriguing as it involves the creation of an AI algorithm that combines the base image's style with the target image's content such that a high-quality artistic stylization is produced. This problem proved difficult to solve with conventional digital image processing and computer vision methods, which extract texture information while augmenting the image pixels to be stylized. However, the problem is tackled in a better manner with the use of Convolutional Neural Networks. This paper explores the work done by Gatys, in which CNN-based artistic stylization algorithms are proposed, and its application in the Fast-Neural-Style algorithm, thereby creating a novel cascading media stylization algorithm which, in real time, transfers the artistic style of an arbitrary image to the target image.
2 Literature Survey In 2015, a neural network-based approach was proposed by Gatys, which extracts neural representations to separate the content and style of arbitrary images and recombines them to create stylized images. The proposed method used CNNs, which are among the most generalizable image processing methods. CNNs create feature maps that are representations of the input image and form a hierarchical pattern as the network trains; the input image is processed into feature maps that are more sensitive to the semantic content than to the raw pixel values. The layers of weights in such networks are a set of learnable parameters, i.e., filters that extract higher-level features from the pixel values of the input images.
The method proposed by [1] produced positive results, in that the style and content could be propagated, but it was highly computationally expensive. By using a neural net in a feed-forward fashion, Johnson was able to speed up the process; this turned out to be a massive boon, as the media could now be stylized almost immediately. Furthermore, Johnson's work was improved upon by various methods whose central tenet is a style choice or input given to the network, as proposed by Huang et al. [5], Keiji [6], and Ghiasi et al. [7].
2.1 Artist Identification Traditionally, classifiers like SVMs and k-nearest neighbors, coupled with image features like scale-invariant feature transforms (SIFT), histograms of oriented gradients (HOG), and others, were used to identify artists and styles [8, 9]. Later, CNNs were used to extract features, and these extracted features were then fed to SVMs for classification purposes [10].
3 Problem Statement and Applicability In this paper, multiple models that generate stylized images in a computationally light, yet quick manner are explored. The constraint here is that there should be exactly one instance of a network capable of handling arbitrary style weights to generate a stylized image; this is in stark contrast to most methods, which use multiple individual instances, each trained for one style. This keeps the network computationally light and fast, reducing the compute required. Microsoft COCO 2014 and the WikiArt dataset from WikiArt.org [11] were used. The main idea is that the proposed algorithm will capture the many different aspects of each painter's work, including but not limited to the brush stroke thickness, light/dark palette usage, and the overall color scheme of the artwork. When evaluating, the speed and quality of the inference play a major role, although style transfer is a very subjective aspect and needs a human grader, as art is subjective and cannot be explicitly quantified.
4 Approach and Methodology 4.1 Deep Representation of Image In the method proposed by Gatys, it was shown that, given an image input, deep feature maps can be obtained by passing the image through a neural network
and thus the style and the content of an image can be separated in such a way that they can be compared with other images. In this approach, a VGG-16 network was used, which gives feature maps based on its different activation layers [3]. The content is represented using the activations at various layers. A layer with K filters generates K feature maps, each of size M, and the total response is F^l ∈ R^{N_k × M_l}. When this layer is used to represent the content image, the difference in content between two images is

L^l_content(I1, I2) = Σ_{i,j} (F_{1,ij} − F_{2,ij})²  (1)
In a similar fashion, the style of an image can be represented using the activation values of certain filters. However, the raw activations cannot be compared directly [1]; thus, the Gram matrix representation was proposed, with

G^l = F^l (F^l)^T  (2)

Thus, G^l ∈ R^{N_l × N_l}. Considering any given layer, the style difference is the mean squared error loss

L^l_style(I1, I2) = Σ_{i,j} (G_{1,ij} − G_{2,ij})²  (3)
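A small NumPy sketch of Eqs. (1)–(3) is given below, treating a layer's response as a K × M matrix of feature maps; the shapes chosen in the example are illustrative only.

import numpy as np

def content_loss(F1, F2):
    # Eq. (1): squared difference between the feature-map responses of two images.
    return np.sum((F1 - F2) ** 2)

def gram(F):
    # Eq. (2): Gram matrix G = F F^T of a K x M feature-map response.
    return F @ F.T

def style_loss(F1, F2):
    # Eq. (3): squared difference between the Gram matrices of two images.
    return np.sum((gram(F1) - gram(F2)) ** 2)

K, M = 64, 32 * 32                      # e.g. 64 filters over a 32 x 32 spatial grid
F1, F2 = np.random.rand(K, M), np.random.rand(K, M)
print(content_loss(F1, F2), style_loss(F1, F2))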
4.2 Fast Style Transfer In the original optimization-based approach, the output image had to be updated continually over many iterations through the network, which proved computationally expensive as it had to be repeated for each image. In the approach proposed by Johnson et al. [2], a feed-forward network is instead used, which transforms the content (original) image into the target (stylized) image in a single pass (Fig. 1).
4.3 Conditional Instance Normalization From Dumoulin et al., it is learnt that if a new conditional instance normalization layer is introduced, it is capable of learning a set of parameters unique to each style. The core idea is that performing an affine transformation on the target image features, conditioned on the style features of the base image, is an adequate generic representation to achieve style transfer. This conditioning is performed after every instance normalization layer, which, compared with batch normalization, was
Fig. 1 Fast Style Transfer model
Fig. 2 Style normalization parameters [4]
proved to be a better alternative, as stated in [12]. Mixed style transfer was implemented by giving the g and b parameter weights as inputs at the time of inference; similarly, spatial transfer can be achieved if the g, b weights are applied only to certain regions of the image (Fig. 2).
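A PyTorch sketch of such a conditional instance normalization layer is shown below; the module name and tensor shapes are assumptions, while the use of a style-weight vector (one-hot for a single style, a mixture for mixed-style transfer) follows the description above.

import torch
import torch.nn as nn

class ConditionalInstanceNorm2d(nn.Module):
    # Instance normalization whose affine parameters (g, b) are selected per style.
    def __init__(self, num_channels, num_styles):
        super().__init__()
        self.norm = nn.InstanceNorm2d(num_channels, affine=False)
        self.g = nn.Parameter(torch.ones(num_styles, num_channels))
        self.b = nn.Parameter(torch.zeros(num_styles, num_channels))

    def forward(self, x, style_weights):
        # style_weights: (num_styles,) one-hot for one style, or any mixture of styles.
        g = style_weights @ self.g            # (num_channels,)
        b = style_weights @ self.b
        return self.norm(x) * g.view(1, -1, 1, 1) + b.view(1, -1, 1, 1)

cin = ConditionalInstanceNorm2d(num_channels=32, num_styles=7)
out = cin(torch.randn(1, 32, 64, 64), torch.eye(7)[2])   # apply style number 2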
4.4 Conditional Input Vector Yanai et al. proposed that multiple and mixed style transfer with a single feed-forward network is possible without introducing a CIN layer. The idea, inspired by [13], is that a one-hot vector indicating which style (base) image is being shown, called the conditional input vector, is provided during the model training phase along with the base image. This input vector is first duplicated and then concatenated with the output of the third convolutional residue layer, before being passed through a newly introduced 1 × 1 convolutional residual layer to perform a combining operation. During the test phase, the input vector can be replaced with weights for different individual styles so as to achieve mixed style transfer (Fig. 3).
Fig. 3 Conditional vector input model
4.5 Using Dilated CNNs The core network, the Image Transformation Net (ImageNet), is an encoder–decoder architecture. Hence, a relevant experiment was to swap the convolutional residual layers with dilated convolutions and the upsampling layers with transposed convolutions. Dilated convolutions increase the receptive field size while using fewer parameters, and with different stride parameters, downsampling can be achieved. It was found that the model performed about 1.5 times faster compared with the Conditional Fast Style Network model, albeit with a slight loss in output quality (Fig. 4).
Fig. 4 This image transformation network or image net is based on residual blocks and convolutional layers. Once trained the weights of the ImageNet are extracted and in a single forward pass are used to convert the content images into stylized images
Fig. 5 Baseline architectures: two stacks of 9 × 9 and 3 × 3 standard and dilated convolution layers (with 64, 32, and 16 filters), interleaved with max pooling and followed by fully connected layers
5 Style Transfer Model and Artist Identification 5.1 Proposed Models Two CNN architectures were built from scratch and trained on the WikiArt + MSCOCO [14] datasets. For both architectures, a SoftMax classifier with cross-entropy loss was used (Fig. 5):

L_i = −log( e^{f_{y_i}} / Σ_j e^{f_j} )  (4)
5.2 Style Transfer Model The ResNet-18 architecture has proved to work well for image recognition tasks. Hence, a pretrained ResNet-18 network was used as the backbone, initialized with weights trained on the ImageNet dataset and trained for a minimum of 30 epochs using the Adam optimizer. With an increase in network depth, the gradients calculated in upper layers slowly vanish before reaching lower layers, resulting in accuracy saturation and plateauing. Residual blocks in ResNets are applied so that the upstream gradients are evenly propagated to the lower layers, resulting in improved network accuracy. A SoftMax classifier with cross-entropy loss was used for ResNet-18. Two transfer learning approaches were tried: (1) retraining the entire network, and (2) replacing only the final fully connected layer and training weights for it.
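A minimal PyTorch sketch of option (2) is given below; interpreting the paper's λ as a weight-decay coefficient and the number of artist classes are assumptions, as neither is stated explicitly.

import torch
import torch.nn as nn
from torchvision import models

num_artists = 57                                           # assumed number of artist classes
model = models.resnet18(pretrained=True)                   # ImageNet-initialized backbone
for p in model.parameters():                               # freeze the pretrained weights
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, num_artists)    # new trainable classification head

criterion = nn.CrossEntropyLoss()                          # SoftMax classifier with cross-entropy loss
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-4, weight_decay=1e-4)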
Fig. 6 Comparing models: I. Style (left), II. Conditional Input Vector (middle), III. Conditional Instance Normalization (right)
Fig. 7 Comparing models: I. Style (left), II. Conditional Input Vector (middle), III. Conditional Instance Normalization (right)
6 Results 6.1 Single Style See Figs. 6 and 7.
6.2 Mixed Style See Figs. 8, 9 and 10.
6.3 Dilated CNN’s See Fig. 11.
Fig. 8 I. Single art, II. Single Art, III. Mixed stylization result
Fig. 9 I. Single art, II. Single Art, III. Mixed stylization result
Fig. 10 I. Single art, II. Single art, III. Mixed stylization result
7 Experimental Metrics 7.1 Result Metrics See Tables 1 and 2.
Fig. 11 I Style, II. Dilated CNN, III. Normal stylization
Table 1 Training time calculations

Training time (s) for 1000 iterations
Conditional Input Vector   6827.73
Conditional IN             12,149.36
Dilated CNNs               5294.6

Table 2 Model accuracy (Baseline vs ResNet)

Baseline CNN
Hyperparameters               Val accuracy
lr = 0.1, λ = 0.01            15
lr = e−2, λ = 0.01            22
lr = e−3, λ = 0.01            35
lr = e−4, λ = 0.001           48
lr = e−5, λ = 0.0001          25

Fine-tuned ResNet
lr = e−4, λ = e−4             Train accuracy = 93
lr = e−5, λ = e−4             Val accuracy = 76
7.2 Artistic Identification of Stylized Images It was seen that the most commonly predicted artists were Escher, Maurice, and Erte. An instance was trained using a painting by the artist Boris, and it was seen that Escher was the predicted artist; Boris' work is very similar in style to most of Escher's art. Since a style model is limited by the number of purely unique artworks, style models can only be trained with limited styles, and it is therefore difficult to do a proper analysis of the effect of stylized images. Thus, this is one area where the proposed models fall short (Figs. 12 and 13).
8 Conclusion In this paper, multiple and mixed style transfer was implemented, and dilated CNNs were explored to observe their effect on image stylization. A consensus was reached that image quality is sensitive to the weights, since different results were observed while tuning the weights in the loss layers. Mixed style transfer with conditional
Fig. 12 Boris and Escher artwork
Fig. 13 Artist wise accuracy
vectors was also explored with seven style targets, and multiple unique styles were achieved by manipulating the conditional vector. Finally, the technique of arbitrary style transfer was built upon by replacing the Inception-Net with a custom lightweight trainable network architecture, showing that a smaller network can be used for arbitrary style transfer.
References 1. L.A. Gatys, A.S. Ecker, M. Bethge, Image style transfer using convolutional neural networks, in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE, 2016), pp. 2414–2423 2. J. Johnson, A. Alahi, L. Fei-Fei, Perceptual losses for real-time style transfer and superresolution, in European Conference on Computer Vision (Springer, 2016), pp. 694–711 3. K. Yanai, R. Tanno, Conditional fast style transfer network, in Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval (ACM, 2017), pp. 434–437
4. V. Dumoulin, J. Shlens, M. Kudlur, A learned representation for artistic style. CoRR, abs/1610.07629 2(4), 5 (2016) 5. X. Huang, S. Belongie, Arbitrary Style Transfer in Real-Time with Adaptive Instance Normalization. arXiv: 1703.06868 (2017) 6. K. Yanai, Unseen style transfer based on a conditional fast style transfer network (2017) 7. G. Ghiasi et al., Exploring the structure of a real-time, arbitrary neural artistic stylization network. arXiv: 1705.06830 (2017) 8. J.D. Jou, S. Agrawal, Artist Identification for Renaissance Paintings (2011) 9. T.E. Lombardi, M.J. Lang, A.E. Campbell, The Classification of Style in Fine-Art Painting (2005) 10. Y. Bar, N. Levy, L. Wolf, Classification of artistic styles using binarized features derived from a deep neural network, in ECCV Workshops (2014) 11. Kagglewikiart Dataset. https://www.kaggle.com/c/painterby-numbers 12. D. Ulyanov, A. Vedaldi, V.S. Lempitsky, Instance normalization: the missing ingredient for fast stylization. CoRR, abs/1607.08022 (2016) 13. S. Iizuka, E. Simo-Serra, H. Ishikawa, Let there be color!: joint end-to-end learning of global and local image priors for automatic image colorization with simultaneous classification. ACM Trans. Graph. (TOG) 35(4), 110 (2016) 14. T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollar, C.L. Zitnick, Microsoft coco: common objects in context, in European Conference on Computer Vision (Springer, 2014), pp. 740–755
Performance Analysis of Different Models for Twitter Sentiment Deepali J. Joshi, Tasmiya Kankurti, Akshada Padalkar, Rutvik Deshmukh, Shailesh Kadam, and Tanay Vartak
Abstract Sentiment analysis is the classification of emotions (such as positive, negative, and neutral) in textual data. It helps companies know the sentiments of customers regarding the products and services they provide. Today, many customers express their reviews of a particular product on social media sites. If we analyze those reviews using sentiment analysis, we can know whether customers are happy with certain products. Twitter provides one of the largest and most dynamic datasets for data mining and sentiment analysis. Therefore, Twitter sentiment analysis plays an important role in research, with significant applications in industry and academia. The purpose of this paper is to identify an optimal algorithm for Twitter sentiment analysis by comparing the accuracy of various machine learning models. In this context, nine well-known learning-based classifiers have been evaluated based on confusion matrices. Keywords Twitter sentimental analysis · Model comparison · Performance analysis
D. J. Joshi (B) · T. Kankurti · A. Padalkar · R. Deshmukh · S. Kadam · T. Vartak Vishwakarma Institute of Technology, Upper Indira Nagar, Bibwewadi, Pune, Maharashtra, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 D. Swain et al. (eds.), Machine Learning and Information Processing, Advances in Intelligent Systems and Computing 1311, https://doi.org/10.1007/978-981-33-4859-2_11
1 Introduction Sentiment analysis, also known as opinion mining, plays an important role in knowing the opinions of people. A sentiment analysis model focuses on polarity (positive, negative, neutral) as well as on emotions and feelings. It is one of the applications of natural language processing, which is a subfield of AI. Sentiment analysis uses natural language processing techniques and algorithms, and different classification models are used for this purpose. Using sentiments, an organization can easily retrieve the feelings of customers about its product through tweets. As the world is connected socially, social media is an excellent source for gathering the sentiments of people all over the world. There are many social sites such as Facebook, WhatsApp, Instagram, and Twitter [1]. We are considering Twitter in our project as it is a
trending social media site with above 320 million daily active users and approximately 500 million tweets every day. Tweets are short, with a limit of 140 characters, and cover users' daily activities and views on current topics such as politics, entertainment, lifestyle, technology, and business. Using Twitter, we can obtain a large dataset, and if we analyze it correctly we can predict many aspects of a business module. Twitter sentiment analysis is generally performed by developers and researchers using a single machine learning model. We tried something new here by comparing the accuracy of different machine learning models for Twitter sentiment analysis. For this, we used Logistic Regression, Linear SVC, Multinomial NB, Bernoulli NB, Ridge Classifier, AdaBoost, Perceptron, Passive-Aggressive Classifier, and Nearest Centroid models. Based on their accuracy and confusion matrices, we concluded which one is an appropriate model for analysis. Along with this, we also used different types of vectorization, such as the count vectorizer and TFIDF (Term Frequency–Inverse Document Frequency), and different n-grams (unigram, bigram, and trigram) to check which combination gives the best results for Twitter sentiment analysis. In this project, we tried to find out which model is best for analysis rather than developing a new one, so this is a research paper. A brief explanation of every model used in the research is presented in the next section.
2 Dataset Our aim is to build different machine learning models and analyze which one is best at classifying tweets as either positive or negative, and for that purpose we need a large amount of data [2]. So that our models have decent performance, we chose the Sentiment140 dataset from Stanford University, which is open source and available on the internet to download [1]. This dataset was created by computer science graduate students at Stanford University. After importing the dataset using the Pandas read_csv function, we found that there are 1.6 million tweets in the dataset. The dataset does not have any null values and, most importantly, all the tweets are labeled as positive or negative [3]. From the analysis, we found that 50% of the data are labeled negative and 50% are labeled positive. The dataset also contains columns such as ID, date, query_string, and user; as these columns are not required in our analysis, we removed them with the drop function and kept only the sentiment and text columns. We also found that the negative and positive classes are not mixed up in this dataset: from index 0 to 799,999 we have the negative class, and from 800,000 onwards we have the positive class.
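A minimal sketch of this loading step, assuming the common Sentiment140 column layout and a local file name such as training.1600000.processed.noemoticon.csv (both are assumptions, not stated in the paper):

```python
import pandas as pd

# Assumed column order of the Sentiment140 CSV (the file has no header row).
cols = ["sentiment", "id", "date", "query_string", "user", "text"]
df = pd.read_csv("training.1600000.processed.noemoticon.csv",
                 header=None, names=cols, encoding="latin-1")

# Keep only the columns needed for sentiment analysis.
df = df.drop(columns=["id", "date", "query_string", "user"])

# Sentiment140 marks negatives as 0 and positives as 4; map labels to 0/1.
df["sentiment"] = df["sentiment"].map({0: 0, 4: 1})

print(df.shape)                        # roughly (1600000, 2)
print(df["sentiment"].value_counts())  # ~50% negative, ~50% positive
```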
3 Data Preparation The dataset is not always clean, so we need to clean the data to get the best accuracy when we pass it to our models. In this process, we first check the data and plot some graphs [4]. A boxplot of the length of each tweet shows that, although the Twitter character limit is 140, some tweets are longer than 140 characters, which means the data need cleaning. Our dataset is made up of tweets and contains a lot of HTML encoding that has not been converted to text, so we decode the HTML to plain text using the Beautiful Soup library. Next, we deal with the "@" mentions: they tell which user posted the tweet, but we do not need them to build our sentiment analysis model, hence we drop them. After removing the HTML tags and the "@" mentions, we clean the URL links, because URLs carry information we do not need for sentiment analysis. Many users use hashtags in their tweets; since it might not be a good idea to remove all the text that comes with a hashtag, we retain the text after the hashtag and remove only the "#". To perform these tasks, we defined a function that runs over the whole dataset and also performs tokenization, stemming, and stop-word removal, as sketched below. After cleaning the tweets successfully, we save them in a separate CSV file.
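A minimal sketch of such a cleaning function, assuming NLTK is available for stop words and stemming; the exact regular expressions and tokenizer used in the paper are not specified, so these are illustrative choices:

```python
import re
from bs4 import BeautifulSoup
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

# nltk.download("stopwords") and nltk.download("punkt") may be needed once.
stemmer = PorterStemmer()
stop_words = set(stopwords.words("english"))

def clean_tweet(raw_tweet: str) -> str:
    # Decode HTML entities left over in the raw tweets.
    text = BeautifulSoup(raw_tweet, "html.parser").get_text()
    # Drop @mentions and URLs, keep hashtag text but remove the '#'.
    text = re.sub(r"@\w+", "", text)
    text = re.sub(r"https?://\S+|www\.\S+", "", text)
    text = text.replace("#", "")
    # Keep letters only, lowercase, tokenize, remove stop words, stem.
    text = re.sub(r"[^a-zA-Z]", " ", text).lower()
    tokens = [stemmer.stem(t) for t in word_tokenize(text) if t not in stop_words]
    return " ".join(tokens)

# 'df' is the DataFrame from the loading sketch above.
df["clean_text"] = df["text"].apply(clean_tweet)
df[["sentiment", "clean_text"]].to_csv("clean_tweets.csv", index=False)
```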
4 Data Splitting In machine learning, before training any model we have to split the data into training data and testing data. In this case, we decided to split our data into three parts: train, development, and test, following Andrew Ng's course on Coursera. The train set contains the sample data used for learning [5]. The development set is the sample of data used to tune the classifier parameters and provide an unbiased evaluation of the model. The test set is the sample of data used to test the performance of the final model. We divide our data in the ratio 98% training, 1% development, and 1% testing. We chose this ratio because our dataset contains 1.6 million entries, so 1% of the data gives us more than 15,000 entries, which is more than enough to evaluate the performance of our models.
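A sketch of the 98/1/1 split using scikit-learn, splitting twice since train_test_split only produces two partitions at a time (the random seed is an arbitrary assumption):

```python
from sklearn.model_selection import train_test_split

X = df["clean_text"]   # cleaned tweets from the preparation sketch
y = df["sentiment"]

# First carve off 2% for development + test, then split that part half-and-half.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.02, random_state=42, stratify=y)
X_dev, X_test, y_dev, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=42, stratify=y_rest)

print(len(X_train), len(X_dev), len(X_test))  # ~1,568,000 / ~16,000 / ~16,000
```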
5 Feature Extraction Feature extraction is the process of reducing a dataset of large size. A large dataset may require high computational power and efficient hardware, so sometimes we need to reduce the number of resources required to process our information. Feature extraction keeps only the important and required features for analysis [6]. Bag of words is a practical feature extraction method in natural language processing: it extracts the words (features) used in the tweets and represents them by the frequency with which they are used. It is also popularly used in image processing.
5.1 N-grams
5.1.1 Unigram: I, am, so, happy, and, my, life, is, so, grateful
5.1.2 Bigram: I am, am so, so happy, happy and, and my, my life, life is, is so, so grateful
5.1.3 Trigram: I am so, am so happy, so happy and, happy and my, and my life, my life is, life is so, is so grateful
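A tiny helper that reproduces the listings above from the example sentence, in pure Python with no library assumptions:

```python
def ngrams(tokens, n):
    """Return the list of n-grams (joined as strings) over a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "I am so happy and my life is so grateful".split()
print(ngrams(tokens, 1))  # unigrams
print(ngrams(tokens, 2))  # bigrams, e.g. 'I am', 'am so', ...
print(ngrams(tokens, 3))  # trigrams, e.g. 'I am so', 'am so happy', ...
```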
5.2 Count Vectorizer We cannot use the parsed data directly as input to a machine learning model for prediction [6]. We need to extract the words, called tokens, and then encode these words as integers or floating-point values so that they can be given as input to the machine learning model. This process is known as feature extraction. In
machine learning, we cannot directly feed text data to an algorithm for classification. First, we convert the text to integer or floating-point numbers, form vectors of these numbers, and pass them as input to the algorithm; that is why we need to convert each document to a fixed-length vector of numbers. The simplest way to do this is the bag-of-words model: it focuses on the occurrence of each word in the vocabulary, assigning each word a unique index so that the value at that position in the vector can be filled with the frequency of the word in the document. The count vectorizer provides a simple way to do all of this: it tokenizes the collection of documents and builds the vocabulary of word frequencies, counting the appearance of each word in the text. The vectors created by the count vectorizer contain a lot of zeros; in Python, these are stored as sparse matrices, so we transform them back to NumPy arrays when needed. Vectorization can also be performed another way, with TFIDF; we will look at it and compare the count vectorizer performance with the TFIDF vectorizer performance using the logistic regression model in the next part.
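A minimal CountVectorizer sketch corresponding to the description above; the 100,000-feature cap and the n-gram range up to trigrams mirror the settings discussed later in the paper, and X_train/X_dev come from the splitting sketch:

```python
from sklearn.feature_extraction.text import CountVectorizer

count_vec = CountVectorizer(ngram_range=(1, 3), max_features=100000)
X_train_counts = count_vec.fit_transform(X_train)  # sparse matrix of token counts
X_dev_counts = count_vec.transform(X_dev)

print(X_train_counts.shape)
# .toarray() would convert the sparse matrix to a dense NumPy array if needed,
# but for 1.6M tweets it is normally kept sparse.
```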
5.3 TFIDF Vectorizer TFIDF is another way of vectorization, also used to convert textual data into numbers. TF stands for term frequency and IDF stands for inverse document frequency. This technique is widely used across various natural-language-processing-based projects to extract features [7]. Term frequency tells us how many times a word appears in a single document. But there are many words, for example "the", which occur in a document many times, so their frequency is high even though these words are not meaningful for sentiment analysis; they do not help to classify tweets as positive or negative. To solve this problem, we use the inverse document frequency [7], which is the log of the number of documents divided by the number of documents that contain the word.

$$\mathrm{TF}(t, d) = \frac{\text{number of times term } t \text{ appears in document } d}{\text{total number of terms in document } d} \qquad (1)$$
For now, it is enough to know that TFIDF avoids treating unnecessarily repeated words as important for classification: if a term appears again and again across many documents, its score is pushed toward zero, which means that the term is not informative for differentiating the documents. All other aspects of the TFIDF vectorizer are the same as the count vectorizer. In the next part, we evaluate the performance of logistic regression using both the count vectorizer and TFIDF and report which one is better for classification.
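The corresponding TFIDF sketch; apart from the weighting, usage mirrors the count vectorizer above:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

tfidf_vec = TfidfVectorizer(ngram_range=(1, 3), max_features=100000)
X_train_tfidf = tfidf_vec.fit_transform(X_train)
X_dev_tfidf = tfidf_vec.transform(X_dev)

# Terms that appear in nearly every tweet (e.g., "the") receive a low IDF weight,
# so they contribute little to the classification.
```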
6 Model Comparison 6.1 Logistic Regression We are moving toward the actual implementation of our first model, logistic regression, for the classification of tweets as positive or negative. Logistic regression is a supervised machine learning model. Here the dependent variable is dichotomous, which means there are only two possible outcome classes, in our case positive (1) or negative (0). The logistic regression model uses a sigmoid function with parameters that map input features to output labels; to obtain the optimum mapping, a cost function is minimized.

$$h\left(x^{(i)}, \theta\right) = \frac{1}{1 + e^{-\theta^{T} x^{(i)}}} \qquad (2)$$
The outcome of this function is a probability between 0 and 1. Classification is done based on a threshold value, which is 0.5 here.

$$\theta^{T} x^{(i)} \geq 0 \qquad (3)$$

If the value is greater than or equal to 0.5 and closer to 1, the tweet is predicted as positive.

$$\theta^{T} x^{(i)} < 0 \qquad (4)$$

If the value is less than 0.5 and closer to zero, the tweet is predicted as negative.
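A sketch of the sigmoid/threshold rule in Eqs. (2)–(4), together with the scikit-learn classifier that fits the parameters in practice (continuing from the TFIDF sketch; max_iter is an assumed setting):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_label(theta, x):
    # theta^T x >= 0 is equivalent to h >= 0.5, i.e. a positive prediction.
    return 1 if sigmoid(theta @ x) >= 0.5 else 0

# In practice, scikit-learn estimates theta from the training data:
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train_tfidf, y_train)
print("dev accuracy:", clf.score(X_dev_tfidf, y_dev))
```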
6.1.1 Using Count Vectorizer
See Table 1.
6.1.2 Using TFIDF Vectorizer
We had two choices of vectorizer method, so for logistic regression we considered both of them to find which one shows the more accurate result.
Table 1 Comparison of N-grams using count vectorizer: unigram 79.96%, bigram 82.22%, trigram 82.02%
Table 2 Comparison of N-grams using TFIDF vectorizer: unigram 80.27%, bigram 82.56%, trigram 82.82%
Fig. 1 N-gram comparison using count vectorizer and TFIDF vectorizer
In both vectorizer methods, unigram, bigram, and trigram are considered, so based on accuracy we can select a single N-gram setting for the further models (Table 2). After plotting the N-grams of both vectorizer methods, we can see that the trigram TFIDF vectorizer stands above all and shows better accuracy than the others. Considering this fact, we applied the trigram TFIDF vectorizer for the remaining models (Fig. 1).
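A sketch of the comparison behind Tables 1 and 2 and Fig. 1: looping over n-gram ranges for both vectorizers and recording dev-set accuracy. The exact feature count used for these tables is not stated, so max_features is an assumption; X_train/X_dev/y_train/y_dev come from the splitting sketch.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression

results = {}
for name, Vec in [("count", CountVectorizer), ("tfidf", TfidfVectorizer)]:
    for n in (1, 2, 3):  # unigram, up-to-bigram, up-to-trigram
        vec = Vec(ngram_range=(1, n), max_features=100000)
        Xtr, Xdv = vec.fit_transform(X_train), vec.transform(X_dev)
        acc = LogisticRegression(max_iter=1000).fit(Xtr, y_train).score(Xdv, y_dev)
        results[(name, n)] = acc
        print(name, n, round(acc, 4))
```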
7 Performance Analysis After running all nine models, considering only TFIDF with trigrams, logistic regression shows better results compared with the other models (Table 3). Adaboost showed very poor performance in terms of accuracy as well as train and test time.
Table 3 Accuracy and train/test time calculated after running all models
Model | Accuracy (%) | Train and test time (s)
Logistic regression | 82.82 | 112.93
Multinomial NB | 80.21 | 113.90
Bernoulli NB | 79.02 | 114.31
Linear SVC | 82.16 | 1156.46
Ridge classifier | 82.23 | 149.97
Adaboost | 70.54 | 499.82
Perceptron | 77.24 | 115.99
Passive aggressive classifier | 79.97 | 123.82
Nearest centroid | 73.18 | 114.95
Here is the matplotlib graph that shows the accuracy of each model for 100,000 features (Fig. 2).
Fig. 2 Comparison graph of all models for 100,000 features
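A sketch of the nine-model comparison in Table 3, timing training plus evaluation for each classifier on the trigram TFIDF features from the earlier sketch; hyperparameters are left at scikit-learn defaults, which may differ from the paper's exact settings:

```python
import time
from sklearn.linear_model import (LogisticRegression, RidgeClassifier,
                                  Perceptron, PassiveAggressiveClassifier)
from sklearn.naive_bayes import MultinomialNB, BernoulliNB
from sklearn.svm import LinearSVC
from sklearn.ensemble import AdaBoostClassifier
from sklearn.neighbors import NearestCentroid

models = {
    "Logistic regression": LogisticRegression(max_iter=1000),
    "Multinomial NB": MultinomialNB(),
    "Bernoulli NB": BernoulliNB(),
    "Linear SVC": LinearSVC(),
    "Ridge classifier": RidgeClassifier(),
    "Adaboost": AdaBoostClassifier(),
    "Perceptron": Perceptron(),
    "Passive aggressive classifier": PassiveAggressiveClassifier(),
    "Nearest centroid": NearestCentroid(),
}

for name, model in models.items():
    start = time.time()
    model.fit(X_train_tfidf, y_train)
    acc = model.score(X_dev_tfidf, y_dev)
    print(f"{name}: accuracy={acc:.4f}, train+test time={time.time() - start:.1f}s")
```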
8 Conclusion In this paper, we have tried to analyze the different models that can be used for Twitter sentiment analysis. By looking at the analysis and performance of every model with the help of confusion matrices, classification reports, and charts built using the matplotlib library, we found that the accuracy of logistic regression is the best. We also performed feature extraction using both the count vectorizer and TFIDF, and we found that the TFIDF vectorizer works better with logistic regression than the count vectorizer. We also found that in every case, from unigram to trigram, TFIDF showed better results than the count vectorizer. The best result for logistic regression was obtained with the TFIDF vectorizer using 100,000 features. Therefore, we decided to use the TFIDF vectorizer with 100,000 features, including n-grams up to trigrams, with every other model. By comparing the classification reports of every model, we found that logistic regression worked better than all the other algorithms used; this may be because logistic regression is usually very effective on very large datasets. Through this detailed study of Twitter sentiment analysis, we explored useful vectorization techniques and other aspects of machine learning for binary classification, and we came to understand in detail how each model works and how it classifies data. Working with natural language processing was a great experience for us. We learned how to handle textual data before feeding it to a machine learning model, and overall this study- and research-based project was a great experience for all of our team members.
References 1. A. Attarwala, S. Dimitrov, A. Obeidi, How efficient is Twitter: predicting 2012 U.S. Presidential elections using support vector machine via Twitter and comparing against Iowa electronic markets 2. H.S. Kisan, H.A. Kisan, A.P. Suresh, Collective intelligence and sentimental analysis of twitter data by using Stanford NLP libraries with software as a service (SaaS) 3. G. Kavitha, B. Saveen, N. Imtiaz, Discovering public opinions by performing sentimental analysis on real-time Twitter data 4. M.F. Çeliktuğ, Twitter sentiment analysis, 3-way classification: positive, negative or neutral? 5. https://towardsdatascience.com/another-twitter-sentiment-analysis-bb5b01ebad90. Last Accessed: 29-04-2020 6. https://www.geeksforgeeks.org/feature-extraction-techniques-nlp/. Last Accessed: 12-04-2020 7. X. Wang, J. Gu, R. Yang, Text clustering based on the improved TFIDF by the iterative algorithm 8. F. Zhu, X. Yang, J. Gu, R. Yang, A new method for people-counting based on support vector machine
Electricity Forecasting Using Machine Learning: A Review Shital Pawar, Prajakta Mole, Shweta Phadtare, Dhanashri Aghor, and Pranali Vadtile
Abstract In today's revolutionary period, electricity is one of the crucial resources for industry as well as for the day-to-day life of people, so efficient use of electricity is very necessary. To optimize the consumption of electricity, there is a need for electricity forecasting. Making a cost-efficient electricity management system is the true intention behind the prediction of electricity. Various approaches, such as machine learning and deep learning algorithms, can be implemented for forecasting electricity. Machine learning algorithms have proved very efficient in different forecasting applications. There has been substantial research on load forecasting using different methods. We analyzed the methods of different authors to obtain better results in terms of accuracy, cost-efficiency, and robustness. Various parameters, such as temperature and environmental conditions, are considered to achieve higher accuracy. To study the performance of different machine learning and deep learning algorithms, several types of errors are calculated, such as RMSE (root mean square error), MAE (mean absolute error), and MAPE (mean absolute percentage error). These errors define the accuracy of a particular algorithm. Keywords Load forecasting · Machine learning · Power consumption
S. Pawar · P. Mole (B) · S. Phadtare · D. Aghor · P. Vadtile Electronics and Telecommunication Department, Vishwakarma Institute of Technology, Pune, Maharashtra, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 D. Swain et al. (eds.), Machine Learning and Information Processing, Advances in Intelligent Systems and Computing 1311, https://doi.org/10.1007/978-981-33-4859-2_12
1 Introduction The prediction of electricity plays a vital role in power system control and operation. At the utilization scale, load prediction is very necessary to maintain the balance between demand and supply and to make systems cost-effective. This type of tool is useful for controlling commercial building loads, reducing peak demand, and alleviating greenhouse gas emissions [1]. Most researchers' studies focus on short-duration and very short-duration load forecasting systems consisting of models that use accurately forecasted variables like weather conditions and historical time-lagged
values. Large-scale systems belonging to industries require probabilistic modeling, while medium-term load forecasting systems require keen observation considering their vital operations [2]. There are a lot of areas that still face load-cutting issues due to an improper ratio of demand and supply; these kinds of problems can be avoided by such a system. Forecasting provides consumers with the possibility of comparing current utilization behavior with future expenses. Therefore, consumers may find this helpful: by predicting load energy values, they gain a greater understanding of their power consumption ratios and future estimates, which allows efficient management of their utilization and leads to cost reduction. By making power consumption and its estimates more transparent, it becomes very simple to know how much energy is actually being used and how it will impact plans in the future [3]. The electricity load forecasting system can be developed by considering various parameters such as temperature and humidity (environmental conditions) and the type of people who live in a particular area. Researchers have done plenty of work on forecasting electricity demand. Some studied the use of machine learning algorithms like regression and clustering, and some explored methods like sparse coding for building models and predicting household electricity loads for each individual. An ensemble method made up of the Empirical Mode Decomposition (EMD) technique and a deep learning method has also been used for this problem. Another way to model the very short-term energy of individual households depends on information and daily routine pattern inspection. A model made for control with a regression tree algorithm allows closed-loop control for demand response (DR) scheme synthesis for huge commercial buildings. Functional data analysis of generalized curves gives probabilistic prediction of electricity load [4]. The nonlinear and uncertain behavior of the factors affecting electric load growth, as well as the uncertainty of demand behavior, makes forecasting difficult. In addition to this, there is great difficulty in data collection and recording. There are several conventional methods used in load forecasting, ranging from forecasting future load growth based only on past load growth with time, to a new generation of intelligent computing techniques used in simulation [5]. In these algorithms, statistical operations such as aggregating power consumption and calculating RMSE (root mean square error) are used to evaluate the load forecast. Forecasting can be done over different horizons, with hourly, daily, weekly, monthly, and annual values for both total demand and peak demand, and with different patterns. Short-term forecasting has attracted a great deal of attention in the surveys because of its many applications in power system management, which include taking steps for load balancing and planning of required resources. For system planning, investment, and budget allocation, other forecasting horizons, for instance the medium and long term, are required. The fact is that the behavior of consumers varies and each customer has their own consumption pattern [6].
This paper reviews various studies by researchers, including single-scale forecasting for grids with real-time pricing. Forecasting enables the use of energy storage systems to decrease the cost of energy for consumers. Most of the studies show that real-time series data handled with algorithms like LSTM (long short-term memory) networks give better prediction accuracy. Determining when battery storage systems should be used in place of existing systems, by forecasting the need for electricity, is one more added feature of such a system. The main intention behind the project is to perform prediction of the electricity consumption of a single-home system, which has multiple antennas at both transmitter and receiver sides.
2 Electricity Load Forecasting Electricity has different characteristics than usual products: it cannot be stored directly, and complex actions are needed to transform it and store it in other forms. In a business context, it is better to have an estimate of electricity consumption for making budgets or for meeting the demand–supply ratio. In households too, it is better to reduce utilization for cost-effective management. By using the following stepwise procedure, we can build forecasting systems. Electricity load forecasting can be done with the help of historical data, which comprise the duration over which power consumption is observed. Past data are used to predict future consumption of energy so that anyone can plan their budget. These data are analyzed with some parameters, which are described below, and are classified into four categories, which are as follows:
1. Very short-term forecasting: minutes or hourly analysis
2. Short-term forecasting: daily analysis
3. Medium-term forecasting: monthly analysis
4. Long-term forecasting: yearly analysis
After considering these horizons, some factors also impact the prediction, such as weather conditions, humidity, temperature, social change, and holidays. These factors play an important role while estimating results. Once all these data are obtained, we perform some data analysis, such as cleaning the data and treating outliers. Then, to get proper data, we use feature selection techniques, for example MRMR (minimum redundancy and maximum relevancy); using these kinds of methods, feature selection can be done. The next step, once our data are ready, is building a model using computational techniques, i.e., algorithms. Performing some statistical functions on the data and applying algorithms like SVR (support vector regression) or ANN (artificial neural network), we build a model to predict future values. The major step after that is error analysis. The dataset gets divided into two parts, training and testing, so the train error and test error get calculated. These errors show what percentage of accuracy is achieved: if the error is high in the training set as well as in the test set, it is an indication of high bias, while if the training error is good but the test error is bad, it indicates high variance. The bias term shows
Fig. 1 Flowchart of load forecasting: collect historical load data → analysis and selection of load data → pre-treatment of load data → create load forecasting model → model distinguishing and parameter estimation → error analysis (if errors are high, improve the forecasting model; otherwise output the value of load)
that the output is not efficient for the data, whereas variance indicates that the output is good only for some part of the data. Depending on the training and test errors, the model is improved again or we obtain the final results (Fig. 1).
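As a concrete reference for the error measures named above (RMSE, MAE, MAPE), a small NumPy sketch; the actual and forecast load values used here are hypothetical:

```python
import numpy as np

def rmse(actual, forecast):
    return np.sqrt(np.mean((actual - forecast) ** 2))

def mae(actual, forecast):
    return np.mean(np.abs(actual - forecast))

def mape(actual, forecast):
    # Expressed as a percentage; assumes no zero values in 'actual'.
    return np.mean(np.abs((actual - forecast) / actual)) * 100

actual = np.array([310.0, 295.0, 330.0, 410.0])    # hypothetical hourly loads (kW)
forecast = np.array([300.0, 300.0, 320.0, 400.0])  # hypothetical model output
print(rmse(actual, forecast), mae(actual, forecast), mape(actual, forecast))
```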
3 Literature Survey The authors Zhang, Grolinger, and Capretz [5] investigate electricity consumption forecasting for 15 anonymous individual households using an SVR modeling approach, applied at both daily and hourly data granularity. The electricity usage dataset was taken from 15 households by the London Hydro company, from 2014 to 2016. EDA is adopted for data visualization and feature extraction. The work shows that forecasting residential electricity consumption from weather, calendar, and time-of-use price data is feasible and reliable, with sufficient accuracy for some individual residential uses in either daily or hourly prediction. Ademola-Idowu, Kupsc, and Mollinger [6] presented a model for single-house power consumption prediction. In their paper, three different techniques are applied: linear regression, locally weighted linear regression, and support vector regression. The errors found were due to a lack of properly organized data; in the future, some single-house issues may be solved by averaging out the variability. Further, in Aly [7], medium-term load forecasting is proposed as an initial step of power system planning that also provides energy-efficient operation. The proposed ANN method takes into consideration the effect of temperature, time, population growth rate, and the activities of different regions of city areas. This technique is applicable to several load types and is able to deal with multiple factors affecting the forecasting process.
Ali, Mansoor, and Khan [8] proposed Forecasting using Matrix Factorization (FMF) for short-term load forecasting at the individual household level. Activity patterns are not used by FMF, so it can be applied to readily available data; the main focus of this paper is achieving high accuracy at the residential level. Further, in Zare-Noghabi and Shabanzadeh [2], different algorithms are used for medium-term load forecasting: support vector regression and the symbiotic organism search optimization method are applied. In this paper, to avoid irrelevant and extra features, the minimum redundancy maximum relevancy (MRMR) feature extraction method is used. Feature extraction methods have many advantages, such as reducing the dimension of a major problem so that it can be easily solved. The authors mention that, by comparing different features, minimum redundancy can be optimized in the future. Yildiz, Bilbao, and Dore [1] evaluated an approach for household loads. The model consists of integrated data of single-house consumption, which is extracted by some conventional and classification methods. The approach used is referred to as the smart meter-based model (SMBM), which gives information about load profiles and some important key factors of electricity consumption, and also gives the relationship of forecasted loads. Another approach used is CCF, i.e., cluster–classify–forecast, where clustering and classification are applied for an individual house. The results obtained show that the CCF method performs better than the SMBM method in load forecasting for individual households. Zheng et al. [4] published that using a Bayesian neural network for electricity forecasting has the problem of being dependent on time and weather factors; to solve this problem, they use a model that takes historical load data as input based on a feed-forward structure. Compared with other models, the BNN reduces computing time by more than 30 times with better accuracy. Liao and Pan [9] proposed an approach to verify the validity of their model and compared it with an ANN model. To enhance the prediction, a method is used that improves efficiency. The reliability of the model is checked by identifying different inputs at different degrees, and the results show the applied model is robust.
4 Result
The table below summarizes the errors and outcomes reported in the reviewed studies (MAPE = mean absolute percentage error, RMSE = root mean square error, MAE = mean absolute error).
[1] MAPE: daily 12.78; hourly 23.31 and 22.01 | Time duration: per hour | Output: it gives better prediction for the households, which are added as a feature
– Test error for linear regression: 16.1 | Time duration: per hour | Output: better accuracy is obtained in early morning hours
[3] MAPE: – | RMSE: composed method 3.4923; ANN method 1.0445 | Time duration: per month | Output: for nonlinear issues, ANN techniques are more useful for load forecasting
[4] MAPE: 50 | RMSE: MLR 0.561; SVM 0.531 | Time duration: half-yearly | Output: at the household level, high accuracy is obtained; as the hours are aggregated, the error decreases
[5] MAPE: 13.9 | Time duration: per day | Output: nonrelevant and add-on features are avoided, with less redundancy, with the help of feature extraction
[6] MAPE: particular household 19%; SMBM 26% | RMSE: particular household 38%; SMBM 52% | Time duration: per hour | Output: SMBM is a much easier method than CCF
[7] MAPE: BNN, training set (60%), 1 month: 9.2% | RMSE: BNN, training set (60%), 1 month: 6.15e−2 | MAE: linear 0.20 | Time duration: per week | Output: the BNN method has better accuracy than the other proposed models
[8] MAPE: 0.98 | RMSE: integration model 0.16 | Time duration: per hour | Output: the dense average connection has better convergence compared to the connected-layer model
5 Conclusion In this paper, we went through various approaches to building models that predict load with the help of available realistic power consumption data and variables like humidity and temperature. In most of the models, time-series data are used, on which the LSTM algorithm gives good accuracy. Short-term predictions give more accurate results in comparison with medium- and long-term predictions. Observing the results of the various models, deep learning models like ANN give efficient results, and time-series data provide granularity in the results. One of the proposed ideas for variable selection, MRMR (minimum redundancy maximum relevancy), shows useful results, and nonrelevant features get removed by this method. For household prediction, SMBM (smart meter-based model) is the most widely used technique and is quite simple; on the other hand, CCF (cluster, classification and forecast) is more complex. Advanced machine learning models give better performance in terms of regression analysis, but in order to achieve more accuracy, authors use multilayer perceptron models of deep learning. Classifiers like RFR and XGB have less complexity. For deployment of a model, accuracy should be improved and simplicity should be kept at its best value.
References 1. B. Yildiz, J.I. Bilbao, J. Dore, Household electricity load forecasting using historical smart meter data with clustering and classification techniques, vol 15(4) (School of Photovoltaics and Renewable Energy Engineering, University of New South Wales, August 2013) 2. A. Zare-Noghabi, M. Shabanzadeh, Medium-term load forecasting using support vector regression, feature selection, and symbiotic organism search optimization, vol 22 (Department of Power System Operation and Planning, Niroo Research Institute (NRI) Tehran, Iran, 2015) pp. 147–156 3. Y. Hsiao, Household electricity demand forecast based on context information and user daily schedule analysis from meter data. IEEE Trans. Mach. Learn. 12(5) (May 2013) 4. S. Zheng, Q. Zhong, L. Peng, X. Chai, H.C. Chang, A simple method of residential electricity load forecasting by improved bayesian neural networks, (School of Urban Railway Transportation, Shanghai University of Engineering Science, China, 13 September 2018) 5. X.M. Zhang, K. Grolinger, M.A.M. Capretz, Forecasting residential energy consumption using support vector regressions. electricity and computer engineering, (Western University London, Ontario, Canada, N6A 5B9, June 2013) 6. A. Ademola-Idowu, P. Kupsc, S. Mollinger, in Home Electricity Forecasting, (December 11, 2014) 7. G.E.M. Aly, Medium-term electric load forecasting based on artificial neural network, Department of Electric Power & Machines Faculty of Eng. Arab Academy Tanta University, IEEE Transaction 2005 8. S. Ali, H. Mansoor, I. Khan, Hour-ahead load forecasting using AMI data arXiv: 1912.12479v2 [eess.SP] (8 Jan 2020) 9. Z. Liao, H. Pan, Short-term load forecasting with dense average network. in Proceedings of 12th IRF International Conference, (Pune, India, 29th June 2014). ISBN: 978-93-84209-31-5
End to End Learning Human Pose Detection Using Convolutional Neural Networks Nilesh D. Navghare and L. Mary Gladence
Abstract There have been many uses of human pose detection in recent days. In this paper, we propose a new algorithm for estimating human pose in real time by using a pair of convolutional neural networks, along with a restrictor that tries to predict the human pose in the next frame, and a 3D pose generator, from an image sequence fed either in real time or prerecorded. This algorithm also focuses on a lightweight pipeline, which can be used on lightweight systems such as a mobile platform. Keywords Motion capture · CNN · Neural network · 3D animation
N. D. Navghare (B) · L. M. Gladence School of Computing, Sathyabama Institute of Science and Technology, Jeppiaar Nagar, Rajiv Gandhi Salai, Chennai 600119, India e-mail: [email protected] L. M. Gladence e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 D. Swain et al. (eds.), Machine Learning and Information Processing, Advances in Intelligent Systems and Computing 1311, https://doi.org/10.1007/978-981-33-4859-2_13
1 Introduction AI has multiple use cases in multiple industries scattered throughout the domains. Recent works in AI development find application in the 3D animation industry, in in-game technologies, and even in modern CGI-based movies. This eliminates the problem of relying on traditional and expensive motion-capture suits along with specialized hardware and skill sets. Modern advancements in this field are also of interest for sign language detection, robotics, and even body language analysis. Recently, deep learning has set up its roots and shown its competence in numerous computer vision activities, namely estimation of human pose in 3D, 3D animation, and motion capture [1, 2]. Modern research in this field classifies the problem of motion tracking into two parts: estimating the pose of a human figure in a given frame, and 3D reproducing the figure to closely resemble the ground truth from a single two-dimensional image. We propose to solve the problem with two convolutional neural networks, each
trained on a different dataset to predict the pose, with their results fed to a restrictor, which is a plain neural network estimating the pose of the frame from the poses of the two previous frames; the pose from the CNN (convolutional neural network) most closely matching the prediction is chosen for 3D reproduction of the joints. We plan to use a supervised 3D pose estimator [3] for the stated problem, trained on all the datasets on which the CNNs and restrictor were trained. With the help of these components, a 3D armature of the frame [4] is generated and linearly interpolated based on the previous frames to obtain a smooth motion capture sequence. We propose to run this algorithm on lightweight hardware, thus making it more accessible.
2 Literature Review Current research addresses the challenges of 3D human pose estimation. Early research on monocular 3D pose estimation from videos [5] consisted of frame-to-frame pose tracking with dynamic models that rely on Markov dependencies among previous frames. The principal cons of these tactics are the requirement of an initial pose and the inability to recover from tracking failure. To reduce these hurdles, more efficient techniques are used to detect the poses of humans in every single frame, and a consecutive step tries to establish temporally consistent poses. Yasin et al. [6] converted the estimation into an optimized 3D pose problem; a very big drawback of this method is its high time consumption, as processing a single image requires more than 20 s. Sanzari et al. [7] implemented a nonparametric model, which represents the typical motion of human skeleton joint groups, and the consistent nature of the linked group poses is considered in the process of pose reconstruction. Li and Chan [8] first introduced how CNNs can be used to regress the 3D human posture from monocular images, with two training approaches to enhance the network. Zhou et al. [9] implemented a 3D posture estimation framework for videos, which contains a novel synthesis of a deep-learning-based 2D part detector, a sparsity-driven 3D reconstruction technique, and a 3D partial smoothness prior. Zhou et al. also proposed a method directly integrating a kinematic object model into the deep learning network. Du et al. introduced supplementary built-in knowledge for rebuilding the 2D posture and invented a new objective technique to predict 3D posture from the captured 2D posture. More recently, Zhou et al. [9] offered a prediction scheme casting 3D human posture estimation as a 3D key-point localization problem in a 3D volume, in an end-to-end fashion. Moreno-Noguer et al. formulated the 3D human posture estimation problem as a regression between matrices encoding 2D and 3D joint distances. Tekin et al. gave a perspective of exploiting motion information from consecutive frames and employed a deep learning network to regress the 3D pose [10]. Tome et al. introduced a multitask framework mutually linking 2D joint
estimation and 3D pose reconstruction to enhance each task and to exploit extensive 2D posture datasets [10]. Chen et al. implemented a simple approach for 3D human posture estimation by performing 2D posture estimation followed by matching against corresponding 3D exemplars [11]. Zhou et al. suggested a weakly supervised transfer learning method that makes use of both 2D and 3D annotations in a jointly trained two-stage cascaded network structure. However, those techniques oversimplify 3D geometric information. In comparison to these kinds of techniques, our prototype can leverage a lightweight network structure to implicitly learn to embed the 2D geographical relationship, spatial coherence, and 3D geometric recognition in a completely discriminative fashion [11].
3 Methodology As previously stated, the problem of motion capture [12] is split into two subproblems: estimating the pose of a human figure in a given frame, and 3D reproducing the figure to closely resemble the ground truth. We will focus on these two subproblems in detail in this section.
3.1 Pose Estimation We propose a series of steps for pose estimation to obtain a result with the least error, using two convolutional neural networks in parallel to predict [14, 15] the pose in a single frame and passing the results to a restrictor, which predicts the pose in the given frame based on the poses in the two previous frames. This makes the system less prone to errors and more reliable. a. Dataset Creation As observed from related works, dataset creation is the most challenging task within the process; related work has spent multiple months manually creating datasets by drawing surfaces for each joint to gain 50K dataset elements [16]. Due to time constraints and limited labor, we instead use the already existing pose estimation library TFposeestimation, by TensorFlow, for dataset creation: multiple video frames are fed to the library, and the output of TFposeestimation is treated as ground truth for further training of our own system. This method allows us to create thousands of dataset elements more quickly and with less manual labor.
The obvious drawback of using this method is that the results from TFposeestimation are prone to error, and these errors will also be reflected in our system after it is trained. But since this method has few errors from a visual perspective and the pros dominate the cons, for the creation of the dataset we propose to use the output of an already existing pose estimation library with the least error. b. Convolutional Neural Networks Since the system works with two distinct convolutional neural networks, each of the networks is trained on distinct training data and would predict with the same probability in an ideal case. Since each CNN is trained on a different dataset, the weights of each filter and NN layer will also be slightly different, yielding different outputs. From initial training on a simple problem, the dataset size was found to need to be at least 50K to achieve accuracy up to two decimals [16] (Figs. 1 and 2).
Fig. 1 A high-level diagram of CNN
Fig. 2 A high-level diagram of layers in the neural network
$$O(x, y) = \sum_{j,k=0}^{21} e_{j,k} \left( \sum_{j=0}^{10} \sum_{k} h[j,k]\, f[m-j,\, n-k] \right) \phi(n) \qquad (1)$$
where $e_{j,k}$ denotes the weight from the $i$th layer to the $j$th layer, multiplied by the normalized summation of all convolutional layers and by an activation function $\phi$, which in this case is a sigmoid. The learning rate in this training process is set to 0.01, with 50K inputs used for training, as mentioned:
$$O(x, y) = \sum_{j,k=0}^{21} e_{j,k}\, \phi(n) \left( \sum_{j=0}^{10} \sum_{k} h[j,k]\, f[m-j,\, n-k] \right) \qquad (2)$$

$$O(x, y) = \sum_{j,k=0}^{21} e_{j,k}\, \phi(n) \left( \sum_{j=0}^{10} \sum_{k} h[j,k]\, f[m-j,\, n-k] \right) \qquad (3)$$
c. Restrictor A restrictor, in simple words, is a predictor neural network trained to predict the pose of the current frame based on the poses of the previous two frames [17]. The datasets used to train the two CNNs are combined here to obtain more precise predictions, and the restrictor makes it possible to compare and choose the best of the two results provided by the CNNs in the previous step. The restrictor is used for more precise pose estimation and higher accuracy, and the same neural network configuration as in the previous step is used. The significance of using the poses of two frames is to let the restrictor figure out the angular speed of the individual joints and estimate the pose in the current frame accordingly. This method works extremely well in related works and can be seen in other popular libraries such as TFposeestimation. The output of the best of the two CNNs, slightly modified to suit the prediction, is sent for further processing and is considered the final 2D estimation of the pose from the fed frame. Let [e1] and [e2] be the pose estimates given by the two CNNs for the same frame, and let [p] be the pose predicted by the restrictor for this particular frame. Then the resultant pose passed by the restrictor is calculated using:
$$[f] = \frac{\min([e_1] - [p],\, [e_2] - [p]) + [p]}{3} \qquad (4)$$
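A minimal NumPy sketch that applies Eq. (4) element-wise to the two CNN estimates and the restrictor prediction; the element-wise reading of the min(·) and the joint coordinates used here are illustrative assumptions, not the authors' exact implementation:

```python
import numpy as np

def fuse_pose(e1, e2, p):
    """e1, e2: (J, 2) joint estimates from the two CNNs; p: (J, 2) restrictor prediction.
    Element-wise reading of Eq. (4)."""
    return (np.minimum(e1 - p, e2 - p) + p) / 3.0

# Hypothetical three-joint example (normalized image coordinates).
e1 = np.array([[0.50, 0.20], [0.48, 0.40], [0.52, 0.60]])
e2 = np.array([[0.51, 0.21], [0.47, 0.41], [0.53, 0.61]])
p = np.array([[0.50, 0.20], [0.48, 0.41], [0.52, 0.60]])
print(fuse_pose(e1, e2, p))
```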
Fig. 3 High-level diagram for 3D reconstruction
3.2 3D Reconstruction A human model skeleton is preprocessed and matched to the pose fed by the restrictor using a supervised learning model. This reconstruction enables users to view the pose in 3D and to apply the 3D pose to further 3D applications such as VFX and CGI [17]. Here a 2D–3D component tries to map the 3D human skeleton to closely resemble the 2D pose; a 2D projection of the skeleton is checked against the 2D pose, errors are backpropagated to the model, and the mapping is repeated until the 2D projection matches the estimated pose [18] (Fig. 3). Since the pose estimated by the restrictor is in perspective terms, x' and y' for a joint correspond to x/z and y/z in 3D; thus z can be calculated, assigned to the joint, and used to construct a 3D skeleton [18].
4 Result The proposed method is evaluated on two benchmarks for single-person segmentation and pose estimation: (1) the self-created human multiperson dataset and (2) the efficiency of the proposed methodology. This dataset collects images in diverse scenarios that contain many real-world challenges such as crowding, scale variation, occlusion, and contact. Our approach provides a feasible solution alongside predictive methods such as Single Image, DensePose, and so on (Fig. 4).
Fig. 4 A plot of one image from the dataset for pose estimation with TFposeestimation and its mapping along the X-axis and along Y-axis
BodyPart : 0 − (0.77, 0.11) score = 0.93 Above is an example of a template representing the output from the estimation process. The 18 joints are labeled starting from body index zero, with the position of each estimated point in the image mapped so that width and height are scaled to 1. Along with this, we also obtain an estimation score. Using the confidence rates estimated by TFpose, the estimates with high confidence scores [18] are chosen, so that the dataset consists of the most accurately predicted poses [19, 20].
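A small sketch for turning output lines of the template shown above into rows of a training CSV; the regular expression is an assumption tied to that textual template, and the 0.5 confidence cutoff is illustrative:

```python
import csv
import re

LINE_RE = re.compile(r"BodyPart\s*:\s*(\d+)\s*-\s*\(([\d.]+),\s*([\d.]+)\)\s*score\s*=\s*([\d.]+)")

def parse_pose_lines(lines):
    """Yield (joint_id, x, y, score) tuples from lines like
    'BodyPart:0-(0.77, 0.11) score=0.93'."""
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            yield int(m.group(1)), float(m.group(2)), float(m.group(3)), float(m.group(4))

with open("pose_dataset.csv", "a", newline="") as f:
    writer = csv.writer(f)
    for joint_id, x, y, score in parse_pose_lines(["BodyPart:0-(0.77, 0.11) score=0.93"]):
        if score > 0.5:  # keep only high-confidence joints, as described above
            writer.writerow([joint_id, x, y, score])
```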
5 Conclusion The above-specified model works efficiently with respect to both time and space complexity, with the basic functionalities required for pose estimation. The RAM required for the training process is 1.6 GB, including dataset creation, which takes around 10 s per entry in the CSV file. From the calculations mentioned here, the project could take a few weeks to train on the provided dataset, which could yield nearly accurate results with no special computer upgrades. The only limitation here is if the
computer holds minimal memory. This problem can be solved by reducing the input nodes: clustering the datasets and eliminating unwanted clusters will make the system faster with less memory usage.
References 1. Z. Zhang et al., Weakly supervised adversarial learning for 3D human pose estimation from point clouds. IEEE Trans. Visual Comput. Graph. 26(5), 1851–1859 (2020) 2. F. Bogo et al., Keep it SMPL: automatic estimation of 3D human pose and shape from a single image. in European Conference on Computer Vision, (Springer, Cham, 2016) 3. T.L. Munea et al., The progress of human pose estimation: a survey and taxonomy of models applied in 2D human pose estimation. IEEE Access 8, 133330–133348 (2020) 4. M. Xing, Z. Feng, Y. Su, J. Zhang, An image cues coding approach for 3D human pose estimation. ACM Trans. Multimedia Comput. Commun. Appl. 15(4), Article 113 (January 2020), 20 pages. https://doi.org/10.1145/3368066 5. N. Navghare et al., Python and OpenCV in automation of live Surveillance. in Machine Learning and Information Processing, (Springer, Singapore, 2020), pp. 235–243 6. H. Liu et al., Infrared head pose estimation with multi-scales feature fusion on the IRHP database for human attention recognition. Neurocomputing 411, 510–520 (2020) 7. Z. Wang, G. Liu, G. Tian, A parameter efficient human pose estimation method based on densely connected convolutional module. IEEE Access 6, 58056–58063 (2018) 8. X. Nie et al., Hierarchical contextual refinement networks for human pose estimation. IEEE Trans. Image Process. 28(2), 924–936 (2018) 9. G. Ning, Z. Zhang, Z. He, Knowledge-guided deep fractal neural networks for human pose estimation. IEEE Trans. Multimedia 20(5), 1246–1259 (2017) 10. X. Zhou et al., Deep kinematic pose regression. in European Conference on Computer Vision, (Springer, Cham, 2016) 11. K. Wang et al., 3D human pose machines with self-supervised learning. IEEE Trans. Pattern Anal. Mach. Intell. 42(5), 1069–1082 (2019) 12. Z. Cao et al., Realtime multi-person 2D pose estimation using part affinity fields. in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2017) 13. C. Held et al., Intelligent video surveillance. Computer 45(3), 83–84 (2012) 14. L.M. Gladence, T. Ravi, Heart disease prediction and treatment suggestion. Res. J. Pharm. Biol. Chem. Sci. 7(2), 1274–1279, (2016). ISSN: 0975-8585 15. N.D. Navghare, D.B. Kulkarni, Data privacy and prediction using neural network and homomorphic encryption. in 2018 Second International Conference on Intelligent Computing and Control Systems (ICICCS), (IEEE, 2018) 16. S. Park, J. Hwang, N. Kwak, 3D human pose estimation using convolutional neural networks with 2D pose information. in European Conference on Computer Vision, (Springer, Cham, 2016) 17. R.A. Güler, N. Neverova, I. Kokkinos, Densepose: dense human pose estimation in the wild. in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2018) 18. N.D. Navghare et al., Design of pipeline framework for pair trading algorithm. in 2020 Second International Conference on Inventive Research in Computing Applications (ICIRCA), (IEEE, 2020) 19. J. Martinez et al., A simple yet effective baseline for 3D human pose estimation. in Proceedings of the IEEE International Conference on Computer Vision, (2017) 20. D. Mehta et al., Vnect: real-time 3D human pose estimation with a single RGB camera. ACM Trans. Graph. (TOG) 36(4), 1–14 (2017)
Conference Paper Acceptance Prediction: Using Machine Learning Deepali J. Joshi, Ajinkya Kulkarni, Riya Pande, Ishwari Kulkarni, Siddharth Patil, and Nikhil Saini
Abstract The paper presents a model that will predict the acceptance of a paper for a particular conference. The model is designed for conferences that accept research done in the machine learning domain. The dataset used to develop this model is from ICLR 2017 (International Conference on Learning Representations). The model gives its prediction based on extracted features. The features that most conferences consider are number of references, number of figures, number of tables, a bag of words of ML-related terms, etc. Some more features are taken into consideration to give better results, such as length of title, frequency of ML-related words, number of ML algorithms, and average length of sentences. The model is trained on the above-mentioned dataset, which contains 70 accepted and 100 rejected papers. For the prediction, different machine learning algorithms are used: the model is trained by applying algorithms such as logistic regression, decision tree, random forest, KNN, and SVM. A comparative study of the different algorithms on the dataset shows that the decision tree works effectively, providing 85% accuracy.
D. J. Joshi · A. Kulkarni (B) · R. Pande · I. Kulkarni · S. Patil · N. Saini Information Technology Department, Vishwakarma Institute of Technology, Bibvewadi, Pune 411037, India e-mail: [email protected] D. J. Joshi e-mail: [email protected] R. Pande e-mail: [email protected] I. Kulkarni e-mail: [email protected] S. Patil e-mail: [email protected] N. Saini e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 D. Swain et al. (eds.), Machine Learning and Information Processing, Advances in Intelligent Systems and Computing 1311, https://doi.org/10.1007/978-981-33-4859-2_14
Keywords Logistic regression · Decision tree · Random forest · KNN · SVM · NLP · Machine learning
1 Introduction In the past few years, a lot of research work has been done in the scientific field. ML has proven its importance in every field, right from health care to architecture. This new technology has allowed researchers to carry their research work in different directions. One of the prominent features of machine learning is the ability of a machine to improve its performance based on previous results, which is what makes research in machine learning a never-ending process. This research needs some acceptance or acknowledgment, and such research is presented at conferences. A conference paper is often both a written document and an oral presentation, and a paper presented to a conference should follow the conventions of an academic paper and oral presentation. As research work increases, presenting research papers at conferences has gained prominence. Conference papers are a constructive way to present innovative ideas and projects. By presenting work at a conference, one can get valuable and useful feedback from scholars, which will increase one's professional status at work. Every year many papers are submitted to conferences in various fields, but it is not the case that every paper submitted gets accepted. There are certain rules and criteria for the selection of papers, and these criteria vary from conference to conference. The presented paper is judged based on the reviews of experts, but the scientific community finds some problems with this method: there are issues like bias and many more which can bring differences to the work. One way to avoid this problem is to build an automatic system that will evaluate the papers. This paper finds its way to solve this problem by predicting the acceptance of a conference paper, considering different criteria and features of the conference. The presented model takes papers from a known dataset. The papers are in PDF file format; to parse these PDF files, modern NLP techniques are used. After parsing, the text is taken further for the extraction process, which begins with tokenization and removal of stop words. One of the most important tasks is to extract features from the file, as the prediction mechanism depends on these extracted features. All the features that are of importance for the particular conference are extracted using various NLP techniques. After extraction, different ML algorithms are applied to predict the result. The machine learning algorithms used are logistic regression, decision tree, random forest, KNN, and SVM, and the results of these algorithms are studied comparatively.
2 Literature Review In the last few decades, considerable research work has been done in the AI/ML domain. This has led to rapid growth in paper submissions to conferences, and as submissions rise, much work has been done on developing models that predict the acceptance of conference papers. The work by Aditya Pratap Singh, Kumar Shubhankar, and Vikram Pudi is one of them; it ranks research papers based on a citation network. Ying Ding (2009) worked on academic research using a PageRank algorithm, where the author co-citation network is used to rank authors. As competition for submission has increased, ranking based only on citations is no longer sufficient. Jen et al. [1] therefore proposed a model that predicts acceptance using extracted features such as title length, number of references, most recent reference year, etc., and then applies Machine Learning algorithms. Kang (2018) published work on this topic by releasing PeerRead, a structured dataset that collects research papers from several ML/AI conferences such as NIPS, ICML and ICLR. Zaman, Mahdin, Hussain, and Atta-ur-Rahman (2019) provide an effective review of the extraction of information from structured and semi-structured data sources; their work covers different extraction techniques, such as NLP-based extraction, rule-based extraction, chronological-labelling-based approaches, extracting data from tables, etc. Also, Kita and Rekimoto (2019) predicted the importance of figures using Machine Learning techniques.
3 Dataset and Features 3.1 Dataset 170 papers submitted to ICLR 2017, comprising 70 accepted and 100 rejected papers, are selected as the dataset. For each paper, 13 features are extracted. The work is focused on the ICLR dataset, which has a limited number of examples. Further, by studying different types of papers and making some changes in the information extraction methods, this model has also been applied to other conference paper datasets, such as the NIPS (Neural Information Processing Systems) conference 2015 and CVPR (IEEE Conference on Computer Vision and Pattern Recognition) 2019. The model's performance on these different conferences is also good.
3.2 Features In a computer science research paper, the references are the information necessary for the reader to identify and find the sources used. A typical conference paper in computer science refers to approximately 20–30 other articles published in
peer-reviewed conferences [2, 3]. Every conference paper has a lot to say through the data that have been collected and analyzed. However, these data need to be represented in a logical and easy-to-understand manner, which is why tables are required in a conference paper [4, 5]. Images and figures are necessary to summarize results and content that cannot be conveyed through text alone [6]. While analyzing any text, topic-related words are most important, so to review a Machine Learning paper, machine-learning-related keywords and algorithms are important. For any research paper, many features have to be taken into consideration to review it; hence the following 13 features are selected:
1. Title Length.
2. Number of References.
3. Most Recent Reference Year.
4. The number of Recent References.
5. Number of Figures.
6. Number of Tables.
7. Total Machine Learning related words.
8. The number of Machine Learning related words.
9. Total Machine Learning Algorithms.
10. The number of Machine Learning Algorithms.
11. Frequently used words.
12. Number of Sections.
13. Average Sentence Length.
The selected features are mostly quantitative in nature, yet they give a reasonable picture of the content of the paper, which is why this set of features was chosen. Since there is a limit on title length and a concise title is easier to search for, title length is considered as one of the features. Similarly, considering the conventions of conference papers, the number of figures and tables is also given a place among the features. It is important to consider recent references, as they may indicate how much weight is given to recent research and how much the paper might be influenced by recent papers in that field. This might not always be relevant, since "recent" may mean different things in different fields, and a drawback of this feature is that the paper might even be just a clever copy of one of the recent papers. The number of sections is also a feature drawn from conference conventions. The most important items in this feature list are the Machine Learning algorithms and the frequently used words, as these features give some insight into what the paper is about. The usage and manipulation of all these features are explained in the later sections.
4 System Diagram See Fig. 1.
Fig. 1 System diagram
5 Working 5.1 Information Extraction Text extraction is a crucial stage for analyzing conference papers. Conference papers are generally in PDF format which is semistructured data. Conference papers are divided into different sections like Introduction, Methodology, Experimental setup, Result, and analysis, etc. The main importance of section extraction is to find a representative subset of the data, which contains the information of the entire set. Information Extraction (IE) is the process of extracting useful data from the already existing data by employing the statistical techniques of Natural Language Processing (NLP) [7].
5.1.1 Extracting ML Words and ML Algorithms
The first step is to break the PDF down into component pieces or "tokens." A token is a string of contiguous characters between two spaces, or between a space and a punctuation mark. To extract ML words and algorithms, a list of ML words and algorithms is stored in a CSV file and each token is matched against the words in the CSV file. If a match is found, it is stored in a list. The next step is extracting bi-grams and tri-grams. A bigram is two consecutive words in a sentence; a trigram is three consecutive words. To extract them, we use noun chunks, which are flat phrases that have a noun as their head.
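As a rough illustration of this step, the sketch below matches tokens against a keyword list and pulls noun-chunk n-grams. The file name "ml_terms.csv" and the use of spaCy are assumptions for illustration; the paper does not name its exact NLP library.

```python
# Illustrative sketch: match tokens against a keyword list and collect noun-chunk n-grams.
import csv
import spacy

nlp = spacy.load("en_core_web_sm")

# Load the list of ML-related words/algorithms kept in a CSV file (one term per row assumed).
with open("ml_terms.csv", newline="") as f:
    ml_terms = {row[0].strip().lower() for row in csv.reader(f) if row}

def extract_ml_terms(text):
    doc = nlp(text)
    # Single-token matches against the keyword list.
    found = [tok.text for tok in doc if tok.text.lower() in ml_terms]
    # Bi-grams and tri-grams taken from noun chunks (flat noun phrases).
    ngrams = [chunk.text for chunk in doc.noun_chunks if 2 <= len(chunk) <= 3]
    return found, ngrams

words, phrases = extract_ml_terms("We train a convolutional neural network with gradient descent.")
print(words, phrases)
```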
5.1.2 The Average Number of Words in a Sentence
We calculate the average number of words in every sentence. First, we divide the text into sentences using a sentence tokenizer; sentence tokenization is the process of splitting text into individual sentences. After that, we count the words in every sentence using a word tokenizer; word tokenization is the process of splitting a large sample of text into words. Finally, we compute the average number of words per sentence.
5.1.3 The Average Length of a Sentence
First, we divide the text into sentences using the sentence tokenizer. After that, we count the characters in every sentence. Finally, we calculate the average number of characters per sentence.
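A minimal sketch of these two averages is shown below. NLTK tokenizers are used here as an assumption; the paper only says "sentence tokenizer" and "word tokenizer" without naming the implementation.

```python
# Illustrative sketch of the two sentence-level averages.
import nltk
from nltk.tokenize import sent_tokenize, word_tokenize

nltk.download("punkt", quiet=True)

def sentence_statistics(text):
    sentences = sent_tokenize(text)
    if not sentences:
        return 0.0, 0.0
    # Average number of words per sentence.
    avg_words = sum(len(word_tokenize(s)) for s in sentences) / len(sentences)
    # Average sentence length in characters.
    avg_chars = sum(len(s) for s in sentences) / len(sentences)
    return avg_words, avg_chars

print(sentence_statistics("This is one sentence. Here is another, slightly longer one."))
```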
5.1.4 Number of Figures and Tables
To extract the number of figures, we use regular expressions in Python. A regular expression is a special sequence of characters that helps match or find strings, or sets of strings, using a specialized syntax held in a pattern. We find all tokens matching a pattern such as "Figure 1", "Figure 2", etc., and count them.
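The sketch below counts figure and table mentions with regular expressions. The exact patterns are assumptions; papers may also use "Fig. 1"-style captions, so the pattern covers both.

```python
# Illustrative sketch: count figure and table mentions with regular expressions.
import re

def count_figures_and_tables(text):
    figures = set(re.findall(r"(?:Figure|Fig\.)\s*(\d+)", text))  # unique figure numbers
    tables = set(re.findall(r"Table\s*(\d+)", text))               # unique table numbers
    return len(figures), len(tables)

print(count_figures_and_tables("Figure 1 shows ... Figure 2 shows ... Table 1 lists ..."))
```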
5.1.5 Extracting Sections
To extract sections, we find all uppercase and bold words, as well as bigram and trigram tokens, in a PDF file. These tokens are matched against a list of all possible section names.
5.1.6 Extracting Most Frequently Used Words
We use the split() function, which returns a list of all the words in the PDF, and pass this list to an instance of the Counter class. The function most_common() returns the k most frequently occurring words.
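A small sketch of this step, using the split() and Counter.most_common() calls described above:

```python
# Illustrative sketch of the frequent-word extraction.
from collections import Counter

def most_frequent_words(text, k=10):
    words = text.split()                      # split() returns the list of words
    return [w for w, _ in Counter(words).most_common(k)]

print(most_frequent_words("model data model network data model", k=2))
```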
5.1.7 Extracting References
Regular expressions are used to extract references. We describe a search pattern that matches the reference entries inside the text of a PDF file. The findall function returns a list that contains all the matched references.
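A sketch of this extraction is given below. The pattern is an assumption (it matches bracketed entries such as "[12] A. Author, ..."), since the paper does not give its exact expression.

```python
# Illustrative sketch: pull reference entries with re.findall.
import re

def extract_references(text):
    pattern = r"\[\d+\]\s[^\[]+"   # "[n] ..." up to the next bracketed number
    return re.findall(pattern, text)

refs = extract_references("[1] A. Author, Title, 2019. [2] B. Author, Other Title, 2020.")
print(refs)
```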
5.2 Data Preprocessing Out of the 13 features, 10 are integers and 3 are lists of words. To use the lists of words, we have applied sequencing and clustering. With sequencing, each list of words is converted into a number sequence; after that, those sequences are classified into a suitable number of clusters, which is found using the elbow method.
5.2.1 Sequencing
Sequencing means converting a word sequence into a number sequence. To do this we have used TensorFlow and Keras tools [9]. Consider the following example of "frequently used ML words":
List 1: 'vectors', 'function', 'parameters', 'distance', 'matrix' = [2, 3, 4, 5, 6]
List 2: 'prediction', 'data', 'vectors', 'semantic', 'matrix' = [7, 8, 2, 9, 6]
In this way, a word list is represented as an integer list.
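A sketch of this step with the Keras Tokenizer is given below. The paper names TensorFlow and Keras [9]; the specific Tokenizer configuration is an assumption, and the exact integer indices depend on the fitted vocabulary.

```python
# Illustrative sketch of the sequencing step with the Keras Tokenizer.
from tensorflow.keras.preprocessing.text import Tokenizer

word_lists = [
    "vectors function parameters distance matrix",
    "prediction data vectors semantic matrix",
]

tokenizer = Tokenizer()
tokenizer.fit_on_texts(word_lists)            # builds the word -> integer index
sequences = tokenizer.texts_to_sequences(word_lists)
print(sequences)  # integer lists analogous to [2, 3, 4, 5, 6], [7, 8, 2, 9, 6]
```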
5.2.2 K-Means Clustering and Elbow Method
Here, the integer sequences are grouped into a suitable number of clusters, and the cluster number is used in training the model.
List 1: 'vectors', 'function', 'parameters', 'distance', 'matrix' = [2, 3, 4, 5, 6] = Cluster Number 6
List 2: 'prediction', 'data', 'vectors', 'semantic', 'matrix' = [7, 8, 2, 9, 6] = Cluster Number 6
In this way, an integer list is represented by a single integer that can be used for training the model. To find a suitable number of clusters, the elbow method is used. For the K-means clustering algorithm, two commonly used terms are distortion and inertia.
1. Distortion: calculated as the average of the squared distances from the cluster centres of the respective clusters; typically, the Euclidean distance metric is used [8].
2. Inertia: the sum of squared distances of samples to their closest cluster centre [8].
We iterate the value of k from 1 to 20 and calculate the distortion and inertia for each value of k in the given range. To determine a suitable number of clusters, we select the value of k at the "elbow", that is, the point after which the distortion/inertia starts decreasing linearly. In Fig. 2, the elbow can be seen at k = 16; this figure is for the Machine Learning algorithm word sequences. Similarly, for frequently used words and Machine Learning words, we got k = 8 and k = 15.
Fig. 2 Elbow formation for machine learning algorithms words
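The sketch below illustrates the elbow computation with scikit-learn K-means; the random stand-in matrix replaces the padded integer sequences, whose exact preparation is not specified in the paper.

```python
# Illustrative sketch of the elbow method; inertia_ is the sum of squared distances
# of samples to their closest cluster centre.
import numpy as np
from sklearn.cluster import KMeans

X = np.random.randint(1, 50, size=(170, 5))   # stand-in for the integer sequences

inertias = []
for k in range(1, 21):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)

# The "elbow" is the value of k after which inertia starts decreasing roughly linearly.
for k, inertia in zip(range(1, 21), inertias):
    print(k, round(inertia, 1))
```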
Table 1 Test results

Sl. No  Algorithm                Test accuracy (%)  Precision  Recall  F1-score
1       Logistic regression      64.70              0.73       0.47    0.57
2       Decision tree            85.29              0.93       0.82    0.87
3       Random forest            72.52              0.79       0.65    0.71
4       K-nearest neighbors      64.70              0.67       0.59    0.62
5       Support vector machine   50.00              0.50       1.00    0.67
6 Implementation 6.1 Model Training and Testing The target is to predict whether a paper will get accepted or not, which is a classification problem. For this classification problem, five different classification models are used. There is a total of 170 papers in the dataset, split 80–20%, that is, 136 papers for training and 34 for testing. We have used the SK-Learn (scikit-learn) Python tool for implementing all models [10]. The testing accuracy for a random split is given in Table 1.
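A sketch of this training and evaluation setup with scikit-learn [10] is shown below; the feature matrix X (170 x 13) and label vector y are assumed to have been prepared as described in the preceding sections.

```python
# Illustrative sketch of the 80-20 split, the five classifiers, and fivefold cross-validation.
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

def evaluate(X, y):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)   # 136 papers for training, 34 for testing
    models = {
        "Logistic regression": LogisticRegression(max_iter=1000),
        "Decision tree": DecisionTreeClassifier(),
        "Random forest": RandomForestClassifier(),
        "KNN": KNeighborsClassifier(),
        "SVM": SVC(),
    }
    for name, model in models.items():
        model.fit(X_train, y_train)
        test_acc = model.score(X_test, y_test)
        cv_acc = cross_val_score(model, X, y, cv=5).mean()   # fivefold cross-validation
        print(f"{name}: test accuracy {test_acc:.2%}, 5-fold CV {cv_acc:.2%}")
```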
6.2 Result Analysis The results shown in Table 1 are for the models with the best hyperparameters and the best result in fivefold cross-validation for each model category. As shown in Table 1, the Decision Tree classifier gives the highest accuracy; with fivefold cross-validation, the Decision Tree gives 67% average accuracy with a 5% standard deviation. One might expect a neural network to perform better. One reason it does not is that the dataset is relatively small, with only 170 samples; typically, neural networks require a dataset at least an order of magnitude larger for good accuracy, so the accuracy of the CNN was very low and is not reported. As the dataset has fewer dependent variables and contains features that help in making the decision, the Decision Tree performs better. As shown in the table, when the model predicts that a paper is accepted, the precision is 0.93, recall is 0.82, and F1-score is 0.87 for the Decision Tree.
7 Conclusions and Future Scope In this paper, we proposed a new system combining Machine Learning's clustering algorithms and Natural Language Processing to predict whether a paper will get accepted for a conference. We have used Logistic Regression, Decision Tree, Random Forest, KNN and SVM, together with NLP feature extraction techniques to extract the number of references, number of figures and tables, etc. After applying all the above-mentioned algorithms and making a comparative study, we concluded that the Decision Tree provides the best result. The method developed in this paper is more reliable, as it avoids biases and makes the decision based on the content of the paper. We are confident that this model can provide an efficient result by predicting the acceptance of a paper. While developing the model, papers for a particular conference, i.e., ICLR 2017, were taken into consideration; a model can be made that performs well for other conferences as well, and the idea can be expanded by allowing papers from multiple domains to be evaluated. We can add other interesting features such as the number of authors, grammar mistakes, plagiarism, etc., but this will require additional advanced NLP and Machine Learning techniques. Another interesting addition could be a feature indicating whether the authors have published papers in conferences earlier; this becomes easier if the author has provided his/her LinkedIn profile. The efficiency of the model can also be increased by extracting information about the citations mentioned in the papers.
References 1. W. Jen, S. Zhang, and M. Chen, Predicting Conference Paper Acceptance. (2018), p. 7 2. A. Santini, The importance of referencing. J Crit. Care Med. 4(1), 3–4 (2018). https://doi.org/ 10.1515/jccm-2018-0002 3. I. Masic, The importance of proper citation of references in biomedical articles. Acta Informatica Med. AIM: J. Soc. Med. Inf. Bosnia Herzegovina 21, 148–155 (2013). https://doi.org/ 10.5455/aim.2013.21.148-155 4. R.P. Duquia et al., Presenting data in tables and charts. Anais Bras. Dermatologia 89(2), 280– 285 (2014). https://doi.org/10.1590/abd1806-4841.20143388 5. S.B. Bavdekar, Using tables and graphs for reporting data. J. Assoc. Phys. India 63(10), 59–63 (2015) 6. Y. Kita, J. Rekimoto (2017). Prediction of the importance of figures in scholarly papers. pp. 46– 53, https://doi.org/10.1109/ICDIM.2017.8244648 7. K. Jayaram, K. Sangeeta, A review: information extraction techniques from research papers. in 2017 International Conference on Innovative Mechanisms for Industry Applications (ICIMIA) (Bangalore, 2017), pp. 56–59, https://doi.org/10.1109/ICIMIA.2017.7975532. 8. A. Gupta, Elbow method for the optimal value of k in K-means. Geeksforgeeks 9. https://www.tensorflow.org/api 10. https://scikit-learn.org/stable/
Object Identification and Tracking Using YOLO Model: A CNN-Based Approach Shweta Singh, Ajay Suri, J. N. Singh, Muskan Singh, Nikita, and Dileep Kumar Yadav
Abstract Identifying, detecting and tracking objects in individual video frames is an expensive and highly demanded task for security and surveillance. This work aims to consolidate the procedure for object recognition with the objective of achieving high precision with real-time performance. A significant challenge in many object detection frameworks is the reliance on other computer vision systems to assist the deep-learning-based approach, which leads to slow and non-ideal execution. Here, a deep-learning-based approach is used to solve the problem of object identification in an end-to-end fashion. The framework is trained on the most challenging publicly available dataset (PASCAL VOC), on which an object detection challenge is conducted each year. The resulting framework is fast and accurate, thereby helping those applications which require object detection. This work also presents a suitable study of well-known methods. Keywords Object detection · Tracking · Identification · YOLO · Bounding box · Convolution neural network
S. Singh (B) · J. N. Singh · Nikita · D. K. Yadav Galgotias University, Greater Noida, India e-mail: [email protected] J. N. Singh e-mail: [email protected] D. K. Yadav e-mail: [email protected] A. Suri ABES Engineering College, Ghaziabad, India e-mail: [email protected] M. Singh Galgotias College of Engineering and Technology, Greater Noida, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 D. Swain et al. (eds.), Machine Learning and Information Processing, Advances in Intelligent Systems and Computing 1311, https://doi.org/10.1007/978-981-33-4859-2_15
1 Introduction This section describes the concepts used in the project Object Detection using Machine Learning. The project comes under the domain of Machine Learning, which is a branch related to Artificial Neural Networks. Machine Learning concepts enable the system to learn on its own from the experience it gains, without interference from external factors. The YOLO (You Only Look Once) algorithm, built on a Convolution Neural Network, is used for the detection; it is a deep neural network concept from the Artificial Neural Network family. Artificial Neural Networks are inspired by the biological nervous system, where neurons are the nodes that form the network; in an Artificial Neural Network, perceptrons act as the nodes. An Artificial Neural Network has three kinds of layers: an input layer, hidden layers and an output layer. Deep Learning is the part of Artificial Neural Networks that uses multiple hidden layers, which can be used for feature extraction and classification [1]. A Convolution Neural Network (CNN) is the part of Deep Learning used in the analysis of visual imagery. It has four different types of layers: the convolution layer, the pooling layer, the activation layer and the fully connected layer. The convolution layer uses filters and strides to obtain feature maps, the matrices produced after the convolution layer. These can be improved using ReLU (Rectified Linear Unit), which maps negative values to 0. The resulting feature map is then reduced by sending it into the pooling layer, where it is shrunk to a smaller-sized matrix.
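For illustration only, the sketch below shows the layer types described above written with the Keras API; it is not the authors' network, and the layer sizes and input shape are assumptions.

```python
# Minimal sketch of convolution, ReLU activation, pooling and fully connected layers.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(224, 224, 3)),
    layers.Conv2D(16, (3, 3), activation="relu"),   # convolution layer producing feature maps
    layers.MaxPooling2D((2, 2)),                    # pooling layer shrinking the feature maps
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),            # fully connected layer
    layers.Dense(80, activation="softmax"),         # e.g. one score per COCO class
])
model.summary()
```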
2 Problem Statement The project "Object Detection System using Machine Learning Technique" detects objects based on the YOLO algorithm, applying the algorithm to image data and video data. The accuracy of many computer vision problems had saturated until about a decade ago, but with the rise of deep learning techniques the accuracy on these problems improved drastically. One of the major problems was image classification, that is, predicting the class of an image.
3 Architecture of the Proposed Model Figure 1 shows the architecture diagram of the proposed YOLO model. Images are given as input to the system; a video can also be taken as input, as it is nothing but a stream of images. As the name You Only Look Once suggests, the input goes through the network only once and the result, the detected objects with bounding boxes and labels, is obtained [1, 2].
Fig. 1 YOLO architecture
The images are divided into S×S grid cells before being passed to the Convolution Neural Network (CNN). B bounding boxes per grid cell are created around all the detected objects in the image as the output of the CNN. The classes to which the objects belong are also predicted by the CNN, giving C classes per grid cell. A threshold is then set for the object detection; here a threshold of 0.3 is used. The lower the threshold value, the more bounding boxes appear in the output, resulting in a cluttered result. Once the input is selected, preprocessing is done, where the S×S grid is formed [1, 3, 4]. The resulting grid is sent to the bounding box prediction process, where bounding boxes are drawn around the detected objects. Next, the outcome of the previous step is sent to class prediction, where the class to which each object belongs is predicted. It is then sent to the detection step, where a threshold is applied in order to reduce clutter in the output caused by too many bounding boxes and labels. At the end, an image or a stream of images is produced, for image and video or camera input, respectively, with bounding boxes and labels as the output [2] (Fig. 2).
Fig. 2 Data flow diagram of the system [1]
4 Implementation This part contains the methodology for implementing this project. The algorithm for detecting the object in the Object Detection System is given as:
4.1 Algorithm for Object Detection System
1. The given input image is divided into an S×S grid.
2. For every cell, B bounding boxes are predicted. Each bounding box contains five components: (x, y, w, h) and a box confidence score.
3. YOLO detects only one object per grid cell, regardless of the number of bounding boxes.
4. It estimates C conditional class probabilities.
5. If no object exists, the confidence score is zero; otherwise the confidence score should be greater than or equal to the threshold value.
6. YOLO then outputs the bounding boxes around the detected objects and predicts the class to which each object belongs.
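The snippet below is a sketch, not the authors' exact code, of step 5: keeping only boxes whose confidence score clears the 0.3 threshold used in this work.

```python
# Illustrative sketch of confidence-threshold filtering of YOLO-style detections.
def filter_detections(boxes, threshold=0.3):
    """boxes: list of (x, y, w, h, confidence, class_label) tuples."""
    kept = []
    for (x, y, w, h, conf, label) in boxes:
        if conf >= threshold:          # confidence below the threshold means "no object"
            kept.append((x, y, w, h, conf, label))
    return kept

detections = [(0.5, 0.5, 0.2, 0.3, 0.85, "person"), (0.1, 0.2, 0.1, 0.1, 0.12, "dog")]
print(filter_detections(detections))   # only the "person" box survives
```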
5 Results and Analysis This part describes the results obtained by the system and the different test cases used while testing it. We used the pretrained COCO dataset, which has 80 classes; 80 classes were used because a larger number of classes resulted in incompleteness of the data. The following section describes the different test cases and the results obtained [2].
Table 1 Test cases with results

TC1: When image is considered as input. Expected: image with bounding box along with the objects and predicted class. Result: SUCCESSFUL
TC2: When video is considered as input. Expected: video with bounding box along with the objects and predicted class. Result: SUCCESSFUL
TC3: When camera is considered as input. Expected: objects identified in real time with bounding box, confidence score and predicted class. Result: SUCCESSFUL
TC4: When black and white image is considered as input. Expected: image with bounding box along with the objects and predicted class. Result: SUCCESSFUL
TC5: When image with image objects is considered as input. Expected: image with detected objects. Result: UNSUCCESSFUL
TC6: When image with overlapping objects is considered as input. Expected: image with bounding box around the objects and predicted class. Result: SUCCESSFUL
TC7: When image with distant objects is considered as input. Expected: image with detected objects. Result: UNSUCCESSFUL
5.1 Test Cases Table 1 lists the different test cases along with the expected and the actual test results.
5.2 Results This part shows the different results that were obtained for the various test cases described above. Figure 3 shows the output of the Object Detection System: bounding boxes are drawn around the detected objects. Figure 4 shows the output obtained when objects are overlapping; partially visible objects are also detected by drawing a bounding box around them, along with the label indicating the class to which they belong. In Fig. 4, a few people are partly visible in the picture of a crowded classroom, and the system is able to recognize each person visible in the picture. The output produced when a video is given as input is shown in Fig. 5; the video given as input to the system should be in .avi format. Figure 6 shows the output when the camera is used to detect objects. Figure 7 shows the output produced when a blurred picture is given as input.
Fig. 3 Image with detected object [2]
Fig. 4 Image with overlapping objects [2]
Fig. 5 Output obtained with video input [2]
Fig. 6 Tracked object using unified method [1]
Arbitrary bounding boxes are drawn with no detected object. This is one of the drawbacks of the project, giving an unsuccessful test outcome [2].
6 Conclusions and Future Work The project was created with the objective of recognizing objects in real time in images, video and camera input. Bounding boxes are drawn around the detected objects along with the label indicating the class to which each object belongs. This paper describes the use of a CPU for the processing in the project. Future improvements can focus on executing the project on a system with a GPU for faster results and better precision.
References 1. P. Kalshetti, A. Jaiswal, N. Rastogi, P. Gangawane, Object detection (Department of Computer Science and Engineering, Indian Institute of Technology Bombay, India) 2. P. Amin, B.S. Anushree, B. Shetty, K. Kavya, L. Shetty, Object detection using machine learning technique. Int. Res. J. Eng. Technol. (IRJET) 7948–7951 (May 2019) 3. M. Buric, M. Pobar, I. Kos, Object Detection in Sports Videos. in 41st International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO) (IEEE, Croatia, Opatija, 2018), pp. 1034–1039 4. Y.L. Lin, Y.M. Chiang, H.C. Hsu, Capacitor detection in PCB using YOLO algorithm. in International Conference on System Science and Engineering (ICSSE) (IEEE, New Taipei City, Taiwan, December 2017), pp. 1–4 5. J. Lee, J. Bang, S. Yang, Object detection with sliding window in images including multiple similar object. in International Conference on Information and Communication Technology Convergence (ICTC) (IEEE, December 2017), pp. 803–806 6. H. Xie, Q. Wu, B. Chen, Vehicle detection in open parks using a convolutional neural network. in 6th International Conference on Intelligent Systems Design and Engineering Applications (ISDEA) (IEEE, Guiyang, China, August 2015), pp. 927–930 7. M. Shah, R. Kapdi, Object detection using deep learning networks. in International Conference on Intelligent Computing and Control Systems (ICICCS) (IEEE, Madurai, India, June 2017), pp. 787–790 8. S.M. Abbas, S.N. Singh, Region based object detection and classification using faster R-CNN. in 4th International Conference on Computational Intelligence and Communication Technology (CICT) (IEEE, Ghaziabad, India, October 2018), pp. 1–6 9. P. Shukla, B. Rautela, A. Mittal, A computer vision framework for automatic description of Indian monuments. in 13th International Conference on Signal Image Technology and Internet Based Systems (SITIS) (IEEE, Jaipur, India, December 2017), pp. 116–122
Real-Time Hands-Free Mouse Control for Disabled Premanand Ghadekar, Pragya Korpal, Pooja Chendake, Raksha Bansal, Apurva Pawar, and Siddhi Bhor
Abstract In this paper, a human–computer interface system using eye motion is implemented. Traditional human–computer interfaces use a keyboard and mouse as input devices; a hands-free interface between computer and human is presented in this paper. The system is developed using template matching and is a real-time, fast and affordable technique for tracking facial features and eye gestures. This technology can replace traditional screen-pointing devices for the use of disabled people. The paper presents computer mouse cursor movement controlled by the human eyes: wherever the eyesight focuses, the mouse is moved accordingly. The proposed vision-based virtual interface controls the system through various eye movements such as blinking and winking. Keywords Eye tracking · Mouse movement · Eye-blinking detection
P. Ghadekar · P. Korpal (B) · P. Chendake · R. Bansal · A. Pawar · S. Bhor Department of Information Technology, Vishwakarma Institute of Technology, Pune, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 D. Swain et al. (eds.), Machine Learning and Information Processing, Advances in Intelligent Systems and Computing 1311, https://doi.org/10.1007/978-981-33-4859-2_16
1 Introduction In recent times, many people fall victim to diseases like paraplegia that impair them physically, as a result of which the person is unable to use his or her body from the neck down. Their eyes are the sole organ that can generate different actions. A significant number of individuals suffering from amyotrophic lateral sclerosis or paralysis are not able to use computers for basic tasks. Some disabled people cannot move anything except their eyes; for such people, eye movement and blinking are the sole means to use a computer and interact with the outside world. The aim of this research is to aid the physically challenged by developing a system that allows them to communicate with the computer using their eyes. The technique presented helps disabled people to be independent in their lives. The aim of eye gesture tracking is to capture an individual's eye movements and use them as control signals, in order to facilitate communication with systems without requiring input from a mouse or keyboard. The purpose of this paper is to explore and improve upon present developments in eye movement and gesture monitoring systems, principally in the domains which can benefit physically disabled individuals, enabling them to use computers and other programmable control systems. Thus, such individuals could still take on their duties, contribute to society and perform everyday tasks involving computers, mostly without the need for a helping hand, hence becoming independent.
2 Literature Review The motive of studying the literature was to understand the domain covering mouse pointer movement and eye detection. The main focus was to create a system that satisfies the needs of disabled people, and therefore it could not be complicated. "Statistical models of appearance for eye tracking and eye blink detection and measurement" [1, 2]: AAM is a proof-of-concept model for a framework that computes the level of eye blinks. The benefit of this technique is that one gets detailed information about the eye; the disadvantage is that it works for only one person. "Communication via eye blinks—Detection and duration analysis in real-time" [3] determines eye blinks; the "BlinkLink" framework could be used, but it cannot handle shorter blinks and ignores them. "An eye tracking algorithm based on Hough transform", created in 2018 [4], uses a digital camera to identify and track the eyeball; the problem is that it takes too much time to track the iris and does not work in real time. In 2017, a system was introduced by the authors of "An image-based eye controlled assistive system for paralytic patients" [5] for paralyzed people, using a digital camera; the drawback was that many frameworks were involved and it takes a long time to detect the pupil of an individual. In 2016 [6], a system was developed in which the user's eyes are detected using an infrared camera; the problem is that it takes a long time to operate and is expensive. In 2015 [7], "Pupil center coordinate detection using the circular Hough transform technique" was developed, which uses a webcam to identify the pupil of an individual; the drawback is that it requires more time and does not work in real time. "Face and eye tracking for controlling computer functions", 2014 [8], uses a camera; the drawback is that it only works over short distances. In 2013 [9], "Eye tracking mouse for human–computer interaction" was introduced, which depends upon pictogram selection; the problem is that it does not work if anything is present near the eyeball, for example if the user has applied makeup or eyeliner.
Fig. 1 Proposed system
3 Proposed System The proposed system uses facial expressions and movements to control the mouse cursor. It works in real time, does not use any external hardware and hence eliminates the discomfort involved in existing systems. Instead of any wearable hardware, the proposed model extracts an individual's eyes using a template matching technique. In this model, a hard blink is used to choose the desired file or folder. As shown in Fig. 1, the first step is to capture the face in order to move the mouse cursor in the required manner; the system then behaves as a normal computer mouse.
4 Dataset Used We have used the shape_predictor_68_face_landmarks dataset. The 68 landmarks include points on the face at the corners of the mouth, on the eyes and along the eyebrows. The standard Histogram of Oriented Gradients (HOG) feature, along with an image pyramid, a linear classifier and a sliding-window detection scheme, is used together with this dataset.
5 Code and Requirements 5.1 Hardware This model requires only a camera and a laptop, hence minimal hardware. Camera: in this system, a standard laptop webcam is used to obtain images of good picture quality under proper lighting conditions.
5.2 Software The model has been implemented using Python version 3.7. The following Python packages are used in the system: (1) NumPy, (2) OpenCV, (3) dlib.
6 Algorithm The iris is tracked through the user's eye movements, and the mouse moves from one location to another on the desktop. Before the cursor movement begins, the following process takes place:
1. Input from the eyes is received by the camera.
2. After receiving the input, it is broken into frames.
3. Once the frames are generated, the lighting in the surroundings is checked thoroughly, as the camera requires adequate light; otherwise the results obtained are not accurate.
4. Frames that focus on the eye are then examined for detection of the iris.
5. The next crucial step is to find the centre of the iris. The "eye window", a rough estimate of the eye area, is detected using the Viola–Jones algorithm.
6. The exact position of the iris within the eye window is found using the Hough circle detection algorithm, as sketched below.
7. The iris is mapped from the scene camera; predetermined calibration points are used so that the iris position can be mapped to a location on the screen [10].
8. The midpoint is then calculated as the mean of the eye centre points of the left and right eye.
9. After all these steps, the mouse moves from one place to another.
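The sketch below illustrates steps 5–6 with OpenCV, which is in the project's package list; the Hough-circle parameters are assumptions, and a rough eye region is assumed to have been found already.

```python
# Illustrative sketch: locate the iris centre inside an eye region with HoughCircles.
import cv2
import numpy as np

def find_iris_center(eye_region_bgr):
    gray = cv2.cvtColor(eye_region_bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.medianBlur(gray, 5)                 # smooth to reduce spurious circles
    circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1, minDist=20,
                               param1=50, param2=30, minRadius=5, maxRadius=40)
    if circles is None:
        return None
    x, y, r = np.round(circles[0, 0]).astype(int)  # strongest circle = iris estimate
    return (x, y)
```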
7 Methodology As the initial step, a face detection technique is used to locate the face on a frame captured by the laptop webcam. The subsequent step is to draw a contour over the eyes by referring to the facial landmarks. Considering only one eye movement leads to quicker processing. After this the pupil movement is monitored, in which, white part of the eyes is detected. When the pupil moves to any one side, the white area in that side becomes almost zero. The ratio of white side area on the left and right of the pupil is calculated to detect the direction of pupil movement.
Based on the pupil movement, cursor movement is carried out. A person’s eye focus is monitored and according to the change in focus there is change in cursor location.
8 Design and Implementation
1. Start/Stop: when the user starts, the module opens and the camera opens automatically; Stop takes the user out of the system.
2. When the camera starts, it records a video of the user's movements. The recorded video is then broken down into many separate image frames.
3. Conversion from RGB into gray scale: the frames obtained from the video are in RGB format. For ease of processing, these frames are converted into gray-scale format.
4. Eye detection and monitoring: next, the frames are converted into binary form through thresholding and contain only two colours. This facilitates the detection of an individual's eye; the eye, which is the region of interest, is the area present in white colour [11, 12].
Figure 2 illustrates the system implementation. The desired actions take place after all the above steps are successfully executed.
Fig. 2 System implementation
9 Working and Experimentation The program first extracts frames from the video captured using the webcam. For ease of processing, these frames are converted from colour to gray scale. The user's face is detected using the frontal face detector of the dlib library. The total execution time is 0:00:53.28. As shown in Fig. 3, blinking of the eyes is used to open a file. For blink detection, the eyes are the region of interest. Once the user's face region is obtained, the "shape_predictor_68_face_landmarks.dat" file is taken as reference to detect the eyes. For each of the eyes, the horizontal and vertical distances of the eye region are computed. When a user blinks, the horizontal distance between the eye endpoints remains the same but the vertical distance decreases. The ratio of the horizontal distance to the vertical distance is calculated; if this ratio becomes greater than 5.7 for both eyes, it indicates that the user has blinked. Accuracy: blinking 90%. As shown in Fig. 4, a left-eye wink and a right-eye wink are used to trigger the left click and right click of the mouse, respectively. For wink detection, the eye region is obtained using face_utils of the imutils library and the Eye Aspect Ratio (EAR) of both eyes is calculated. If the EAR of the left eye becomes less than an eye threshold, the user has winked the left eye; similarly, if the EAR of the right eye becomes less than the threshold, the user has winked the right eye.
Fig. 3 Blinking of eye
Fig. 4 Left click
Fig. 5 Left eye movement
Fig. 6 Right eye movement
This eye threshold is taken to be 0.25. Accuracy: click, left 86% and right 80%. As shown in Figs. 5 and 6, left and right eye movements are used to control left and right cursor movement, respectively. The pupil of the eye is detected using thresholding, and the white-region area to the left and right of the pupil is computed. The ratio of the left-side white area to the right-side white area is taken. When the user looks to the left, the left-side white region becomes almost zero and the right-side white region increases, and vice versa when the user looks to the right [13, 14]. When the gaze ratio becomes greater than 2, the cursor moves left; when the gaze ratio is less than 0.5, the cursor moves right. In this system, the actual left-side region is taken as the right-side region and vice versa because the frames are not mirrored. Accuracy: cursor movement, left 86%; right 80%; up 73%; down 86%. Head movement is used to control up and down cursor movement in reading mode and for scrolling in scroll mode. Reading mode is enabled by opening the mouth, as shown in Fig. 7; scroll mode is enabled by squinting the eyes, as shown in Fig. 8. The mouth region is detected using face_utils of the imutils library. The mouth aspect ratio is calculated, and when it is greater than 0.6 an open mouth is detected. Up and down head movement is detected with respect to the user's nose position, as shown in Figs. 9 and 10 [15].
Fig. 7 Reading mode enabled
Fig. 8 Scroll mode enabled
Fig. 9 Scroll up
Accuracy: scroll, up 73% and down 86%.
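A minimal sketch of the two ratios used above is shown below, assuming dlib's 68-point landmarks (eye points at indices 36–41 and 42–47 in the usual numbering); the thresholds are the ones reported in this work.

```python
# Illustrative sketch of the blink ratio and the gaze decision described above.
# `landmarks` is assumed to be the object returned by dlib's shape predictor.
from math import hypot

def midpoint(p1, p2):
    return ((p1.x + p2.x) // 2, (p1.y + p2.y) // 2)

def blink_ratio(eye_points, landmarks):
    """Horizontal / vertical distance of one eye; > 5.7 for both eyes means a blink."""
    left = (landmarks.part(eye_points[0]).x, landmarks.part(eye_points[0]).y)
    right = (landmarks.part(eye_points[3]).x, landmarks.part(eye_points[3]).y)
    top = midpoint(landmarks.part(eye_points[1]), landmarks.part(eye_points[2]))
    bottom = midpoint(landmarks.part(eye_points[5]), landmarks.part(eye_points[4]))
    hor = hypot(left[0] - right[0], left[1] - right[1])
    ver = hypot(top[0] - bottom[0], top[1] - bottom[1])
    return hor / ver if ver else 0.0

def gaze_direction(gaze_ratio):
    """Left/right white-area ratio; > 2 moves the cursor left, < 0.5 moves it right."""
    if gaze_ratio > 2:
        return "LEFT"
    if gaze_ratio < 0.5:
        return "RIGHT"
    return "CENTER"
```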
10 Conclusion and Future Scope The system permits the disabled to manage a mouse pointer of a computing system through human eyes. The system that is designed and developed is cost-effective, as there is no need to wear any extra attachments, it uses at most a camera and software packages. It works in real-time and is user friendly.
Fig. 10 Scroll down
In the future, we would like to add audio features which might help the disabled to enter information orally. The system can be used in various other applications: it can help users manage home devices and can also be used in computer games. It has vast future scope in the areas of medical science and advertising.
References 1. I.M. Bacivarov, P. Corcoran, Statistical models of appearance for eye tracking and eye-blink detection and measurement. IEEE Trans. (August 2010) 2. I. Bacivarov, M. Ionita, P. Corcoran, Statistical models of appearance for eye tracking and eye blink detection and measurement. IEEE Trans. Consum. Electron. 54(3), 1312–1320 (2009) 3. K. Grauman, M. Betke, J. Gips, G.R. Bradski, Communication via eye blinks—detection and duration analysis in real-time. (IEEE, 2009) 4. A. Bukhalov, V. Chafonova, An eye tracking algorithm based on Hough transform (2018), https://ieeexplore.ieee.org/document/8408915/ 12 May 2018 5. M. Alva, N. Castellino, R. Deshpande, An image-based eye controlled assistive system for paralytic patients (2017), https://ieeexplore.ieee.org/document/8066549/ 6. A. López, D. Fernández, F.J. Ferrero, EOG signal processing module for medical assistive systems (2016), https://ieeexplore.ieee.org/document/7533704/ 7. A. Pasarica, V. Cehan, C. Rotariu, Pupil center coordinates detection using the circular Hough transform technique (2015), https://ieeexplore.ieee.org/document/7248041/ 8. C. Kraichan, S. Pumrin, Face and eye tracking for controlling computer functions. https://iee explore.ieee.org/document/6839834/ (2014) 9. R.G. Lupu, F. Ungureanu, V. Siriteanu, Eye tracking mouse for human-computer interaction (2013), https://ieeexplore.ieee.org/document/6707244/ 10. S-J. Baek, Y.-H. Kim, Eyeball model-based Iris center localization for visible image based eye-gaze tracking systems, (IEEE, 2013) 11. A. Dave, A. Lekshmi An image-based eye controlled assistive system for paralytic patients (2017), https://ieeexplore.ieee.org/document/8066549/ 12. S.M.T. Saleem, Imouse: Eyes Gesture Control System (January 2018), https://www.researchg ate.net/publication/327986681/ 13. E. Sung, J.-G. Wang, Study on eye gaze estimation. IEEE, 32(3), (2002)
14. R.G. Lupu, F. Ungureanu, V. Siriteanu, Eye tracking mouse for human computer interaction. in Proceedings of the 4th IEEE International Conference on E-Health and Bioengineering (EHB ‘13), (Iasi, Romania, November 2013) 15. S.M.A. Meghna, K.L. Kachan, A. Baviskar, Head tracking virtual mouse system based on ad boost face detection algorithm. Int. J. Recent Innov. Trends Comput. Commun. 4(4), 921–923 (2016)
Accounting Fraud Detection Using K-Means Clustering Technique Giridhari Sahoo and Sony Snigdha Sahoo
Abstract Background: Accounting fraud has become quite prevalent in current times. These frauds or scandals comprise any act of manipulating financial statements as a means of hiding financial misdeeds. Such fraudulent practices can be brought to the fore with data mining techniques, which can recognize patterns of fraud in an accounting data set. Methodology: One such data mining technique is K-means clustering, which divides the data set into k distinct non-overlapping partitioned clusters. Though a number of statistical models have been used at length for revealing patterns in such scams, they have been found to be complicated and highly time-consuming. In such a scenario, K-means may form the simplest preliminary basis of fraud detection by segregating the data set into fraudulent and non-fraudulent sets, which may then be followed by more complex practices for establishing the same. This work has been carried out on an accounting data set, and the factors considered are bank statements (credit and debit in the cashbook), asset value and net profit, on which K-means clustering has been simulated. Conclusion: This work is an effort toward showing a simple application of K-means. The experimental results indicate high accuracy and a significant relation between misrepresented values of these factors and fraud. Keywords Accounting · Fraud detection · Data mining · K-means
1 Introduction Fraud is a deception or misrepresentation that an individual or entity makes with an ill intention of accruing some unauthorized benefits out of it. Accounting fraud has specifically emerged as one of the critical global problems [1]. It is a deliberate G. Sahoo (B) School of Commerce and Economics, KIIT Deemed To Be University, Bhubaneswar, India e-mail: [email protected] S. S. Sahoo Department of Computer Science and Applications, DDCE, Utkal University, Bhubaneswar, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 D. Swain et al. (eds.), Machine Learning and Information Processing, Advances in Intelligent Systems and Computing 1311, https://doi.org/10.1007/978-981-33-4859-2_17
171
manipulation of accounting records in a bid to showcase good financial health of any organization while the reality might be far from anything good. At times, companies play around with revenues and expenditures so as to project higher profits, leading to frauds. Cyber-attacks, bribery, corruption and so on also let fraud enter into accounting. But whatever the underlying reason be, such kind of frauds lead to persistent adverse effects across the business world, and they create a ripple effect negatively affecting the investors, the shareholders, the business partners and the organizational workforce, all alike. Further, it gets impossible to accurately decide upon the stability of such organizations. Thus, there is a huge responsibility on the auditors for detecting fraudulent financial reports [2, 3]. While credit transactional fraud detection has received a lot of attention from researchers, accounting frauds are yet to be dealt in-depth. The initial step in the direction is to study the characteristics of such kind of accounting fraud. There are a number of features based on which aforesaid accounting frauds may be revealed [3]. Debts, assets, net profits and distorted financial ratios are some of the features in an accounting record which can be closely observed as the first step for detecting any such anomaly. A variety of techniques, methods and tools have been used in the past for detecting frauds [4, 5]. Only recently, data mining and computational models [6] have been inducted for identifying patterns in money laundering crimes on the financial data set [5, 7, 8], although data mining tools for fraud detection are in use since long. Data mining (DM), also termed as knowledge mining, involves extraction of previously unidentified and incomprehensible information from large data sets. Several data mining techniques, like association rule mining, classification, clustering and prediction and so on [9, 10], have been widely used for pattern identification and establishing relationship among the data in a data set. This is the sole reason they have been put to use for efficiently bringing out scams into limelight in many sectors, like banking and telecom [11, 12]. They have also been used for detecting fraud in signalling data [11], for analysing fraudulent behaviour in online auctions [13] and so on. DM techniques have proved to be quite efficient in outlier detection [14]. Various data mining techniques such as logistic models [15], neural networks [16], Bayesian networks [17] and decision trees have been applied for classification [18] and detection of fraudulent accounting data [9]. Clustering technology [19] has also been used for automating fraud filtering during an audit for helping the auditors in evaluating and grouping together life insurance claims with similar characteristics. Also, K-means clustering along with self-organizing maps has been used on financial statements for the task of detection of fraud financial statements. [20]. As K-means has shown a great deal of efficacy, we too have proposed a simple model based on the basic K-means technique in this work for detecting fraud from financial statement of an organization. The rest of the paper is organized as follows. Section 2 talks about various types of clustering and gives an overview of the clustering technique used in this paper. Section 3 briefly outlines the proposed model. The data, implementation details and results are provided in Sect. 
4 followed by a brief summarization of the future scope of this work in Sect. 5.
2 K-Means Clustering Technique K-means is a partitioning-based clustering technique. It has been so named because this technique constructs ‘K’ partitions of data when provided with ‘n’ objects. Each partition represents a cluster and K ≤ n. So, data objects are classified into K groups such that each group contains at least one object and each object in the data set must belong to exactly one group. This partitioning technique constructs an initial partition on the data set and then iteratively relocates the objects among the partitions based on the criteria of the algorithm until optimum partition has been made. The following subsection discusses about the method adopted by K-means. The procedure adopts a simple method for partitioning a given set of data objects into ‘K’ number of clusters fixed a priori. The ‘K’ centres are to be chosen such that they are placed as far as possible from each other. Next, each object in the data set is assigned with the nearest centre. Once all the objects are assigned, initial partitioning is done. Then, K new centres are to be recalculated as centres of the new clusters based on cluster mean. After this, a new connection has to be laid down between all data set points and the nearest new centre. This generates the loop structure as a result of which K centres keep changing locations in every iteration. This procedure continues until the following objective function, also known as the squared error function given by (1) converges. E=
\sum_{i=1}^{k} \sum_{d \in C_i} |d - m_i|^2 \qquad (1)
where ‘E’ is the sum of the square error for all objects in the data set. ‘d’ is an object in space and ‘mi ’ is the mean of cluster ‘C i ’. So, for each of the object in every cluster, the distance from the object to its cluster centre is squared, and the distances are summed. This condition makes the resulting K clusters as compact and as separate as possible [21].
2.1 Algorithm for K-Means Clustering
Algorithm 1: K-means Algorithm
Step 1: Specify the number of clusters K and choose K objects from dataset D as the initial cluster centres (centroids) arbitrarily, without replacement.
Step 2: Repeat
Step 3: Compute the sum of the squared distance between data points and all centroids.
Step 4: (Re)assign each object to the cluster to which the object is the most similar, based on the mean value of the objects in the cluster.
Step 5: Update the cluster means, i.e., calculate the mean value of the objects for each cluster to obtain the new centroids.
Step 6: Until no change (i.e., until the assignment of data points to clusters changes no more).
The plot specified in Fig. 1 [21] depicts the process of K-means clustering on a given data set based on the steps of Algorithm 1. The black dots represent the data. The irregular boundaries indicate the cluster boundaries. The cluster boundaries are initially shown as dashed boundary in Fig. 1a, because they keep changing in each iteration. Figure 1b represents the ongoing changes in the cluster boundaries. The cluster boundary in Fig. 1c is shown in solid lines to indicate that the solution has converged and the boundary no longer change with each iteration, that is, the data have been assigned to their respective appropriate clusters.
3 Proposed Model In this proposed methodology, values of three different measures, that is, bank transactions, specifically, the credit and debit of cashbook, asset value per month, and net profit per month for a period of 12 months (one fiscal year) have been considered. Each of these measures constitute a field in the data set. Every transaction in each of
Fig. 1 K-means-based clustering on a set of objects; cluster mean (centroid) has been indicated with a ‘+’ [21]
Accounting Fraud Detection Using K-Means Clustering Technique
175
Fig. 2 Flow depicting the steps followed in this work
the data set has been treated as an object for the purpose of applying clustering on them. The proposed workflow is provided in the following subsection.
3.1 Workflow The data set is initially imported as column vectors, one each for credit, debit, asset value and net profit. The number of clusters into which data is to be partitioned is decided prior to implementing the K-means model. The model is run on the imported data dividing it into clusters and finally the clusters are plotted for better depiction. The entire flow is depicted in Fig. 2. The following section discusses about the implementation and gives an overview of the result snapshots.
4 Implementation and Results The K-means clustering algorithm was run on the sample data set provided in Table 1. The data set has been curated focusing on the features considered in the work. The credits for the months of December–February have intentionally been assigned values at the higher side so as to show a higher overall profit for the organization. In other words, fraudulent entry has been done for these months to show that the organization is financially sound. The sample data set is as follows. Now, K-means has been applied on to this data set to check whether the adopted clustering technique can identify the fraudulent entries or not. The clustering was implemented on MATLAB R2014A and run on Windows 10 machine with an i3 processor. The value of K, that is, the number of clusters, was taken as 2, as we needed to group the given data into two sets, that is, either fraud or non-fraud data. The results show that all the three datasets (profit, credit–debit and asset value) considered have been partitioned into two clusters each, one indicating the fraudulent
Table 1 Sample data set of profit, credit, debit (as in cashbook) and asset value in INR for demonstrating the use of the K-means technique (assumed for one FY)

Month       Profit (in INR)   Credit (in INR)   Debit (in INR)   Asset value (in INR)
April       30,000            40,000            10,000           100,000
May         20,000            150,000           130,000          98,333
June        0                 20,000            20,000           96,666
July        25,000            50,000            25,000           99,000
August      5000              15,000            10,000           96,333
September   10,000            20,000            10,000           94,666
October     12,000            32,000            20,000           95,000
November    28,000            40,000            12,000           93,333
December    32,000            42,000            10,000           91,666
January     24,000            124,000           100,000          92,000
February    100,000           300,000           200,000          93,000
March       75,000            200,000           125,000          94,000
ones (indicated in red in the plot) and the other for the non-fraudulent ones (indicated in blue in the plot). In Figs. 3, 4, 5 and 6, data for a period of 12 months have been plotted and possible frauds have been indicated in red markers. On considering all
Fig. 3 Plot showing the clusters of possible fraudulent and non-fraudulent data in the net profit per month. The months are numbered from 1 to 12. X-axis represents the months and Y-axis represents the credit per month. Red dots indicate possible fraudulent entries
Fig. 4 Plot showing the clusters of possible fraudulent and non-fraudulent data in the credit side of the cashbook. The months are numbered from 1 to 12. X-axis represents the months and Y-axis represents the debit per month. Red dots indicate possible fraudulent entries
Fig. 5 Plot showing the clusters of possible fraudulent and non-fraudulent data in the debit side of the cashbook. The months are numbered from 1 to 12. X-axis represents the months and Y-axis represents the net profit per month. Red dots indicate possible fraudulent entries
Fig. 6 Plot showing the clusters of possible fraudulent and non-fraudulent data in the asset value. The months are numbered from 1 to 12. X-axis represents the months and Y-axis represents the asset value
the plots together as shown in Fig. 7, which has superimposed the plot of all the four criteria, it can be seen that the red markers indicating the fraud seem to be aggregated in the region for the months of January–March. Thus, they are the most probable candidates for fraud, and it raises a suspicion that falsified data for the months of January–March may have been inserted for reflecting good financial health. Further in-depth analysis needs to be carried out on the facts and figures for the indicated months during audits to confirm whether a misappropriation or fraud has occurred or not. Thus, this shows that K-means is indeed useful in bringing the fraudulent data to the fore and a basic K-means can serve as the screening step for filtering out the fraud financial data in a financial report of an organization by auditors working on huge data.
5 Conclusion and Future Scope The main advantage of clustering over other data mining techniques is that it is quite flexible to changes, and it brings out the features that distinguish one cluster from another. Although clustering has been widely used for fraud detection in different sectors, and even on financial statements, the simplicity and basic structure of K-means for ascertaining fraudulent financial statements had not been analysed. This work has shown that the technique is quite robust and it can be
Fig. 7 Plot showing the superimposition of Figs. 3, 4, 5 and 6. X-axis represents the months indicated in numbers and Y-axis represents all the four criteria: net profit, credit, debit and asset value
thus effectively used as a prerequisite for accounting fraud detection. Moreover, it can easily be scaled to larger data sets. The main limitation is that K-means depends on the initial values of the centroids. This forms the future scope of this work: K-means may be hybridized so that it runs with different initial centroids and the best result is picked, or more advanced techniques such as fuzzy c-means or Gaussian mixture models may be adopted for the purpose.
Transforming the Lives of Socially Dependent to Self-dependent Using IoT Debabala Swain and Sony Snigdha Sahoo
Abstract Internet of Things (IoT) explores the new sensor-based wearable devices and technologies that can be used in day-to-day life. It has opened up a wide range of new dimensions in elderly care, disability care and patient care through real-time monitoring and life supports. This new paradigm has reformed the lifestyle of the socially dependent individuals for the better. It has helped set high standards in the healthcare domain by enabling smart and high technology devices. It has bestowed a self-dependent life to the differently-abled and elderly individuals. Another appealing aspect about IoT is such devices are being made cost-effective in terms of their development so that large number of users look up to these devices for daily usage and whole mankind takes maximum benefit out of the technology. IoT, indeed has got a lot to serve to the society on various aspects, only if mankind is ready to accept it wholeheartedly. This paper briefly summarizes the role of IoT in healthcare and life support and highlights its novel utilities for betterment of socially dependents. Features of some of the IoT solutions have also been discussed thoroughly for instantiating the care that can be imparted by these devices in transforming the socially dependent lives to self-dependent ones. Also, a feature comparison has been provided to stress on the fact that IoT devices can indeed be upgraded at a fast pace which is not to be abdicated but rather be availed for the greater good. Keywords IoT · Disability care · Elderly care · Smart devices
1 Introduction Internet of Things (IoT) represents the smart devices, capable of interacting with its surrounding world. They sense and input data through different sensors, convert D. Swain Department of Computer Science, Rama Devi Women’s University, Bhubaneswar, India S. S. Sahoo (B) Department of Computer Science and Applications, DDCE Utkal University, Bhubaneswar, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 D. Swain et al. (eds.), Machine Learning and Information Processing, Advances in Intelligent Systems and Computing 1311, https://doi.org/10.1007/978-981-33-4859-2_18
the electrical input into physical action using actuators and share that data through Internet for further processing and analytics [1, 2]. Due to wide range of connectivity with people through internet, it has gained a lot of popularity in consumer care, commercial applications, industrial purposes, healthcare sectors, life support and so on. Some instances of these IoT devices include wearable gadgets, healthcare monitoring devices, smart home devices, industrial automation, agricultural instruments and many such. To facilitate smart healthcare, nowadays people use smart wrist bands which can monitor heart pulse rate, blood sugar level, blood pressure levels, calorie loss and body movement activities throughout the day. To take optimum care for elderly, small kids and sick persons at home, the smart homes [3] devices like smart firewalls, smart cameras, precision cookers, room thermostats, air quality monitors, smart bulbs, smart door locks, smart watering systems, are used. Figure 1 explains the diversified applications of IoT. The rest of the paper focusses on how IoT is helping people, who are dependent on others because of any physical disability or old age or some terminal illness, to lead an independent life by equipping them with various IoT-based devices. Section 2 describes about various IoT solutions implemented in healthcare. Section 3 discusses how IoT promotes independent living. Section 4 briefs on stages of IoT-based architecture for data analysis in healthcare. Section 5 illustrates a feature comparison between few of the existing IoT devices and upcoming ones for upgrading lifestyle of socially dependents. Section 6 concludes this work with a brief description on future prospects of IoT devices.
Fig. 1 IoT network [4]
2 IoT as Ultimate Healthcare Solution The healthcare industry has been rapidly adopting the smart devices as the ultimate solution. This IoT-enabled healthcare industry is emerging and evolving at a fast pace and has come to be termed as Internet of Medical Things (IoMT) [5, 6]. The IoT devices in such scenario facilitate remote health monitoring, like blood pressure and heart rate monitoring, pacemakers, electronic wristbands, advanced hearing aids and so on. They are also capable of sending emergency notifications. IoT can thus serve many purposes in healthcare industry for all its stakeholders, including patients, physicians, hospitals and corporates. The various usages are as follows [7]: (a) IoT for Patients: Nowadays patients use wireless wearable devices [8] like fitness bands, blood pressure and heart rate monitoring cuffs for personalized patient attention. Devices have also been customized for counting calorie loss, body activity reminders, physician appointments reminder, blood pressure alerts and so on. (b) IoT for Elderly: IoT has played a great role in elderly people’s lives by providing a constant track of their health conditions. People living alone can instantly notify their families without much ado. If there is a disruption or change in the personal routine activities then alert signals are sent to the family members and concerned caretakers [3]. (c) IoT for Physicians: Today’s smart physicians are keeping track of their remote patients’ health status more effectually by recommending wearable and home monitoring equipment with embedded IoT to them. They can easily track patients’ diet tracks, medicine intakes, treatment plans and emergency needs with immediate medical attention. (d) IoT for Hospitals: Smart hospitals have been looking towards advanced patient care system based on IoT applications as IoT devices can track the location of different sensor-based medical equipment like wheelchairs, nebulizers, oxygen pumps, various monitoring gear and environment control gears in real time. Even, how hygienic the hospital surroundings are can also be monitored using such devices, and thus, patient infection may be minimized or eliminated altogether. Few other roles that IoT devices have been playing include asset management such as pharmacy inventory control. (e) IoT for Healthcare Insurance: Customers are now being offered various rewards for sharing IoT-based data on routine activities and health with the insurers. Such data will enable insurance companies in validating the customer claims. The next section provides a brief account about how such devices altogether can help individuals, especially the elderly and sick ones, to lead an independent life.
3 IoT in Promoting Independent Living Various homecare solutions made up of IoT devices can help fight social isolation by letting seniors stay in the comfort of their homes while their movements are monitored by the respective IoT devices, with family or caretakers alerted to any irregularities or abnormalities. Ambient intelligence is a newly developed paradigm within homecare solutions that aims at endowing people with digital environments that are responsive to human needs. Such solutions provide adults with disabilities, or older adults who require assistance, with assisted-living care settings that support one or more activities of daily living.
3.1 Elderly Care A lot of instances can be found where elderly people have been forced into care homes because they are to be monitored on frequent basis which at times becomes troublesome. But with the advent of smart devices, not just blood pressure and heart rate, many other bodily measures can be easily monitored [9, 10]. Besides, IoT devices can also prevent fall, detect fall and locate an elderly person and initiate quick rescue calls apart from its routine work.
3.2 Persons With Disability (PWD) Care People who have suffered disability in some form face several barriers in day-to-day life. The Internet of Things (IoT) now aids in cleaning out such hurdles for disabled people. A number of innovations such as smart walking cane, smart wearable glasses, life support and rescue devices are examples of few such IoT devices for the disabled. Automatic rescue notification can also be sent to nearby relatives, police station or hospitals using these devices.
3.3 Personal Movement Monitoring Many a times, small kids, senior elders, PWD people need constant monitoring. IoT devices are used to monitor these dependent people to detect their movement issues, long-term illnesses and so on. The devices gather number of diverse data points through different sensors and process the collected data to evolve new models for developing preventive measures for the users [11].
3.4 Improving Autonomy Smart IoT devices are very useful in helping disabled individuals become more autonomous. IoT technology provides wearables, like smart watches, smart canes and smart glasses, that translate the sensed content into voice or vibrations for the user. Users can also easily get notifications of e-mails and text messages, or have them read aloud. By facilitating autonomy, IoT helps disabled individuals overcome both social and personal barriers, so their quality of living can be improved.
4 IoT-Based Architecture for Data Analysis The IoT products have propagated massively into the healthcare industry. The data generated by such devices may be analysed for identifying/mining patterns pertaining to an individual’s health. IoT can follow the four-step architecture explained in Fig. 2 for data capturing and processing [7]. • In the first step, the basic interconnected components, like sensors, actuators, monitors, detectors, camera systems, are deployed and the input data are collected through the sensors. • In the second step, the collected data in analogue form are aggregated and converted to the digital form for subsequent data processing. • In the third step, the digitized data are aggregated, pre-processed, standardized and moved to the data centre or cloud. • In the fourth step, the final data is managed and analysed using advanced analytic techniques for effective decision-making.
Fig. 2 IoT-based four-step architecture [7]
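As a rough illustration of this four-step flow, the sketch below strings the stages together in Python; the sensor readings, validity range and alert threshold are invented for demonstration and do not correspond to any particular device or cloud service.

```python
# Illustrative sketch of the four-step IoT data flow described above.
from statistics import mean

def acquire():                      # Step 1: sensors collect raw readings
    return [72.5, 74.0, 139.0, 73.2]   # e.g. heart-rate samples from a wearable

def digitise(samples):              # Step 2: convert analogue input to digital records
    return [round(s) for s in samples]

def preprocess(records):            # Step 3: standardise before moving to the cloud
    return [r for r in records if 30 <= r <= 220]   # drop obviously invalid readings

def analyse(records):               # Step 4: analytics supporting decisions/alerts
    return {"average_heart_rate": mean(records),
            "raise_alert": any(r > 120 for r in records)}

print(analyse(preprocess(digitise(acquire()))))
```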
Table 1 Feature comparison in prior arts and proposed smart watch

S. No. | Features | Prior art | Proposed device
1 | SIM slot | No | Yes
2 | Pairing with Android phone | Yes | Yes
3 | GPS navigation | Yes | Yes
4 | Heart rate tracker | Yes | Yes
5 | Chargeable | Yes | Yes
6 | Battery life up to 7 days | Yes | Yes
7 | Auto night time spent, sleep stage records and alerts | Yes | Yes
8 | Water resistant | Yes | Yes
9 | Rescue alert to nearby hospital | No | Yes
5 Feature Comparison Between Prior Art and Proposed Approach In this section, possible feature upgradation to two smart IoT devices has been suggested as an instance for the socially dependents, that is elderly people and persons with visual impairment. Also, a feature comparison between earlier version and proposed version has been presented in order to emphasize on fast-paced improvement that IoT devices are subjected to. Although numerous devices are available, still enhanced features are proposed for relaying information of the dependents to nearby contacts/hospitals/police stations and for detecting overhead obstacles/fall detector for blind persons [12]. The features have been summarized in Tables 1 and 2.
6 Conclusion and Future Aspects Efficient, accessible and affordable IoT devices are always desirable for the healthcare of elderly individuals, for the visually impaired persons and for persons with other disability. This can be transformed into reality with the aid of wearable sensors and actuators. This paper presents some relevant literature to identify and analyse the common aspects of IoT and its usefulness in the society and healthcare domain. Primarily, it highlights on IoT towards elderly care and blind users through two innovative smart devices. The devices are planned for providing better life management tools in their smooth daily life. The devices are in terms of support, assistance, preventions from falls and providing instant rescue. The future work is to focus on development of the two proposed devices in an affordable price with durable and comfortable technology.
Table 2 Feature comparison in prior arts and proposed smart cane

S. No. | Features | Prior art | Proposed device
1 | SIM slot | No | Yes
2 | Pairing with Android phone | Yes | Yes
3 | GPS navigation | Yes | Yes
4 | Heart rate tracker | Yes | Yes
5 | Chargeable | Yes | Yes
6 | Battery life up to 7 days | Yes | Yes
7 | Auto night time spent, sleep stage records and alerts | Yes | Yes
8 | Water resistant | Yes | Yes
9 | Rescue alert to nearby hospital | No | Yes
10 | Light | No | Yes
11 | Alarm | No | Yes
12 | Ground obstacle detector | No | Yes
13 | Overhanging obstacle detector | No | Yes
14 | Fall detector | No | Yes
15 | Self defence mode | No | Yes
16 | Rescue alert to nearby hospital/family | No | Yes
17 | Spy camera | No | Yes
Acknowledgements This work is funded by OSHEC, Department of Higher Education, Govt. of Odisha under the OURIIP scheme at Rama Devi Women’s University, Bhubaneswar, India.
References
1. S. Kuyoro, F. Osisanwo, O. Akinsowon, Internet of things (IoT): an overview. in 3rd International Conference on Advances in Engineering Sciences and Applied Mathematics, (2015), pp. 53–58
2. A. Whitmore, A. Agarwal, L. Da Xu, The internet of things: a survey of topics and trends. Inf. Syst. Front. 17, 261–274 (2015). https://doi.org/10.1007/s10796-014-9489-2
3. D. Pal, T. Triyason, S. Funikul, Smart homes and quality of life for the elderly: a systematic review. IEEE Int. Symp. Multimedia (ISM) 11–13, 413–419 (2017). https://doi.org/10.1109/ISM.2017.83
4. https://pixabay.com/illustrations/iot-internet-of-things-network-3337536
5. H.H. Nguyen, F. Mirza, M.A. Naeem et al., A review on IoT healthcare monitoring applications and a vision for transforming sensor data into real-time clinical feedback. in IEEE 21st International Conference on Computer Supported Cooperative Work in Design (CSCWD), (2017), pp. 257–262. https://doi.org/10.1109/CSCWD.2017.8066704
6. H. Ahmadi, G. Arji, L. Shahmoradi et al., The application of internet of things in healthcare: a systematic literature review and classification. Univ. Access Inf. Soc. (2018). https://doi.org/10.1007/s10209-018-0618-4
7. https://www.wipro.com/en-IN/business-process/what-can-iot-do-for-healthcare-/
8. P. Kumari, L. Mathew, P. Syal, Increasing trend of wearables and multimodal interface for human activity monitoring: a review. Biosens. Bioelectron. 90, 298–307 (2017). https://doi.org/10.1016/j.bios.2016.12.001
9. P. Khosravi, A.H. Ghapanchi, Investigating the effectiveness of technologies applied to assist seniors: a systematic literature review. Int. J. Med. Inf. 85, 17–26 (2016). https://doi.org/10.1016/j.ijmedinf.2015.05.014
10. I. Azimi, A.M. Rahmani, P. Liljeberg et al., Internet of things for remote elderly monitoring: a study from user-centered perspective. J. Ambient Intell. Humanized Comput. 8, 273–289 (2017). https://doi.org/10.1007/s12652-016-0387-y
11. C. Ni Scanaill, S. Carew, P. Barralon et al., A review of approaches to mobility telemonitoring of the elderly in their living environment. Ann. Biomed. Eng. 34, 547–563 (2006). https://doi.org/10.1007/s10439-005-9068-2
12. S. Brownsell, M. Hawley, Fall detectors: do they work or reduce the fear of falling? Hous. Care Support 7, 18–24 (2004)
Enforcement an Evidence and Quality of Query Services in the Cost-Effective Cloud G. Vijendar Reddy, Shaik Arshia Zainab, Sathish Vuyyala, and Nagubandi Naga Lakshmi
Abstract Cloud data leakage is dangerous: sensitive data entrusted to a set of supposedly trusted agents can end up in an unauthorized location. Security practitioners are always interested in cloud data leakage issues that arise from various sources, such as e-mail correspondence and different network channels. In this paper, emphasis is laid on how to assess the probability that leaked information came from one or more agents. The proposed system can thus identify the parties guilty of such cloud leakage even if the information is altered. For this purpose, the system uses data allocations that can inject realistic but fake data records, improving the identification of cloud leakage through e-mails. Filtering of these e-mails is done by blocking e-mails that contain a corporation's pictures, videos or sensitive data. Classification of e-mail is primarily done on the basis of fingerprints of message bodies. White- and black-listing of mail addresses is the principle used in e-mail filtering, with the specified words being treated as spam. Keywords Cloud computing · Cost efficiency · Differential query services · Privacy
1 Introduction Cloud computing is an era of today’s world which uses robust infrastructure having scalability and price-saving techniques. Cloud uses effective features such as security, infinite storage, low-cost and multiple access to users for various files applications. The users within the cloud save a lot of time using the server as it eases the workload query services, and also the system is extremely dynamic. In addition to the safety G. Vijendar Reddy (B) · S. A. Zainab · N. N. Lakshmi Department of Information Technology, Gokaraju Rangaraju Institute of Engineering and Technology, Hyderabad, India e-mail: [email protected] S. Vuyyala Department of Computer Science Engineering, MVSR Engineering College, Hyderabad, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 D. Swain et al. (eds.), Machine Learning and Information Processing, Advances in Intelligent Systems and Computing 1311, https://doi.org/10.1007/978-981-33-4859-2_19
feature which is increased, the information confidentiality and query privacy is also maintained under cloud. The organizational information are usually leaked through e-mails. Therefore, to secure the information and query privacy, new approaches are needed which is the need of the hour; of course, there is a possibility of slow query processing. The CPEL criteria handle within the cloud the information confidentiality, query privacy, efficiency in query processing and low in-house working cost, thereby increasing the complexity of query services. In this paper some related techniques are discussed to identify some aspects of the matter. For example, the order preserving encryption and crypto-index [1] techniques are susceptible to the attacks. Enhanced crypto-index technique gives heavy load on the in-house infrastructure to reinforce the knowledge security and query privacy. Cloaking boxes are used by the New Casper approach [2] to secure data objects and query, which affects query processing efficiency and also the in-house workload. This paper develops a sample that can be used to estimate the “guilt” of agents which includes the technique of adding fake objects to the distributed set. An agent is given one or more fake objects [3], then leaking is determined, and the identification of guilty agent becomes high. In this paper technique is worked out for calculating the probability of an agent being guilty in case of data leakage. The next section discusses an approach for data allocation agents. Lastly, we analyze the techniques related to various data leakage situations, and identify if they help in recognizing the leaker. Cloud consists of query services, like practical range query and k-nearest-neighbor (k-NN) which can be constructed with the help of proposed random space perturbation (RASP) technique. The four areas of CPEL criteria [3] will be satisfied by the proposed system. It is possible to transform the multidimensional data by using encryption [4], random sound injection and projection using RASP technique [5]. The RASP technique and its integration mainly throw light on ensuring data confidentiality, securing the multidimensional range of queries and processing the query with indexing. The stored data from the database is retrieved by using the range of query [6]. The upper- and lower-class bounds are used to extract the info. The closest record to the query is determined with the help of k-nearest-neighbor query [7]. The proposed system also uses the e-mail filtering technique to group the mails that are sent to the unauthorized users. Determining the guilty agents is the main objective of the proposed work.
1.1 Related Data The concepts taken into consideration for the proposed work are as follows: A. Order Preserving Encryption OPE stands for order preserving encryption which is a deterministic encryption scheme, also known as ciphers. The encryption function of OPE is used to
preserve the numerical ordering of plaintexts. It uses an encrypted table as the base over which the database indexes are developed. The encryption key is very large, and the implementation consumes extra time and space, which is the major drawback of this technique.
B. Crypto-index Technique The crypto-index technique [8] has a strong working system that includes complicated methods for secure encryption; however, it is vulnerable to attacks. A related approach, the New Casper approach, secures information and queries, but in doing so the efficiency of query processing is affected (a toy sketch of the bucketisation idea behind crypto-indexing follows this list).
C. Distance Recoverable Encryption This technique mainly focuses on maintaining the relationship between the closest neighbors. It has been suggested that dot products be preserved rather than distances in order to determine the k-nearest neighbors, which gives better resistance to distance-oriented attacks. The drawback is that the search algorithm is bounded to a linear scan and indexing techniques are not used.
D. Preserving Query Privacy Papadopoulos [9] uses PIR techniques to improve location security; however, data confidentiality is not maintained. Space Twist [10] proposed a method to query k-NN by using a duplicate (fake) user location in order to preserve location privacy, but it does not consider information confidentiality. The Casper technique considers both query privacy and information confidentiality.
E. Multidimensional Range Query This technique requires the owner of the information to produce indices and keys for the server, ensuring that only authorized users are able to access the information on the server. In the case of a cloud database, the cloud server takes over the work of indexing and query processing.
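To make the crypto-index idea concrete, the toy sketch below bucketises values so that a server holding only bucket tags and opaque records can answer a range query approximately. It is a conceptual illustration only, not a secure implementation; the bucket width and sample values are invented.

```python
# Toy illustration of the crypto-index (bucketisation) idea.
BUCKET_WIDTH = 10_000

def bucket_tag(value):
    return value // BUCKET_WIDTH            # the only thing the server indexes

store = {}                                   # tag -> opaque ("encrypted") records
for v in [30_000, 20_000, 0, 25_000, 100_000, 75_000]:
    store.setdefault(bucket_tag(v), []).append(f"enc({v})")

def range_query(low, high):
    # The server returns every bucket overlapping [low, high]; the client
    # decrypts locally and discards false positives.
    tags = range(bucket_tag(low), bucket_tag(high) + 1)
    return [rec for t in tags for rec in store.get(t, [])]

print(range_query(20_000, 35_000))
```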
2 Related Works The goal of this work is to provide differential query services while protecting user privacy. To the best of our knowledge, no earlier work has addressed this difficulty. Research related to ours can be found in the areas of private searching and ranked searchable encryption. Private searching has been proposed in which the data is stored in clear form and the query is encrypted with the Paillier cryptosystem. The cloud stores all files in a compacted buffer, from which the user can successfully recover all wanted files with high probability. In follow-up work, the communication cost was reduced by solving a set of linear programs, and an efficient decoding mechanism for private searching was presented. The main drawback of current private searching techniques is that both the computation
and communication costs grow linearly with the number of users that are executing searches.
3 Proposed System The proposed work is to recognize those parties which are accountable for cloud leakage and at the same time ensure that the sensitive data is protected from any unauthorized access. The work also focuses on providing data confidentiality and ensuring privacy of the query. The RASP technique is meant for ensuring the confidentiality and the query privacy. The proposed system utilizes the e-mail filtering technique. This technique groups the leaked data by restricting the mails that consist of private info, videos and pictures of an organization. Now, consider the below-mentioned modules of the project whose aim is to achieve the results expected from the proposed system which in turn aims at overcoming the drawbacks of the existing system while providing support for future enhancements. There are six modules as follows: Figure 1 describes the structure of the proposed system. All login information is maintained by the server, thereby leading to the formation of a cloud information which contributes in providing valid authorization for accessing or retrieving critical information on the cloud. Server maintains automatically all IP and MAC-related addresses of the users during the registration process; at the same time, it pertains to mention that these addresses cannot be used to identify clone node information [11]. A. Query Authentication In server section, the MAC and IP addresses of the authorized clients are used to add them. During this section the server stores the log of every query that is processed. Performance of RASP is going to be restricted during this section. The clone node logs that are detected will be stored during this section. The administrator is in charge of this process. IP and MAC addresses of each client will be given by them for registration purpose. Only the authenticated client has the right to transfer the information. Each authorized client must possess private IP address and MAC address. This particular address is engaged in spotting the clone node detection. B. Securing the Data with False Objects There will be some dummy data sent by the server node along with the original data. This fake data will be unnoticed by the clone. The owner of the node will be aware of the number of fake or dummy objects injected into the original or actual data. The agent gets the complete data object which fulfills the constraints of the agents’ request. The e-optimal algorithm lowers every term of the target summary by inserting the maximum number of pretend data to every set, giving rise to optimal solution. The e-optimal solution is:
Fig. 1 The system architecture
O(n + n²F) = O(n²F), where n is the number of agents and F is the number of fake objects. The current module may distribute more data objects than the agents request in total. The more recipients an object has on average, and the more objects are shared among different agents, the more difficult it becomes to identify a guilty agent. In the algorithm discussed above, each agent acquires only a subset of the data objects it could receive. The sample data request algorithm works on similar grounds as the explicit data request. C. High-level Query Processing The security and privacy of the information access pattern is maintained by private information retrieval (PIR), while the data itself may not be encrypted. The PIR scheme [12] is usually very expensive. In order to maintain efficiency, PIR
relies on a pyramid hash index to perform privacy-preserving data-block retrieval efficiently, based on the concept of oblivious RAM. It addresses the query privacy concern and requires the authorized query users, the owner of the data and the cloud to process k-NN queries collaboratively. Many computing operations are carried out within the user's local system while interacting with the cloud server.
D. Distribution of Sensitive Data A certain amount of sensitive data is given to a set of supposedly trusted agents by a data distributor, and some of this data is later observed in unauthorized places. The probability that the leakage [13] occurred from one or more agents must be calculated.
E. Clone Node Formation A clone node is detected when a client sends data to an unauthorized person. The clone node remains unaware of the fake or dummy objects created by the server.
F. E-mail Filtering with Organization Sensitivity This module concentrates on filtering the e-mail data that has been shared with the clone node. The module includes six steps:
1. Determine the information.
2. Remove stop words such as "this", "is", "a", etc.
3. Delete or alter synonyms.
4. Determine the priority of each word based on the sensitivity or privacy of the information.
5. Compare the data with predefined company data sets.
6. Group the information with the company's important data sets.
G. RASP It protects the privacy of query services. Practical range query and k-nearest-neighbor (k-NN) query services in the cloud can be constructed with the help of the proposed random space perturbation (RASP) technique, which satisfies the four areas of the CPEL criteria. RASP transforms [14] data with a mix of order preserving encryption, random noise insertion and projection. The RASP method and its integration preserve data confidentiality and support multidimensional range queries with efficient, indexed execution [15]. Stored data are retrieved from the database with the help of a range query, using upper and lower bounds to extract the data. A k-nearest-neighbor (k-NN) query is used to trace out the record closest to the query point.
H. k-NN-R Algorithm The k-nearest-neighbor algorithm searches for the k samples within a specific range that are closest to the query point. The algorithm uses the following steps to find the nearest neighbors (a short code sketch follows this list).
• Select a parameter, say k, which represents the count of nearest neighbors.
• Determine the distance between each element and the query point.
• Sort the distances.
• Determine the nearest neighbors with the minimum distances within the parameter k.
• Choose the majority class among the nearest neighbors.
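The steps above can be condensed into a few lines of plain Python, as sketched below; the sample points, labels and value of k are invented for illustration.

```python
# Sketch of the listed k-NN steps: distance, sort, keep k nearest, majority vote.
from collections import Counter
from math import dist

def knn_predict(query, points, labels, k=3):
    # 1. distance from every stored element to the query point
    distances = [(dist(query, p), lab) for p, lab in zip(points, labels)]
    # 2. sort by distance and 3. keep the k nearest neighbours
    nearest = sorted(distances, key=lambda t: t[0])[:k]
    # 4. majority vote among the nearest neighbours
    return Counter(lab for _, lab in nearest).most_common(1)[0][0]

points = [(1, 1), (2, 1), (8, 9), (9, 8), (1, 2)]
labels = ["non-fraud", "non-fraud", "fraud", "fraud", "non-fraud"]
print(knn_predict((1.5, 1.2), points, labels, k=3))
```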
4 Implementation The design of the proposed system is described briefly in this section. The design phase covers the confidentiality and security aspects of the data, including the filtering of sensitive data to protect it from unauthorized users, and gives a clear view of the proposed work. The six modules of the architecture have already been covered in the previous section. The e-mail filtering technique is applied once clone node detection is over; its principle is to block information that contains the company's videos, pictures and so on. The admin reserves the privilege to view all data transformations in addition to the e-mail alerts received [16].
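A minimal sketch of such a filter is given below; the stop-word list, the set of sensitive company terms and the blocked attachment types are assumptions made for illustration, not the system's actual configuration.

```python
# Hedged sketch of the e-mail filtering step: stop words are removed and the
# remaining words are compared with a predefined set of sensitive company terms.
import re

STOP_WORDS = {"this", "is", "a", "the", "of", "and", "to"}
SENSITIVE_TERMS = {"payroll", "ledger", "audit", "credentials", "blueprint"}
BLOCKED_ATTACHMENTS = (".mp4", ".avi", ".png", ".jpg")

def should_block(subject, body, attachments):
    words = set(re.findall(r"[a-z]+", (subject + " " + body).lower())) - STOP_WORDS
    leaks_text = bool(words & SENSITIVE_TERMS)
    leaks_media = any(a.lower().endswith(BLOCKED_ATTACHMENTS) for a in attachments)
    return leaks_text or leaks_media

print(should_block("Q3 figures", "sending the payroll ledger", ["report.pdf"]))  # True
```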
5 Conclusion We finally propose that the RASP approach satisfies CPEL criteria of handling confidentiality of information and privacy issues of query besides efficient processing. This approach identifies which part of the intermediate datasets must be encrypted so as to avoid wasting the privacy-preserving cost. RASP perturbation is the combination of order preserving encryption (OPE), dimensionality expansion, random projection and random noise injection, to provide the safety feature of data. It is observed that the present proposed technique increases the identification of cloud leakage and provides the safety feature to the cloud data. In addition, e-mail filtering technique employed provides required shield to the sensitivity of the data from unauthorized access.
References
1. A. Kumar, Design of secure image fusion technique using cloud for privacy-preserving and copyright protection. Int. J. Cloud Appl. Comput. (IJCAC) 9(3), 22–36 (2019)
2. A. Kumar, S. Srivastava, Object detection system based on convolution neural networks using single shot multi-box detector. Procedia Comput. Sci. 171, 2610–2617 (2020)
3. A. Kumar, S.S.S.S. Reddy, V. Kulkarni, An object detection technique for blind people in realtime using deep neural network. in 2019 Fifth International Conference on Image Information Processing (ICIIP), (Shimla, India, 2019), pp. 292–297. https://doi.org/10.1109/ICIIP47207.2019.8985965
4. A. Kumar, A review on implementation of digital image watermarking techniques using LSB and DWT. in The Third International Conference on Information and Communication Technology for Sustainable Development (ICT4SD 2018), (Hotel Vivanta by Taj, Goa, India, August 30–31, 2018)
5. K. Chen, L. Liu, G. Sun, Towards attack-resilient geometric data perturbation. in SIAM Data Mining Conference, (2007)
6. B. Chor, E. Kushilevitz, O. Goldreich, M. Sudan, Private information retrieval. ACM Comput. Surv. 45(6), 965–981 (1998)
7. R. Curtmola, J. Garay, S. Kamara, R. Ostrovsky, Searchable symmetric encryption: improved definitions and efficient constructions. in Proceedings of the 13th ACM Conference on Computer and Communications Security, (New York, NY, USA: ACM, 2006), pp. 79–88
8. H. Hacigumus, B. Iyer, C. Li, S. Mehrotra, Executing SQL over encrypted data in the database-service-provider model. in Proceedings of ACM SIGMOD Conference, (2002)
9. B. Hore, S. Mehrotra, G. Tsudik, A privacy-preserving index for range queries. in Proceedings of Very Large Databases Conference (VLDB), (2004)
10. F. Li, M. Hadjieleftheriou, G. Kollios, L. Reyzin, Dynamic authenticated index structures for outsourced databases. in Proceedings of ACM SIGMOD Conference, (2006)
11. D. Raman, V.C. Sekhar, Monitoring the load due to effect of packet droppers and modifiers. Int. J. Sci. Res. (IJSR) 2(2), (February 2013). ISSN: 2319-7064
12. K. Liu, C. Giannella, H. Kargupta, An attacker's view of distance preserving maps for privacy preserving data mining. in Proceedings of PKDD, (Berlin, Germany, September 2006)
13. R. Marimont, M. Shapiro, Nearest neighbor searches and the curse of dimensionality. J. Inst. Math. Appl. 24, 59–70 (1979)
14. D. Raman, B. Krishna, Ensuring security services for data storing and data sharing in cloud computing. Int. J. Sci. Res. (IJSR) 2(2), (February 2013). ISSN: 2319-7064
15. D. Raman, Y.S. Reddy, A square root topologys to find unstructured peer-to-peer networks. Int. J. Comput. Sci. Manage. Res. (IJCSMR) 2(3), (March 2013). ISSN: 2278-733X
16. R. Dugyala, N.H. Reddy, S. Kumar, Implementation of SCADA through cloud based IoT devices—initial design steps. in 2019 Fifth International Conference on Image Information Processing (ICIIP), (Shimla, India, 2019), pp. 367–372
A Machine Learning Approach Towards Increased Crop Yield in Agriculture Shikha Ujjainia, Pratima Gautam, and S. Veenadhari
Abstract Machine learning is incontestably among the strongest and powerful technology in the world. It is a tool for turning data into knowledge. In the past 50 years, there has been an explosion of data. This mass of data is inefficient unless it is explored by us and the patterns are found. Machine learning methods are used to locate the underlying patterns in data that we would otherwise struggle to discover. The hidden patterns and comprehension about a problem may be used to forecast future events and execute all sorts of decision-making. This paper is dedicated to the applications of machine learning in the agricultural production system where different machine learning techniques like linear regression, ensemble method, and decision tree are applied to predict crop yield production by using favorable weather conditions. Keywords Machine learning · Linear regression · Ensemble method · Decision tree
1 Introduction Several years ago, organizations, systems, or applications were made on the basis of structured data. The usage of relational database system was considered as the easiest way to store, manipulate, and retrieve this data. But today’s world is running very fast and things are changing accordingly [1]. Today is just a period of technology, where the nature of data is changed. Moreover, organizations, systems, or applications are generating a vast amount of data in a variety of formats at a very fast rate. Other than this, social media, banks, instruments, websites, stock market, healthcare, agriculture domain, and so on, and various sources are responsible for large data generation. “Big Data” has three main characteristics: volume, velocity, and variety, where volume implies a large amount of data; velocity signify the rate at which data is getting generated; and variety includes different types of data like structured data S. Ujjainia (B) · P. Gautam · S. Veenadhari Rabindranath Tagore University, Bhopal, Madhya Pradesh, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 D. Swain et al. (eds.), Machine Learning and Information Processing, Advances in Intelligent Systems and Computing 1311, https://doi.org/10.1007/978-981-33-4859-2_20
(e.g. MySQL). Due to their latent capacity, big data is considered a revolution that will change our thinking, work and, of course, lives [2, 3]. The capacity to extract value from big data is determined by data analytics, and Jagadish et al. [4] consider analytics to be the center of the big data revolution. There is an urgent need to harvest more food for the population, with less land to grow it on, and farming is going through a digital revolution. Industry pioneers and policymakers are looking for help from technologies like data analytics, IoT and cloud computing to counter the pressures of climate change and food demand. IoT devices assist in the very first phase of the process, information gathering: to gather real-time data, sensors are fitted into trucks and tractors in addition to plants, soil and fields [5]. Analysts integrate the large amount of accumulated information, including pricing models and weather information. The capacity to monitor objects and collect real-time information at this scale may be only the tip of the iceberg, and forecasting scenarios from such data can be a real game-changer in farming methods [6].
2 Literature Survey Ip et al. [7] provided a research analysis in order to protect crop or control weed using big data. With the analysis of different machine learning techniques, they successfully analyzed such research. In order to show herbicide obstruction using a Markov random field model, they performed contextual analysis. Annual herbicide resistance events on a set of explanatory variables have been proposed and modeled taking into account the spatial component. The outcome exhibited that the proposed autobinomial model allows easy interpretation, similar to the widely used logistic regression model. A machine learning technique is used by Gandhi et al. [8] for the rice crop prediction. Various weather parameters and crop parameters are considered for prediction. Various machine learning algorithms such as naïve Bayes, BayesNet, and multilayer perceptron are used and compared with the SMO classifier that has been used earlier on the same dataset. Model performance is achieved through various parameters, like root mean square error, mean absolute error, relative absolute error, and so on. Sellam et al. [9] calculated a linear regression model to analyze the relationship between dependent variables like food price index, area under cultivation, and annual rainfall to the dependent variable yield. The least square fit model is used as it can fit polynomial as well as linear regression. The paper presented the following steps such as computing linear regression model, calculating residual values, and the sum of square to acquire R2 and implementation of the model. R2 is used to measure the accuracy of their model. Tenzin et al. [10] used climate smart agriculture system, which is a technique for examining the weather of the region and developing the products according to the climate of that region. This will help farmers to grow the right kind of crops in the
required land by knowing the precipitation, maximum temperature, and minimum temperature of that area. Zingade et al. [11] focus on the prediction of the most profitable yield that can be grown in the agricultural land using machine learning techniques. This paper includes the use of an android system that will give real-time crop analysis using various weather station reports and soil quality. Thus farmers can grow the most profitable crop in the best suitable months.
3 Machine Learning in Agriculture Let us discuss some use cases where a difference can be made by machine learning to nourish such a population, which is a significant challenge to the authorities. It also put hands together to figure out this problem. One way to eliminate this issue is to increase anyhow the yield from the existing farms [12, 13]. To solve this issue, big data provides a platform for farmers using information. This enables them to make the decision, like crop to plant for superior adulthood and when to harvest. Correct decisions improve farm yields. Yield prediction is yet another task, but yield forecasting looks at using mathematical models to examine data around biomass indices, and yield, chemistry, weather, leaf, utilizing the power and reducing data. Using sensors to collect data means that every business needs only a small amount of manual work to do on the instruction manual to guarantee the best return from their crops. Predicting yields in this way can make it easier for the farmer to find out what and where to plant it. Agricultural equipment optimization ensures the long-term health of farm equipment. Several companies have integrated sensors in their agricultural machines with big data applications setup [14]. Appropriate use of pesticides has been an issue because of its side effects on the environment. Big data enables farmers to handle this better by advocating what pesticides to use, and by how. Food supply management is another issue as we know that one-third of the grain produced for human consumption is wasted or destroyed every year. There is a huge difference between the supply and demand chains.
4 Proposed Methodology Several types of research have been conducted to improve productivity in the agricultural sector using machine learning with available resources, like water, seeds, pesticides, temperature, humidity, and so on. This research conducts comparison of various machine learning algorithms, such as multiple linear regression, decision tree, and ensemble methods like random forest and gradient boosting algorithm to verify which method will work efficiently on the given dataset.
4.1 Processing Dataset For this study, statistical information was collected from various sources such as kaggle.com and worldweatheronline.com. We collected separate datasets for the rice crop, temperature and rainfall, then merged and cleaned the data following appropriate preprocessing steps; suitable parameters were selected by the feature selection method. The dataset is then ready to be fed into the model and processed.
4.2 Proposed Workflow See Fig. 1.
Fig. 1 Proposed workflow of the model
4.3 Multiple Linear Regression Regression models portray the relationship between variables by fitting a line to the observed data; regression lets you evaluate how the dependent variable changes as the independent variable(s) change. So far, we have seen the concept of simple linear regression, where a single independent variable X is used to model the response of the dependent variable Y. In many applications, more than one variable influences the response, and multiple linear models describe how a single target variable Y depends linearly on several independent variables. The crop yield depends on multiple independent parameters, such as temperature, rainfall and area, so multiple linear regression (MLR) is required. The mathematical representation of multiple linear regression is:

Y = a + (b ∗ X1) + (c ∗ X2) + (d ∗ X3) + e    (1)

where Y is the target variable; X1, X2 and X3 are independent variables; a is the intercept; b, c and d are regression coefficients; and e is the residual (error). For our model, the equation can be rewritten as:

Production = a + b ∗ (Temp) + c ∗ (Rain) + d ∗ (Area) + e    (2)
Since we are calculating yield production, so we have to take that as a target variable, and it is dependent on three independent variables in the dataset used, such as temperature, rainfall, and area, which will be further multiplied by regression coefficient b, c, and d, respectively. e = y − yˆ
(3)
The residual (e) is obtained by subtracting the predicted value (ŷ) of the dependent variable from its observed value (y). The R² value (the goodness-of-fit of the model) is then calculated to measure how closely the data fit the regression line; it lies between 0 and 1. The R² value after applying this model to our dataset is 0.75.
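As an illustration of Eqs. (1)–(3), the sketch below fits the coefficients by ordinary least squares with NumPy and computes R². The handful of sample rows is invented; the study itself uses the merged rice, temperature and rainfall dataset described in Sect. 4.1.

```python
# Minimal ordinary-least-squares sketch of the multiple linear regression.
import numpy as np

temp = np.array([28.0, 30.5, 27.2, 29.1, 31.0])      # assumed values
rain = np.array([1200., 950., 1400., 1100., 800.])
area = np.array([50., 55., 48., 60., 52.])
production = np.array([150., 160., 145., 175., 149.])

X = np.column_stack([np.ones_like(temp), temp, rain, area])  # intercept + predictors
coeffs, *_ = np.linalg.lstsq(X, production, rcond=None)
a, b, c, d = coeffs

pred = X @ coeffs
ss_res = np.sum((production - pred) ** 2)             # residual sum of squares
ss_tot = np.sum((production - production.mean()) ** 2)
r2 = 1 - ss_res / ss_tot                               # goodness of fit
print(f"a={a:.2f}, b={b:.2f}, c={c:.4f}, d={d:.2f}, R^2={r2:.3f}")
```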
4.4 Decision Tree Regression A decision tree arranges decisions in a flowchart-like tree structure: the data are broken down by making a decision at each step based on a series of answers. Decision tree regression is a nonlinear regression technique used to predict a target variable whose values are continuous in nature. Here, we have considered the entropy function and information gain as the metrics used to select the root node.
This metric allows selection of the attribute that best classifies the training data, and that attribute is used as the root of the tree. The process is repeated for each branch. Let us consider the entropy function H(S), where S is the current dataset:

Entropy = − Σ_{i=1}^{n} Pi ∗ Log2(Pi)    (4)
where P is the probability. Entropy is calculated for each remaining attribute. The smallest entropy is effective for splitting the dataset S. A decision tree can be used to predict both categorical and continuous target variable and is termed as classification and regression tree (CART) algorithm. The essential steps of the CART algorithm applied to our dataset are: (a) Determine how to divide the observations by calculating the sum of squared residual at each point. (b) The smallest sum of squared residual becomes a candidate for the root of the tree. (c) If we have more than one predictor variables (as we have in our dataset like temperature, rainfall, area, season etc.) then pick the candidate with the smallest sum of squared residuals to be the root. (d) At the point where we do not have exactly some minimum number of observations in a node, this node turns into a leaf node. (e) Keep repeating the process until the last leaf node comes.
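The sketch below illustrates the entropy measure of Eq. (4) together with a small regression tree fitted via scikit-learn's CART implementation; note that for continuous targets scikit-learn splits on squared error rather than entropy, and the toy feature values are placeholders.

```python
# Entropy of a split (Eq. 4) and a tiny CART regression tree.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def entropy(class_counts):
    p = np.asarray(class_counts, dtype=float)
    p = p[p > 0] / p.sum()
    return -np.sum(p * np.log2(p))           # Eq. (4)

print(entropy([6, 2]))                        # entropy of an 8-sample split

X = np.array([[28.0, 1200., 50.], [30.5, 950., 55.],
              [27.2, 1400., 48.], [29.1, 1100., 60.]])   # temp, rainfall, area
y = np.array([150., 160., 145., 175.])                   # yield (assumed)

tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)
print(tree.predict([[29.0, 1000., 58.]]))
```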
4.5 Random Forest Algorithm Random forest uses the bagging technique where every tree builds and runs parallel without any interaction. Random forest uses many decision trees and merges their predictions to get more accurate outcomes. The implementation of the random forest can be seen in the following steps: (a) First create the bootstrapped dataset from the original dataset. The bootstrapped dataset has same size in respect to the original one created by selecting samples randomly from the original dataset. (b) Now create a decision tree with the help of the bootstrapped dataset, but only use a random subset of variables (or column) at each step. With the presence of a high entropy, the randomness of a variable is increased. (c) Step 1 needs to be repeated and we have to make a latest bootstrapped dataset and build a tree with the presence of subset of variables in each step, until a wide variety of trees is not collected. (d) After this, voting is done for each decision tree to make accurate predictions.
In this study, the features considered for building the random forest are temperature, rainfall, area and season; the forest is a collection of decision trees, each built from roughly two-thirds of the records in the dataset.
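A compact sketch of this bagging procedure is shown below using scikit-learn's RandomForestRegressor; the synthetic feature matrix merely mimics the temperature, rainfall, area and season columns and is not the study's data.

```python
# Random forest sketch: bootstrapped trees, random feature subsets, averaged predictions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform([25, 700, 40, 1], [33, 1500, 65, 4], size=(60, 4))  # temp, rain, area, season
y = 2.5 * X[:, 0] + 0.05 * X[:, 1] + 1.2 * X[:, 2] + rng.normal(0, 5, 60)

forest = RandomForestRegressor(
    n_estimators=200,       # number of bootstrapped trees
    max_features="sqrt",    # random subset of variables at each split
    oob_score=True,         # evaluate on the out-of-bag (unsampled) rows
    random_state=0,
).fit(X, y)

print("out-of-bag R^2:", round(forest.oob_score_, 3))
```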
5 Experimental Analysis and Results The analysis is done using the Python tool, an open-source, object-oriented language with high-level programming capability. The algorithms are compared on the basis of R², RMSE and the cross-validated RMSE values.

RMSE = sqrt( Σ_{i=1}^{n} (ŷi − yi)² / n )    (5)

where ŷi is the predicted value, yi the actual value of the target variable, and n the number of observations. The measured R² value of the random forest model is 0.98, whereas the decision tree and linear regression models give 0.82 and 0.76, respectively.
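The comparison can be reproduced along the following lines; the synthetic data stands in for the preprocessed dataset of Sect. 4.1, so the printed scores will not match the values reported above.

```python
# RMSE (Eq. 5) and five-fold cross-validated RMSE for the three regressors.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

def rmse(y_true, y_pred):                     # Eq. (5)
    return float(np.sqrt(np.mean((np.asarray(y_pred) - np.asarray(y_true)) ** 2)))

rng = np.random.default_rng(1)
X = rng.uniform([25, 700, 40], [33, 1500, 65], size=(80, 3))   # temp, rainfall, area
y = 2.0 * X[:, 0] + 0.04 * X[:, 1] + 1.1 * X[:, 2] + rng.normal(0, 4, 80)

models = {
    "linear regression": LinearRegression(),
    "decision tree": DecisionTreeRegressor(random_state=0),
    "random forest": RandomForestRegressor(n_estimators=200, random_state=0),
}
for name, model in models.items():
    fit_rmse = rmse(y, model.fit(X, y).predict(X))
    cv = cross_val_score(model, X, y, cv=5, scoring="neg_root_mean_squared_error")
    print(f"{name}: training RMSE = {fit_rmse:.2f}, CV RMSE = {-cv.mean():.2f}")
```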
Fig. 2 Measured R2 values of three different models
Fig. 3 Root mean square values of three different models
Fig. 4 Cross-validation through RMSE of three different models
Figures 2, 3 and 4 depict the performance of the three selected models of our dataset.
6 Conclusion This paper concludes by determining whether a machine learning algorithm can offer new insights for predicting crop yield production when multiple factors, such as weather parameters, soil parameters and various disease parameters, can affect
the crop yield. We compared three regression-based algorithms to determine the model best suited to our dataset for predicting crop yield. When the dataset was tested, the minimum RMSE value was obtained by the random forest regression algorithm; the random forest model was thus quite successful at predicting rice crop yield, with an R² value of 0.98.
References
1. M.A. Beyer, D. Laney, The Importance of 'Big Data': A Definition (Gartner, Stamford, CT, 2012)
2. V. Mayer-Schönberger, K. Cukier, Big data: A revolution that will transform how we live, work, and think. Houghton Mifflin Harcourt 179, 1143–1144 (2013)
3. A.L. Samuel, Some studies in machine learning using the game of checkers. IBM J. Res. Dev. 3, 210–229 (1959)
4. H.V. Jagadish, J. Gehrke, A. Labrinidis, Y. Papakonstantinou, J.M. Patel, R. Ramakrishnan, C. Shahabi, Big data and its technical challenges. Commun. ACM 57, 86–94 (2014)
5. J.W. Kruize, J. Wolfert, H. Scholten, C.N. Verdouw, A. Kassahun, A.J. Beulens, A reference architecture for Farm Software Ecosystems. Comput. Electron. Agric. 125, 12–28 (2016)
6. J. Gantz, D. Reinsel, Extracting value from Chaos. IDC Iview 1142, 1–12 (2011)
7. R.H. Ip, L.M. Ang, K.P. Seng, J.C. Broster, J.E. Pratley, Big data and machine learning for crop protection. Comput. Electron. Agric. 151, 376–383 (2018)
8. N. Gandhi, L.J. Armstrong, O. Petkar, A.K. Tripathy, Rice crop yield prediction in India using support vector machines. in 13th International Joint Conference on Computer Science and Software Engineering (JCSSE), (2016), pp. 1–5
9. V. Sellam, E. Poovammal, Prediction of crop yield using regression analysis. Indian J. Sci. Technol. 9, 1–5 (2016)
10. S. Tenzin, S. Siyang, T. Pobkrut, T. Kerdcharoen, Low cost weather station for climate-smart agriculture. in 9th International Conference on Knowledge and Smart Technology (KST), (2017), pp. 172–177
11. D.S. Zingade, O. Buchade, N. Mehta, S. Ghodekar, C. Mehta, Crop prediction system using machine learning. Int. J. Adv. Eng. Res. Dev. Spec. Issue Recent Trends Data Eng. 4, 1–6 (2017)
12. K. Kaur, Machine learning: applications in Indian agriculture. Int. J. Adv. Res. Comput. Commun. Eng. 5, 342–344 (2016)
13. P.S. Cornish, A. Choudhury, A. Kumar, S. Das, K. Kumbakhar, S. Norrish, S. Kumar, Improving crop production for food security and improved livelihoods on the East India Plateau II. Crop options, alternative cropping systems and capacity building. Agric. Syst. 137, 180–190 (2015)
14. J. Gantz, D. Reinsel, The digital universe decade-are you ready? IDC Rev. 925, 1–16 (2010)
Resume Screening Using Natural Language Processing and Machine Learning: A Systematic Review Arvind Kumar Sinha, Md. Amir Khusru Akhtar, and Ashwani Kumar
Abstract Curriculum vitae or resume screening is a time-consuming procedure. Natural language processing and machine learning have the capability to understand and parse unstructured written language and to extract the desired information. The idea is to train the machine to analyze written documents like a human being. This paper presents a systematic review on resume screening and compares recognized works. Several machine learning techniques and approaches for evaluating and analyzing unstructured data are discussed. Existing resume parsers use semantic search to understand the context of the language in order to find reliable and comprehensive results. A review on the use of semantic search for context-based searching is presented. In addition, this paper also discusses the research challenges and future scope of resume parsing in terms of writing style, word choice and syntax of unstructured written language.

Keywords Machine learning · Natural language processing · Resume parser · Semantic search · Unstructured written language
1 Introduction

Resume parsing is the process of extracting information from websites or unstructured documents using complex pattern-matching and language-analysis techniques. It is a means to automatically extract information from resumes/unstructured documents and to create a potential database for recruiters. This process generally converts free-form resumes, that is, pdf, doc, docx, RTF and HTML, into structured data such as XML or JSON. Artificial intelligence technology and a natural language processing (NLP) engine are used to understand human language and to automate the process. Resume parsers use
semantic search to parse data from available resumes and find suitable candidates. Extracting information from human language is difficult because human language is infinitely varied and ambiguous. Human language is written and expressed in several ways; thus, a parsing tool needs to capture all the ways of writing by using complex rules and statistical algorithms. Ambiguity arises when the same word can mean different things in different contexts. For example, a four-digit number may be part of a telephone number, a street address, a year, a product number or the version of a software application. Thus, the idea is to train the machine to analyze the context of written documents like a human being. Recruitment agencies use resume parsing tools to automate the process and to save recruiters hours of work. A resume parser automatically separates the information into various fields based on the given criteria. The relevant information extracted by a resume parser includes personal information (such as name, address, email), experience details (such as start/end date, job title, company, location), education details (such as degree, university, year of passing, location), hobbies (such as dancing, singing, swimming) and so on. There are numerous choices for resume parsers, such as Sovren, Textkernel, Rchilli, BurningGlass, Tobu, JoinVision CVlizer, Daxtra, HireAbility, RapidParser and Trovix [1]. Most companies use an applicant tracking system which bundles a resume parser as one of its features. The first resume parsers were used in the late 1990s as a stand-alone packaged solution for HR operations [2]. This paper presents a systematic review on resume screening and compares recognized works. Several techniques and approaches of machine learning for evaluating and analyzing unstructured data are discussed. Existing resume parsers use semantic search to understand the context of the language in order to find reliable and comprehensive results. A review on the use of semantic search for context-based searching is presented. In addition to that, this paper also shows the research challenges and future scope of resume parsing in terms of writing style, word choice and syntax of unstructured written language. The rest of the paper is organized as follows. Section 2 discusses information extraction methods. Section 3 presents a systematic review on resume parsers and compares recognized works. Section 4 discusses the use of semantic search for context-based searching. Section 5 presents the research challenges and future scope of resume parsing. Finally, Sect. 6 concludes the paper.
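As a toy illustration of this field-wise separation (not code from any of the surveyed parsers; the regular expressions and sample text are assumptions made for illustration), a parser might begin with simple patterns:

import re

RESUME_TEXT = """John Doe
Email: [email protected]  Phone: +91-9876543210
Education: B.Tech, XYZ University, 2018"""

# Simple illustrative patterns; production parsers rely on far richer
# rules and statistical models, as the survey explains.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE_RE = re.compile(r"\+?\d[\d\s-]{8,}\d")

def extract_fields(text):
    """Return a small dictionary of fields found in the resume text."""
    return {
        "emails": EMAIL_RE.findall(text),
        "phones": PHONE_RE.findall(text),
    }

print(extract_fields(RESUME_TEXT))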
2 Information Extraction Methods

Information extraction is the method of extracting definite information from textual sources. The textual information is divided into sentences, which is called sentence segmentation or sentence boundary detection [3]. The rule-based approach [4] for segmentation uses a list of punctuation symbols such as '.', '?', ';', but this approach fails when it encounters abbreviations like 'e.g.', 'etc.', 'n.d.' and so on. In order to classify punctuation marks more carefully, a supervised machine learning technique was proposed; it uses a decision tree to mark the sentence boundaries and to classify punctuation
symbols [5]. The supervised machine learning approach requires huge corpora for training and needs specific knowledge of abbreviations [6]. Kiss and Strunk proposed an unsupervised machine learning approach that uses type-based classification. In this method, a word is analyzed in the whole text and annotated for sentence boundaries and abbreviations [6]. After segmentation of sentence boundaries, the system divides each sentence into tokens, which is called tokenization. Several tokenization approaches have been proposed in the literature, such as rule-based and statistical approaches. A rule-based tokenizer uses a list of rules for the classification of tokens, such as the Penn Tree Bank (PTB) tokenizer [7]. The statistical approach uses a hidden Markov model (HMM) [8] to identify the word and sentence boundaries [9]; this method uses scanning and HMM boundary detector modules for tokenization. In order to identify the meaning of a word, part-of-speech (POS) tagging schemes such as the Penn Treebank Tagset (PTT) [7] and the CLAWS 5 (C5) Tagset [10] can be used. An important task in information extraction is named entity recognition (NER), which identifies names of entities such as groups, persons, places, currencies, ages and times [11].
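A minimal sketch of this extraction pipeline (sentence segmentation, tokenization, POS tagging and NER) using the NLTK toolkit is shown below; the sample sentence is an assumption for illustration, not from the surveyed systems:

# Sentence segmentation -> tokenization -> POS tagging -> named entity recognition
import nltk

for pkg in ("punkt", "averaged_perceptron_tagger", "maxent_ne_chunker", "words"):
    nltk.download(pkg, quiet=True)

text = "Jane Doe worked at Springer in Singapore from 2015 to 2019."

sentences = nltk.sent_tokenize(text)      # Punkt: unsupervised boundary detection
for sent in sentences:
    tokens = nltk.word_tokenize(sent)     # tokenization
    tagged = nltk.pos_tag(tokens)         # part-of-speech tagging
    entities = nltk.ne_chunk(tagged)      # named entity recognition
    print(entities)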
3 Resume Parsers

Natural language processing and machine learning have the capability to understand and parse unstructured written language and to extract the desired information. Existing resume parsers use semantic search to understand the context of the language in order to find reliable and comprehensive results. A resume parser converts the unstructured form of data into a structured form. It automatically separates the information into various fields based on the given criteria and parameters, such as name, address, email, start/end date, job title, company, location, degree, university, year of passing, and hobbies such as dancing, singing and swimming [2]. Several open-source and commercial resume parsers are available for information extraction.
3.1 Open-Source Resume Parser

Open-source resume parsers are distributed with source code, and these sources are available for modification. These open-source libraries parse free-form resumes, that is, pdf, doc, docx, RTF and HTML, into structured data such as XML or JSON. Given social media profile links (such as LinkedIn and GitHub), these parsers can also parse the public webpages and convert the data into structured JSON format. Table 1 shows the list of open-source resume parsers and their properties. These open-source parsers are simple and easy to use, except Deepak's parser, and follow the same approach for cleaning and parsing.
Table 1 Open-source resume parsers

Brendan Herger's (Herger, 2015/2020): focuses on extracting information from resumes; programming language: Python; library used: PDFMiner; output file: CSV file; advantage: simple and language dependent; disadvantage: information loss in terms of date and job description.

Skcript Technologies' (Skcript/Cvscan, 2016/2020): focuses on extracting information from resumes; programming language: Python; library used: PDFMiner; output file: .json format; advantage: simple and CLI interface; disadvantage: fails to extract date most of the time.

Antony Deepak's (GitHub—Antonydeepak/resume parser: resume parser using rule based approach, developed using framework provided by GATE, n.d.): extracts information from resumes using a hybrid machine-learning and rule-based approach, focusing on semantic rather than syntactic parsing; programming language: Java; library used: Apache's Tika library; output file: structured .json format; advantage: better accuracy; disadvantage: complex and difficult to use.

Keras-English-resume-parser-and-analyzer (Chen 2018/2020): extracts information from English resumes using Keras and deep learning models; programming language: Python; library used: PDFMiner; output file: raw content; advantage: simple and better accuracy; disadvantage: language dependent.
Table 2 Commercial resume parsers

HireAbility's ALEX [13]: extracts information from resumes; output file: HR-XML, JSON; advantage: supports multiple languages and locales; accurate, fast and secure.

RChilli's [14]: extracts information from resumes; output file: XML, JSON; advantage: self-learning capability; fast and reliable.

DaXtra [15]: extracts information from resumes; output file: XML, JSON; advantage: multilingual resume parsing; most comprehensive and accurate.

Rapidparser [16]: extracts information from resumes; output file: XML, JSON; advantage: multilingual resume parsing; simple and accurate.
These parsers still contain HTML and Unicode characters, which have a negative effect on named entity recognition [12].
3.2 Commercial Resume Parsers

Commercial resume parsers are designed and developed for sale to end users [12]. These resume parsers have more sophisticated algorithms for attribute recognition than open-source parsers, allowing them to correctly identify these attributes. The strength of commercial parsers undoubtedly lies in the careful analysis of a resume to identify different sections such as the skill, qualification and experience sections. Table 2 shows the list of commercial resume parsers and their properties. Many parsers available in the market provide CV automation solutions and round-the-clock customer support. These resume parser APIs are inexpensive and easy to integrate.
4 Semantic Searches for Context-Based Searching

Semantic search means searching text with meaning to improve the correctness of search by understanding the searcher's intent [17]. In a lexical search, by contrast, the program looks for exact matches without understanding the meaning. Semantic search uses various parameters, such as context, place, purpose and synonyms, to find appropriate search results. The benefits of semantic search with reference to resume screening include the ability to perform fuzzy matching, allow pattern recognition, fetch experience by context, and establish relations between words and
ideas. Context-based search includes various parts of the search process, such as understanding the query and the underlying knowledge. The literature shows that semantic search for context-based searching is very effective in parsing resumes [18]. A semantic binary signature has been proposed in the literature [19]. It processes a search query by determining relevant categories and generating a binary hashing signature. The appropriate categories are examined, and Hamming distances are calculated between the inventory binary hashing signatures and the search query. The Hamming distance shows semantic significance that can be used to understand the searcher's intent. A novel sentence-level emotion detection method using semantic rules has been published in the literature [20]. This paper discusses an efficient emotion detection method and matches emotional words against its emotional_keyword database. This technique investigates the emotional words and provides better results and performance than existing approaches. An NLP-based keyword analysis method has been proposed in the literature [21]. This method uses three matrices: a document content matrix V, a word feature matrix W and a document feature matrix H. Then, a rank is calculated for each word using the set of coefficients. Finally, a rank is generated for one or more queries using the ranks of each word. Thomas and Sangeetha proposed [22] an intelligent sense-enabled lexical search on text documents to extract words from a text document. This method uses word sense disambiguation (WSD) of each word and then semantic search on the input text to extract semantically related words. This method of extraction is useful in resume screening, resume learning and document indexing. Alexandra et al. [23] proposed the design and implementation of a semantic-based system for automating the staffing process. The proposed system uses a skills and competencies lexicon for semantic processing of the resumes and matches the candidate skills as per the job necessities. This method eliminates repetitive activities to minimize the processing time of the recruiter and improves search efficiency using complex semantic criteria. Kumar et al. [24–27] proposed an object detection method for blind people to locate objects in a scene. They used machine-learning-based methods along with a single-shot multi-box detector algorithm to develop the model. This research shows the use of semantic search in order to understand the context of the language for reliable and comprehensive results.
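To make the role of the Hamming distance concrete, the short Python sketch below compares hypothetical binary signatures; the 16-bit values are invented for illustration and are not taken from the cited patent:

# Comparing semantic binary signatures by Hamming distance:
# a smaller distance is treated as higher semantic relevance.
def hamming_distance(sig_a: int, sig_b: int) -> int:
    """Number of differing bits between two equal-length binary signatures."""
    return bin(sig_a ^ sig_b).count("1")

# Hypothetical 16-bit signatures for a query and two inventory documents.
query_sig = 0b1011001110001101
doc_sigs = {"resume_1": 0b1011001010001101, "resume_2": 0b0100110001110010}

ranked = sorted(doc_sigs, key=lambda d: hamming_distance(query_sig, doc_sigs[d]))
print(ranked)  # documents ordered by semantic closeness to the query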
5 Research Challenges and Future Scope

The correctness of a resume parser depends on a number of factors [1], such as writing style, choice of words and the syntax of the written text. A set of statistical algorithms and complex rules is needed to correctly recognize and fetch the right information from resumes. Natural language processing and machine learning have the capability to
understand and parse unstructured written language and context-based information. There are many ways to write the same information, such as a name, address or date. So, resume parsing is still in its nascent stage, and a few important challenges and future scope are as follows [1, 12]:
• Understanding the writing style of a resume
• Understanding the choice of words in a resume
• Understanding the syntax of unstructured written language
• Context-based searching
• Understanding the organization and formatting of a resume
• Understanding the headers and footers of a resume
• Breaking a resume into sections
• Understanding the structural and visual information from PDFs
• Speed of parsing.
6 Conclusions

Resume screening is the process of extracting information from unstructured documents using complex pattern-matching and language-analysis techniques. Natural language processing and machine learning have the capability to understand and parse unstructured written language and context-based information. This paper presented a systematic review on resume screening, compared recognized works, and investigated open-source and commercial resume parsers. Several open-source and commercial resume parsers for information extraction were discussed. Then, a review on the use of semantic search for context-based searching was given. In addition, this paper also showed the research challenges and future scope of resume parsing in terms of writing style, word choice and syntax of unstructured written language.
References

1. Résumé parsing, https://en.wikipedia.org/w/index.php?title=R%C3%A9sum%C3%A9_parsing&oldid=921328084 (2019)
2. M. Seiv, HR software companies? Why structuring your data is crucial for your business?, https://medium.riminder.net/hr-software-companies-why-structuring-your-data-is-crucial-for-your-business-f749ecf3255a. Accessed on 25 Jan 2020
3. R. Dale, H. Moisl, H. Somers, Handbook of Natural Language Processing (CRC Press, 2000)
4. J.C. Reynar, A. Ratnaparkhi, A maximum entropy approach to identifying sentence boundaries, in Proceedings of the Fifth Conference on Applied Natural Language Processing (Association for Computational Linguistics, 1997), pp. 16–19
5. M.D. Riley, Some applications of tree-based modelling to speech and language, in Proceedings of the Workshop on Speech and Natural Language (Association for Computational Linguistics, 1989), pp. 339–352
6. T. Kiss, J. Strunk, Unsupervised multilingual sentence boundary detection. Comput. Linguist. 32, 485–525 (2006)
7. The Stanford Natural Language Processing Group, https://nlp.stanford.edu/software/tokenizer.shtml. Accessed on 05 Mar 2020
8. C.D. Manning, H. Schütze, Foundations of Statistical Natural Language Processing (MIT Press, 1999)
9. B. Jurish, K.-M. Würzner, Word and sentence tokenization with hidden Markov models. JLCL 28, 61–83 (2013)
10. UCREL CLAWS5 Tagset, https://ucrel.lancs.ac.uk/claws5tags.html. Accessed on 05 Mar 2020
11. L. Derczynski, D. Maynard, G. Rizzo, M. Van Erp, G. Gorrell, R. Troncy, J. Petrak, K. Bontcheva, Analysis of named entity recognition and linking for tweets. Inf. Process. Manag. 51, 32–49 (2015)
12. T. Neumer, Efficient Natural Language Processing for Automated Recruiting on the Example of a Software Engineering Talent-Pool, 88 (2018)
13. Resume parsing software | CV parsing software—HireAbility, https://www.hireability.com/. Accessed on 09 Mar 2020
14. RChilli Inc., Looking for a perfect job/resume parser alternative, https://www.rchilli.com/looking-for-a-perfect-job/resume-parser-alternative. Accessed on 09 Mar 2020
15. Resume Parsing Software | CV Parsing Software, https://www.daxtra.com/resume-database-software/resume-parsing-software/. Accessed on 09 Mar 2020
16. CV Parsing Lightning-fast - RapidParser, https://www.rapidparser.com/. Accessed on 09 Mar 2020
17. Semantic search, https://en.wikipedia.org/w/index.php?title=Semantic_search&oldid=940652635 (2020)
18. H. Bast, B. Buchhold, E. Haussmann, Semantic search on text and knowledge bases. Found. Trends Inf. Retr. 10, 119–271 (2016). https://doi.org/10.1561/1500000032
19. M. Liu, Search system for providing search results using query understanding and semantic binary signatures, https://patents.google.com/patent/US20200089808A1/en (2020)
20. D. Seal, U.K. Roy, R. Basak, Sentence-level emotion detection from text based on semantic rules, in Information and Communication Technology for Sustainable Development, ed. by M. Tuba, S. Akashe, A. Joshi (Springer, Singapore, 2020), pp. 423–430. https://doi.org/10.1007/978-981-13-7166-0_42
21. A.K. Baughman, G.F. Diamanti, M. Marzorati, Natural language processing keyword analysis, https://patents.google.com/patent/US10614109B2/en (2020)
22. A. Thomas, S. Sangeetha, Intelligent sense-enabled lexical search on text documents, in Intelligent Systems and Applications, ed. by Y. Bi, R. Bhatia, S. Kapoor (Springer International Publishing, Cham, 2020), pp. 405–415. https://doi.org/10.1007/978-3-030-29513-4_29
23. C. Alexandra, S. Valentin, M. Bogdan, A. Magdalena, Leveraging lexicon-based semantic analysis to automate the recruitment process, in Transactions on Engineering Technologies, ed. by S.-I. Ao, L. Gelman, H.K. Kim (Springer, Singapore, 2019), pp. 189–201. https://doi.org/10.1007/978-981-13-0746-1_15
24. A. Kumar, A review on implementation of digital image watermarking techniques using LSB and DWT, in Information and Communication Technology for Sustainable Development (Springer, 2020), pp. 595–602
25. A. Kumar, S.S.S. Reddy, V. Kulkarni, An object detection technique for blind people in real-time using deep neural network, in 2019 Fifth International Conference on Image Information Processing (ICIIP) (IEEE, 2019), pp. 292–297
26. A. Kumar, Design of secure image fusion technique using cloud for privacy-preserving and copyright protection. Int. J. Cloud Appl. Comput. 9, 22–36 (2019). https://doi.org/10.4018/IJCAC.2019070102
27. A. Kumar, S. Srivastava, Object detection system based on convolution neural networks using single shot multi-box detector. Proc. Comput. Sci. 171, 2610–2617 (2020)
Assessment of Osteogenic Sarcoma with Histology Images Using Deep Learning Himani Bansal, Bhartendu Dubey, Parikha Goyanka, and Shreyansh Varshney
Abstract As the life expectancy rate decreases globally, the concern of the human race is diverted toward health issues. Cancer is a common disease; the possibility of it being osteogenic sarcoma (osteosarcoma) is rare. Even after advancements in medical science and technology, it is hard to detect at an early stage. Technology therefore needs to provide a supportive working system with precise accuracy to identify this type of cancer and serve humanity. The authors thus present the development and execution of a computer-aided diagnosis system based on image processing, machine learning, and deep learning techniques. The dataset used comprises hematoxylin and eosin (H&E) stained histology images obtained through biopsy at different stages of cancer. Features are extracted after performing image segmentation. A convolutional neural network (CNN) is designed/customized for classifying cancer among patients into four categories, viz., viable tumor, non-viable tumor, non-tumor, and viable/non-viable tumor, with higher accuracy. Using the features proposed by scientists, the accuracy achieved by the authors is 91.20%. After further improvements in the CNN architecture by the authors, the accuracy rate achieved is 93.39%.

Keywords Cancer · Convolutional neural network · Deep learning · Image processing · Machine learning · Osteogenic sarcoma · Osteosarcoma
1 Introduction

Cancer is defined as the growth of abnormal cells in a particular region which can also affect other parts of the body. There are approximately 18 million new cases every year globally [1]. Cancer is considered to be the second leading cause of death worldwide [2]. Among the various forms of cancer, there exist a few which are hard to detect and thus lead to painful deaths. Some cancers grow and spread rapidly, whereas others are slow. Osteosarcoma is among them; it is one of the rarest forms of cancer, but it is the most common one when a patient has bone cancer. An osteosarcoma is a cancerous tumor which occurs mostly in the long bones (such as the arms and legs). Children and young adults are found to be more vulnerable to it. The symptoms can be quite mild, such as pain in bones and muscles, swelling, and so on. The treatment includes surgery, chemotherapy, and radiation [3]. The four categories for the classification of osteosarcoma are mainly: viable tumor (the region has a high capability for the existence and development of cancerous cells), non-viable tumor (the region is not capable of growing or developing cancerous cells), non-tumor (the region without the growth of cancerous cells), and viable/non-viable tumor (the region has a certain possibility for the presence of cancerous cells). The objective is to build an efficient system for the detection of osteosarcoma and to recognize the level of risk associated with a patient in order to provide the necessary treatment accordingly. The system can be implemented as a standalone application, which is reliable as well as robust. This method would also be beneficial in rural areas, where advanced instruments are not easily available.
2 Literature Survey

Huang et al. [4] in 2016 presented osteosarcoma detection using magnetic resonance images. Their research work is based on texture context features. Their CRF-based multi-target osteosarcoma MRI recognition was inadequate for identifying osteosarcoma lesions and the enclosing tissues concurrently. Li et al. [5] presented the classification of osteosarcoma in the early stages, where they subjected metabolomic data on osteosarcoma patients to three classification systems. The executions were assessed and analyzed using receiver operating characteristic curves, and they reported the efficiency measure for logistic regression, support vector machine, and random forest as 88, 90, and 97%, respectively. Okamoto et al. [6] in 2018 presented a system developed using CNN and SVM tailored on a DSP core for colorectal endoscopic images. Alge et al. [7] classified osteosarcoma through a meta-analysis of gene expression, which allows comparing gene expression over several stages. They performed a meta-analysis of RNA-sequence data to develop a random forest analysis model. The random forest
model had an average precision of 74.1 and 80.0% for the training set and testing set, respectively. Arunachalam et al. [8] in 2019 presented the evaluation of viable and necrotic tumor from whole slide images of osteogenic sarcoma applying machine learning and deep learning models. They picked 40 scanned whole slide images describing the diversity of osteogenic sarcoma and the response to chemotherapy. Intending to label the various areas of the scanned tissue as viable tumor, necrotic tumor, and non-tumor, they considered 13 machine-learning models and selected the one whose performance was best; SVM was the most suitable one based on the stated efficiency. Sahiner et al. [9] reviewed deep learning in medical imaging and radiation therapy and summarized the major achievements, general and novel challenges, and the plans used to address such challenges.
3 Proposed System Workflow

The authors worked through all the steps that led to systematic research. This includes data development, accompanied by training and classification for the machine learning and deep learning models. The workflow is depicted in Fig. 1.
Fig. 1 Workflow diagram
3.1 Data Collection, Preparation, and Pre-processing

The dataset has been archived from the University of Texas Southwestern Medical Center, Dallas [10]. The detailed process of the formation of the dataset starts with the examination of 50 patients who were under observation during their treatment period (from 1995 to 2015) at Children's Medical Center, Dallas. Figure 2 depicts the process of data preparation. The recorded samples of the 50 cases were assembled, resulting in 942 whole slide images (WSIs). Then 40 WSIs were handpicked by a pathologist. Image tiles falling in non-tissue or ink-mark regions and blurry images were removed to get noise-free data. Afterwards, 1144 image tiles of size 1024 × 1024 were produced from the WSIs. Finally, from the image tiles of the previous step, 56,929 image patches of size 128 × 128 were generated and stored in a CSV file. As depicted in Fig. 3, this CSV file is fed to the support vector machine and random forest methods to obtain baseline results before designing our own CNN. Not a number (NaN) values were removed and other fields with missing values were deleted. Images were segregated as per their classification labels (for neural-network-based classification).
Fig. 2 Data preparation (50 patient cases → 942 WSIs → select 40 WSIs → generate 1144 tiles at random, size 1024 × 1024 → generate 56,929 image patches, size 128 × 128)

Fig. 3 Basic idea of implementation
3.2 Support Vector Machine (SVM)

The support vector machine is a supervised machine learning classification model which aims to maximize the margin between the classification boundaries of the various classes. SVM finds the hyperplane between the points by minimizing the following objective, for the set of inputs (p: extracted features) with corresponding weights (W) and class labels (Q). The objective is given by Eq. (1):

$$\frac{1}{2}\|W\|^2 + C\sum_{i=1}^{n} \varepsilon_i \tag{1}$$

such that $y_i(w^{T} \cdot p_i + b) - 1 + \varepsilon_i \ge 0 \;(\forall i \in n)$. Here, n is the number of data samples, C represents the parameter to control the over-fitting issue, and $\varepsilon_i$ is a slack variable.
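A hedged scikit-learn sketch of such a soft-margin SVM is shown below; the toy feature matrix and labels are assumptions, and the RBF kernel matches the authors' later choice:

from sklearn.svm import SVC
import numpy as np

# Toy features p (e.g., extracted nuclei/texture descriptors) and labels Q.
p = np.array([[0.1, 0.4], [0.3, 0.9], [0.8, 0.2], [0.9, 0.7]])
Q = np.array([0, 0, 1, 1])

# Gaussian (RBF) kernel; C is the penalty on the slack variables in Eq. (1).
clf = SVC(kernel="rbf", C=1.0)
clf.fit(p, Q)
print(clf.predict([[0.7, 0.3]]))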
3.3 Random Forest (RF)

Random forest is a supervised machine learning algorithm centered on the ensemble learning method. It combines multiple decision trees into a forest of trees. Since a random forest builds many separate decision trees at training time and provides the final classification on the basis of all the trees, we have to decide how to arrange the nodes of the decision trees. To simplify this task, the Gini index (Eq. 2) is defined:

$$GI = 1 - \sum_{i=1}^{c} p_i^2 \tag{2}$$

Here, $p_i$ is the relative frequency of class i and c is the number of classes.
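Equation (2) translates directly into code; a small sketch with assumed class counts is:

def gini_index(class_counts):
    """Gini impurity 1 - sum(p_i^2) for a node with the given class counts."""
    total = sum(class_counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in class_counts)

# Example: a node holding 30 viable-tumor and 10 non-tumor patches.
print(gini_index([30, 10]))  # 0.375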
4 Implementation

After analyzing various machine learning algorithms, it was found that implementing random forest (RF) and support vector machine (SVM) would be more effective for the detection of osteosarcoma cancer, as these algorithms usually give effective results for categorization, especially in the biomedical field. However, the accuracy of these algorithms is not adequate for satisfactory results in the early prediction of cancerous cells. So, the authors decided to move toward a deep learning approach.
4.1 Deep Learning

In recent years, deep learning using neural network techniques has proved to be very successful in the field of image classification. It was not long before deep learning was used to solve medical imaging problems. It is a reliable approach for classifying medical images.
4.2 Proposed CNN

The input layer accepts RGB images of size [128 × 128]. The convolution layer computes the dot products for the input values that are attached to the local neurons. Each convolution layer is followed by a max-pooling layer which down-samples the volume. The convolution layers have a filter size of (3 × 3) and the max-pooling layers have a down-sampling size of (2 × 2), based on a number of experiments (the architecture with three such alternations came out to be the best). Further, there is a flattening layer which converts the matrix output obtained from the last step into vector form. Then, there are two fully connected layers; the ReLU function and the sigmoid function are used as their activation functions, respectively (concluded by the authors after experiments). Finally, the last fully connected layer has four output nodes (neurons) giving the probability value for each class of tumor. This network is trained by selecting "adam" (AdaGrad + RMSProp) as the optimizer, as it is capable of handling sparse gradients and noisy problems by adapting the learning rate during training. The class mode considered is "categorical", as the output is classified into four different categories. The model runs for a total of 10 epoch cycles, with 300 steps per epoch, a batch size of 20, and a target size of 128 × 128 (concluded by the authors after experiments). The designed structure of the CNN is depicted in Fig. 4 and consists of three "convolution layers", three "pooling layers", and two "fully connected layers". Three layers are considered as the input image needs to be classified into four classes as discussed in earlier sections, and according to a general rule, an "n-class" classifier needs "n − 1" layers.
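A Keras sketch consistent with this description is given below; the paper does not state the number of filters per convolution layer or the width of the first dense layer, so the 32/64/128 filter counts, the 128-unit dense layer and the ReLU activations inside the convolution blocks are assumptions:

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(128, 128, 3)),
    # Three alternating convolution (3x3) and max-pooling (2x2) stages.
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    # Two fully connected layers: ReLU, then sigmoid over the four tumor
    # classes (as described by the authors; softmax is the more common choice).
    layers.Dense(128, activation="relu"),
    layers.Dense(4, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()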
Fig. 4 Designed architecture of CNN
4.3 Algorithm

Step 1. Import the required libraries.
Step 2. Initialize the convolutional neural network (CNN) as three pairs of:
• first, a convolutional layer;
• then, a pooling layer;
• then a flattening layer;
• finally, two fully connected layers.
Step 3. Compile the convolutional neural network over the training set.
• Fit the CNN on the images.
Step 4. Load the drive helper (as the dataset is present in the drive).
• Mount it (this will prompt for authorization).
• Now testing ends here.
Step 5. Compile the convolutional neural network over the testing set.
• Fit the CNN to the images.
Step 6. End (wait for the results).
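A hedged training sketch matching the reported settings (10 epochs, 300 steps per epoch, batch size 20, target size 128 × 128) is shown below; it reuses the model from the previous sketch, and the directory paths are assumptions:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Directory names are assumptions; each class (viable, non-viable, non-tumor,
# viable/non-viable) is expected in its own sub-folder.
train_gen = ImageDataGenerator(rescale=1.0 / 255).flow_from_directory(
    "data/train", target_size=(128, 128), batch_size=20, class_mode="categorical")
test_gen = ImageDataGenerator(rescale=1.0 / 255).flow_from_directory(
    "data/test", target_size=(128, 128), batch_size=20, class_mode="categorical")

# Parameters reported by the authors: 10 epochs, 300 steps per epoch.
model.fit(train_gen, steps_per_epoch=300, epochs=10, validation_data=test_gen)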
5 Results

The growing risk of a disastrous disease like cancer can be detected and diagnosed effectively with the help of science and technology. The use of the support vector machine and random forest methods so far shows that the patient can be diagnosed properly. Once the type of cancer is determined, the treatment procedure can be started, and hence someone's life can be saved from the severe after-effects of such a deadly disease.
5.1 Evaluation

The objective of the authors in using this CNN is to classify the input image into the following classes: viable tumor, non-viable tumor, non-tumor, and viable/non-viable tumor, in minimal time and with great accuracy, in order to provide the required treatment to a patient and save a life. The output of this network is a probability distribution, with the highest probability indicating the class, and the accuracy of the network is calculated by the general method: (Count of True Positives + Count of True Negatives)/(Total Sample Size).
Table 1 Summary of CNN (parameter: selected value)
Activation function: ReLU and sigmoid
Optimizer: Adam
Number of epochs: 10
Steps per epoch: 300
Batch size: 20

Table 2 Analysis of CNN (parameter: selected value)
Loss: 0.1472
Accuracy: 0.9339
Validation loss: 0.0553
Validation accuracy: 0.9120
Table 3 Results achieved (algorithm applied: accuracy achieved)
Support vector machine: 82%
Random forest: 85%
Convolutional neural network (CNN) with customized layers: 93.39%
This saturated and efficient accuracy is obtained after running the network model for a total of 10 epoch cycles, with 300 steps per epoch. Tables 1 and 2 give the summary and analysis of the customized CNN, respectively. Besides this network, the authors have also implemented some machine learning algorithms by combining scientists' identified features; the best performing models are identified as a support vector machine (SVM) with a radial basis function kernel and a random forest (RF) with 200 trees and random state 0. The design of the suggested CNN model is based on the available datasets and resources. As a result, a comparative study by the authors suggests that the CNN works best among all the models for the available dataset. The results obtained (by the authors) for all three models are presented in Table 3.
6 Conclusion

Osteosarcoma is an extremely complex tumor. In this paper, the authors presented the computerized, systematic classification of images based on four main classes, namely viable tumor, non-viable tumor, non-tumor, and viable/nonviable. The microscopic features scanned from the input image tiles are based on nuclei characteristics, spatial
characteristics, distance-based characteristics, and textural characteristics. Conventional machine learning models were used by linking scientist-recognized traits; we picked random forest and support vector machine with a Gaussian kernel. In comparison, a CNN was developed, consisting of multiple layers as described in earlier sections. On our test dataset, reasonable efficiency was achieved by the two conventional methods selected, the SVM and RF classification models, while the proposed deep learning model performed particularly effectively and efficiently in the classification task. The work presented in this paper sets the core for image analysis in osteosarcoma. Further, using a deep learning approach, a well-designed CNN can provide more accurate results, which can help in the early detection of osteosarcoma cancer with an accuracy rate of 93.39% and thus help in saving the lives of patients by providing a proper diagnosis as well as proper treatment on time. A similar structure can additionally be readjusted for different sorts of tumors.
References

1. The International Agency for Research on Cancer (IARC), Press Release. https://www.who.int/cancer/PRGlobocanFinal.pdf. Accessed on 12 Sept 2019
2. Cancer Key Facts, World Health Organization. https://www.who.int/news-room/fact-sheets/detail/cancer. Accessed on 16 Feb 2020
3. M. Uhl, U. Saueressig, M. van Buiren, U. Kontny, C. Niemeyer, G. Köhler, M. Langer, Osteosarcoma: preliminary results of in vivo assessment of tumor necrosis after chemotherapy with diffusion- and perfusion-weighted magnetic resonance imaging. Invest. Radiol. 41(8), 618–623 (2006)
4. W.B. Huang, D. Wen, Y. Yan, M. Yuan, K. Wang, Multi-target osteosarcoma MRI recognition with texture context features based on CRF, in 2016 International Joint Conference on Neural Networks (IJCNN) (IEEE, 2016), pp. 3978–3983
5. Z. Li, S.R. Soroushmehr, Y. Hua, M. Mao, Y. Qiu, K. Najarian, Classifying osteosarcoma patients using machine learning approaches, in 2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) (IEEE, 2017), pp. 82–85
6. T. Okamoto, T. Koide, S. Yoshida, H. Mieno, H. Toishi, T. Sugawara, B. Raytchev, Implementation of computer-aided diagnosis system on customizable DSP core for colorectal endoscopic images with CNN features and SVM, in TENCON 2018–2018 IEEE Region 10 Conference (IEEE, 2018), pp. 1663–1666
7. O. Alge, J. Gryak, Y. Hua, K. Najarian, Classifying osteosarcoma using meta-analysis of gene expression, in 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (IEEE, 2018), pp. 2400–2404
8. H.B. Arunachalam, R. Mishra, O. Daescu, K. Cederberg, D. Rakheja, A. Sengupta, P. Leavey, Viable and necrotic tumor assessment from whole slide images of osteosarcoma using machine-learning and deep-learning models. PLoS One 14(4), e0210706 (2019)
9. B. Sahiner, A. Pezeshk, L.M. Hadjiiski, X. Wang, K. Drukker, K.H. Cha, M. Giger, Deep learning in medical imaging and radiation therapy. Med. Phys. 46(1), e1–e36 (2019)
10. The Cancer Imaging Archive (TCIA) Public Access. Osteosarcoma data from UT Southwestern/UT Dallas for Viable and Necrotic Tumor Assessment. https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=52756935#bcab02c187174a288dbcbf95d26179e8. Accessed on 15 July 2019
SMDSB: Efficient Off-Chain Storage Model for Data Sharing in Blockchain Environment Randhir Kumar, Ningrinla Marchang, and Rakesh Tripathi
Abstract Blockchain technology has been gaining great attention in recent years. Owing to its feature of immutability, the data volume of the blockchain network is continuously growing in size. As of now, the size of Bitcoin distributed ledger has reached about 200 GB. Consequently, the increasing size of the blockchain network ledger prevents many peers from joining the network. Hence, the growing size not only limits the expansion of network but also the development of the blockchain ledger. This calls for development of efficient mechanisms for storage and bandwidth synchronization. As a step toward this goal, we propose an IPFS-based decentralized off-chain storage called storage model for data sharing in blockchain (SMDSB), to store the data volume of the blockchain network. In the proposed model, the miners validate and deposit the transactions into an IPFS-based decentralized storage, whereas they store the hash of the transactions in a blockchain network. Thus, by utilizing the characteristics of IPFS-based off-chain storage and its feature of hash creation, the blockchain ledger size can be significantly reduced. Implementation of SMDSB results in reduction of Bitcoin storage by 81.54%, Ethereum by 53.86%, and Hyperledger by 62.59%. Additionally, the experimental results also highlight the impact on storage cost for 1 transaction per second (TPS) in a year. Keywords Blockchain · Off-chain storage · Interplanetary file system (IPFS) · IPFS hash · Peer-to-peer data sharing model
1 Introduction

The blockchain technology is known to be decentralized, immutable, and cryptographically secure with high byzantine fault tolerance. It was first implemented in the Bitcoin cryptocurrency [1]. Ever since the blockchain technology started operating in the year 2009, the data volume of the Bitcoin ledger has been growing in size every year, leading to storage problems [3]. As of now, the ledger size of the Bitcoin network has grown up to 200 GB, and this size is said to be increasing daily by 0.1 GB. Hence, data storage [4, 7, 19] and data synchronization [5] have emerged as challenging issues. With regard to Bitcoin, the transaction blocks only consist of the transfer records which contain the addresses of the sender and receiver and consequently are relatively very small data records. However, in the future, applications of the blockchain technology will not be limited to Bitcoin. For example, an IoT-based blockchain system [6, 18, 22] may consist of off-chain data like audio, video, and text files. Therefore, in the foreseeable future, a high demand for off-chain data in blockchain networks is perceived. Consequently, an efficient off-chain data storage model is required to deal with large volumes of data. In the decentralized structure of blockchain technology, every node keeps a copy of the ledger. The copying of the ledger among all the peers in the network ensures security and availability of data. However, repeated storage of the ledger by every node increases data redundancy in the network. Therefore, a suitable storage model is required to reduce data redundancy in the network. Off-chain data are non-transactional data that are too large to be stored in the blockchain efficiently, or that require the ability to be changed into a different format in order to reduce the size of the original data. The objective of off-chain data storage is to tackle the limitation of the growing size of blockchain networks by storing the off-chain data elsewhere, off the blockchain. However, in doing so, fundamental properties of the network like availability, privacy, integrity, and immutability may be compromised. Various models have already been proposed [3, 7, 8, 10] for off-chain storage and reduction of data size. The downside is that the use of these approaches may result in the original state of the data being potentially violated. Thus, the need of the hour is an efficient off-chain data storage model which can maintain the state of the data and their integrity. With the help of distributed storage systems like the interplanetary file system (IPFS), large volumes of data can be stored off-chain while keeping the hash of the off-chain data within the blockchain network, thereby maintaining immutability. The files in IPFS are stored in the distributed hash table (DHT) with their hash mapping, which ensures data integrity. The stored files in IPFS are content-addressed and tamper-proof due to the version control feature [2]. IPFS removes redundant off-chain data and maintains version control for non-transactional data. The edit history is easily recorded and traced out in the IPFS distributed peer-to-peer storage system. Off-chain data are retrieved from IPFS with the help of their hash values. Moreover, IPFS provides availability of data and maintains synchronization among the peers in the network [24].
The rest of the paper is organized as follows: Sect. 2 provides an overview of the related work. In Sect. 3, we present the proposed model for off-chain data storage in order to reduce the size of the blockchain network. In Sect. 4, we present the methodology used in the SMDSB model. Section 5 shows the implementation of off-chain data storage. Section 6 gives the result analysis, and Sect. 7 concludes the paper.
2 Related Work

This section discusses related existing work. In [3], a coding-based distributed framework is introduced to reduce the ledger size, where each block is divided into sub-blocks. A coding scheme is then used to encode these sub-blocks into more sub-blocks, which are then disseminated to all the peers in the network. The encoding and decoding process increases the complexity of the blockchain network. Moreover, a large number of blocks get distributed over the network. Although the coding (and decoding) scheme helps in reducing the size of the data to some extent, it leads to data inconsistency problems and decreases the system efficiency. In [7], a technique that records the transactions by using a block change approach is proposed. In this approach, a set of older transactions moves into a summary block, and the current transactions are stored in the network chain. The summary block approach reduces the size of the main chain of the network to some extent. However, at the same time, the size of the summary block increases owing to the collection of older transactions from different blocks. Moreover, storage space for the summary block is again needed. In another similar work [10], a summary block approach to store the details of the blockchain ledger is presented. It utilizes the file system to store the state of the summary block. This approach suffers from the issue of historical traceability. Besides, the entire system depends on traditional file systems, because of which, if the file gets corrupted, then the complete summary block state may vanish. The account tree approach is employed to store the account details of non-balance accounts in [8]. In the proposed work, the expired transaction blocks are deleted, and only the block header details such as the merkle root, proof-of-work, and previous hash of the block are kept [5]. This approach of holding only current transactions helps to reduce the size of the blockchain ledger to the maximum extent. On the downside, historical records of transactions cannot be traced out. The above techniques have various limitations. Keeping this in mind, in this paper, we present an off-chain storage model for the blockchain network with an aim to overcome the problem of storage. The proposed IPFS-based off-chain storage scheme maintains the IPFS hash of each transaction, which is unique for that transaction. The hash of a transaction consists of only 46 bytes irrespective of the original size of the transaction. Furthermore, the IPFS model maintains a version control system. In our proposed storage model, the same IPFS hash gets disseminated among all the peers of the network. This ensures consistency in the network. Additionally, our technique eliminates the drawback of third-party dependency for storage. Moreover,
synchronization among the peers is maintained due to the reduced size of transactions. Consequently, the problem of adding new peers is possibly resolved. Furthermore, our proposed model supports all types of transaction data, maintains the traceability of the blockchain history, and ensures the availability of transactions among the peers.

Motivation: The work in this paper is motivated by limitations in existing works where the size of the distributed ledger is reduced by keeping old transaction records in a chain of summary blocks. These summary-block approaches suffer when the summary block data volume increases over time. Another limitation is that they fail to maintain the history of transactions. In this paper, we propose a storage model which can store structured as well as unstructured files by using off-chain storage.

Novelty: Whereas existing literature mostly addresses the problem of how to reduce the size of blockchain storage, to the best of our knowledge, there is no work as yet on storage and access of unstructured files such as CSV, Excel, PPT and Word. Hence, to the best of our knowledge, we are the first to provide a storage model for unstructured files by using blockchain and the IPFS distributed file storage system. The proposed model will be very useful in real-time applications including supply chain management, health care, finance, identity management, insurance claiming, and other applications in which government agencies need to store reports, audio, video, and other transactions in a blockchain network.

The contributions of the paper are as follows:
1. We propose a storage model for data sharing in blockchain (SMDSB) to reduce the size of the blockchain network using the IPFS distributed file storage system.
2. We implement a proof-of-work consensus approach to validate the transactions with their IPFS hash values. The mining process is applied for block creation in the blockchain network.
3. The ledger, which is relatively small in size, is disseminated among all the peers of the blockchain network, which ensures synchronization among the peers.
4. We give the transaction size representation in bytes of Bitcoin, Ethereum and Hyperledger using SMDSB and find that the model is efficient in reducing storage size and in providing synchronization and availability of transactions among the peers in the network.
3 Storage Model for Data Sharing in Blockchain (SMDSB)

This section discusses SMDSB, the proposed storage model for data sharing in blockchain. The model can efficiently store both structured and unstructured files such as pdf, text, audio, video, csv, and excel sheets. Figure 1 illustrates how the model works. It consists of four different components, viz. user, miner, off-chain storage and blockchain storage.
Fig. 1 SMDSB model for data sharing using blockchain and IPFS-based decentralized storage (peers submit lists of transactions, miners run the process of mining, the returned IPFS hash is stored in blocks 1–4, and each block holds the previous hash Prev(H) and its IPFS hash)
User: A peer in the blockchain network is also called a user. Different peers can share their information on the network by using the upload process. The users can share more than one file at a time in the network. The other peers (users) in the network can verify the transactions by using a consensus technique. The user must be registered to become a part of the blockchain network.

Miner: A miner validates the list of transactions and disseminates it among the peers in the network to verify the transactions in order to construct a new block in the blockchain network. In SMDSB, the miner stores the original transaction in the IPFS distributed storage during the mining process. At the same time, it generates the IPFS hash of the transaction and stores it in the blockchain network. The IPFS hash thus generated is unique for each transaction and gets modified only when the original content (corresponding transaction) is modified, due to the two inherent IPFS features, viz. the version control system and distributed hash table (DHT) storage.

Off-chain storage: IPFS is used for off-chain storage in SMDSB. IPFS is distributed in nature. Moreover, it provides facilities to generate the hash of a file, which is much smaller in size than the file itself. The IPFS hash consists of only 46 bytes, thereby needing little storage in the blockchain network. SMDSB supports storage of both structured and unstructured files. Examples of structured files include pdf, audio, video, text, and image files, whereas those of unstructured files include excel, csv, and word files. Therefore, the SMDSB model can be used in real-time applications to share large volumes of information among peers.
Blockchain storage: As mentioned earlier, only the IPFS hash of a transaction is stored in the blockchain network in order to reduce the size of the blockchain, while the transaction itself is stored in IPFS. Moreover, to maintain the history of transactions, we keep the timestamp of each transaction, and similar copies of the chain get disseminated among the peers to provide the transaction details. Furthermore, the proof-of-work consensus is applied to ensure consistency in the network. To identify the files in the blockchain network, we also include the file extension of each file.

The following steps are involved in the working of the SMDSB model:
1. A user can upload both types of files, i.e., structured and unstructured, as a transaction in the blockchain network. Moreover, a list of files can also be uploaded at a time by a peer as a transaction.
2. The list of transactions gets disseminated to the local miners for validation. Once the transactions are validated by the miners, they are verified by the peers in the network with the purpose of constructing a new block in the blockchain network.
3. The transaction hash is generated during the mining process, and the IPFS hash gets stored into the blockchain network.
4. The synchronization of peers is easily maintained owing to the IPFS hash, which is unique for each transaction, and the same IPFS hash gets accessed by other peers in the network.
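The upload path in these steps can be sketched in Python; this is a simplified illustration assuming a local IPFS daemon and the ipfshttpclient library, with a plain dictionary standing in for a block (the field names are assumptions, not the paper's exact schema):

import hashlib, json, time
import ipfshttpclient  # assumes a local IPFS daemon is running

def store_transaction(file_path, chain):
    """Add the file to IPFS and record only its 46-byte content hash on-chain."""
    with ipfshttpclient.connect() as client:
        ipfs_hash = client.add(file_path)["Hash"]   # content-addressed hash
    block = {
        "index": len(chain),
        "ipfs_hash": ipfs_hash,
        "timestamp": time.time(),
        "prev_hash": chain[-1]["hash"] if chain else "0",
    }
    block["hash"] = hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()
    chain.append(block)
    return block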
4 Methodology

In this section, we present the methodology used in the SMDSB model. We first present the existing storage architecture and then the proposed storage architecture. In the existing blockchain architecture, which we call on-chain storage, files and their hashes are stored on the blockchain network as transactions. The problem with this method is that as more and more files get uploaded, the storage size of the blockchain increases rapidly since all the content is stored on the blockchain network. Whenever a new peer arrives, it has to download the entire chain containing all the files. This solution is definitely not scalable, especially since file sizes may run into many megabytes. The proposed storage architecture, which overcomes the above drawbacks, uses IPFS. IPFS is a distributed P2P hypermedia distribution protocol. IPFS [13] uses a content-addressable approach and generates cryptographic hashes that give a unique fingerprint to each file, removing redundancy across the network. Moreover, these unique hashes are used for information retrieval. In the proposed model, the file hashes are stored in the blockchain network and the actual files in IPFS. Thus, we make large savings in the size of the blockchain network while maintaining privacy of the transactions (files). The content-addressable hashes are linked into the blockchain network to reduce the overhead of blockchain scalability. Each hash is mapped using the map_user function in our implementation in order to provide auditability of transactions.
The proposed approach also uses proof-of-identity (PoI), which provides peer security by ensuring that one peer has only one account within a system, without sacrificing the advantages of anonymity or pseudonymity. Each peer gets its identity after the registration process. Proof-of-work mining is also incorporated in the model. We use consortium blockchain structure and local mining strategy to validate the transactions. The local mining approach improves the scalability of chain and can be used to do away with centralized mining [25].
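Proof-of-work itself follows the usual hash-puzzle pattern; the following minimal Python sketch is illustrative (the SHA-256 choice and the four-leading-zeros difficulty are assumptions, not parameters reported by the paper):

import hashlib

def proof_of_work(block_header: str, difficulty: int = 4) -> int:
    """Find a nonce whose SHA-256 digest of header+nonce has `difficulty` leading zeros."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_header}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce
        nonce += 1

print(proof_of_work("example-ipfs-hash|prev:0000abcd"))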
4.1 Transaction and Peer Security This subsection discusses some algorithms used in SMDSB for data storage and peer security.

Algorithm 1: Authentication of the peer
Input: PoI
Output: verification of document upload
// Authorization of the peer gets validated while a document is uploaded //
if (document.sender.PoI is not authorized) then
    return false
else
    // upload the document to the IPFS network and the related metadata to the blockchain network //
    return true
end if

Algorithm 2: Integrity of off-chain and on-chain storage
Input: PoI, file
Output: integrity of off-chain and on-chain storage
// Compute the hash of the file by adding it to IPFS //
Hash = file | ipfs add
// Add the IPFS hash to the blockchain network //
Block_index = add_metadata(Hash)
// Map the PoI and the computed hash for off-chain and on-chain integrity of the uploaded file //
map_user((PoI, Block_index, Hash, timestamp, PoW, Hash_of_previous_Block))

Algorithm 3: Auditability verification during transaction access
Input: Registration-Id, Block_Id
Output: true, false
if (Registration-Id, Block_Id) == valid then
    S = get_hash_of_Block_Id(Block_Id)
    T = get_value_from_IPFS(IPFS_Hash)
    if (S != T) then
        return false
    else
        return true
    end if
else
    return false
end if
Algorithm 1 is suggested during the exchange and upload of documents to verify the peers in the blockchain network. The algorithm limits access of the shared documents to malicious peers. The proof-of-identity guarantees model security. If the peers are registered in the network, a unique proof-of-identity (http://nodes/register) is given to them. This proof-of-identity gets verified against the peer's authorization. Data storage in the blockchain network: When a peer begins uploading a file, the file is added to IPFS, and the IPFS hash (content-addressed hash) is added to the blockchain network. In addition, a mapping is maintained between the peer (PoI) and its corresponding Block_Id, as shown in Algorithm 2. Auditability of data in the blockchain network: As shown in Algorithm 3, the auditability of the transactions is checked. The auditability process is carried out when a peer accesses the content-addressed hash of a transaction from the blockchain network.
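A hedged Python sketch of the auditability check in Algorithm 3 is given below: the hash recorded on-chain for a block is compared with the hash under which the content is actually resolvable in IPFS. The helper names and the use of the `ipfs block stat` command are assumptions for illustration.

```python
# Auditability sketch: compare the on-chain stored hash with the hash resolvable in IPFS.
import subprocess
from typing import Optional

def on_chain_hash(chain: list, block_id: int) -> str:
    """Fetch the IPFS hash stored in the given block of the local chain copy."""
    return chain[block_id]["ipfs_hash"]

def off_chain_hash(ipfs_hash: str) -> Optional[str]:
    """Ask the local IPFS node to resolve the object; returns the key if it exists."""
    result = subprocess.run(["ipfs", "block", "stat", ipfs_hash],
                            capture_output=True, text=True)
    if result.returncode != 0:
        return None
    # Assumes the first output line has the form 'Key: <hash>'.
    return result.stdout.splitlines()[0].split()[-1]

def audit(chain: list, block_id: int) -> bool:
    stored = on_chain_hash(chain, block_id)
    return off_chain_hash(stored) == stored
```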
5 Implementation The implementation of SMDSB was carried out with the IPFS distributed file sharing system, where each transaction is represented in the blockchain network by its IPFS hash value instead of the transaction itself. The experimental setup consists of Python Anaconda and Python Flask. Experiments were performed on an Intel(R) Xeon(R) W-2175 CPU @ 2.50 GHz running Windows (x64-based processor) with 128 GB of RAM and 1 TB of local storage. We implemented four different modules, viz. user, miner, off-chain storage, and blockchain storage, which have been discussed earlier. As shown in Fig. 2, the peers can upload structured and unstructured files as transactions into the blockchain network. We have tested with a maximum upload size of 100 MB (a video file) as a transaction. The uploaded original file gets stored into
Fig. 2 Upload of files on blockchain network as a transaction
Fig. 3 Process of mining for transaction (Mp4 file) validation and block creation
IPFS distributed peer-to-peer off-chain storage initially. Furthermore, the generated IPFS hash also gets stored in the blockchain network. Figure 3 illustrates the mining process for two different files (mp4 and csv files, respectively, as transactions) in order to validate the transactions and create a new block in the blockchain network. We generated the IPFS hash during the mining process and disseminated the same hash among the peers of the blockchain network for verification. We also stored the file names and file extensions during the mining process so as to be able to identify the structured and unstructured files in the blockchain network. To access the files from the blockchain network, the peers have to use a content-addressed scheme:
1. To access a structured file, the following URL must be followed: http://127.0.0.1:8080/ipfs/<IPFS hash>
2. To access an unstructured file, the following URL must be followed: http://127.0.0.1:8080/ipfs/<IPFS hash> > filename with extension
As shown in Fig. 4, we have used a content-addressed scheme (i.e., the original content gets accessed by its hash value) to access the video file with the format provided above. Similarly, the unstructured files can be accessed by using the above format. The content-addressed scheme is very efficient for transaction access and synchronization among peers (owing to the unique IPFS hash value). In the content-addressed scheme, more than one peer can access the same transaction at the same time with the help of the distributed hash table (DHT) [2]. The IPFS hash gets modified once the transaction is updated by the peers. The modified IPFS hash is stored into the DHT. The versioning control system of the IPFS framework ensures availability of the modified transaction using different timestamps. Figure 5 shows a list of transactions stored with their unique IPFS hash value, file extension, and timestamp. The IPFS hash value uniquely identifies a transaction in the blockchain network. The use of the file extension is to identify the structured and unstructured files in
Fig. 4 Access of structured files by using content-addressed scheme
Fig. 5 List of transactions with their IPFS hash on blockchain network (port 5000 main chain)
the network, and finally, the timestamp is recorded in order to maintain the history of transactions. Furthermore, we have used the proof-of-work (PoW) consensus approach to maintain consistency of blocks in the network, and the previous hash is used for immutability in the blockchain network. In our working model, we have added almost all types of unstructured files as transactions, such as csv, excel, docx, and ppt files. The main objective is to store these files in a blockchain network to maintain the records of reports, presentations, and important documents. Moreover, the chain provides synchronization during the addition of new peers into the network due to the reduced size of transactions (IPFS hashes of transactions). Figure 6 shows how we have added a peer into the SMDSB model by using a JSON structure. The peer gets registered into the main chain of the blockchain network. The main chain is currently running on port 5000, and we have added the peer into the main chain to access the transactions of the blockchain network. To add the peers, we have used the Postman client. In SMDSB, an anonymous peer cannot access the chain until that peer is registered into the blockchain network.
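The content-addressed access described above can also be scripted instead of using a browser; the following sketch fetches a transaction's payload from the local IPFS HTTP gateway on port 8080, following the URL format given earlier. The hash value is a placeholder.

```python
# Fetch a transaction's payload by its IPFS hash through the local HTTP gateway.
import requests

GATEWAY = "http://127.0.0.1:8080/ipfs"

def fetch_file(ipfs_hash: str, out_path: str) -> None:
    response = requests.get(f"{GATEWAY}/{ipfs_hash}", timeout=30)
    response.raise_for_status()
    with open(out_path, "wb") as fh:
        fh.write(response.content)

fetch_file("QmExampleHash...", "downloaded.mp4")   # placeholder hash
```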
Fig. 6 Process of adding peer in SMDSB model
Fig. 7 List of transactions accessed by a peer (port 5009 peer chain) in blockchain network
As shown in Fig. 7, the peer on port 5009 is allowed to access the main chain of the blockchain network, which was initially created on port 5000, after its registration. The peers are allowed to access the main chain only after acceptance of the blockchain network consensus (PoW). In SMDSB, we have employed proof-of-work (PoW) consensus to ensure the confirmation of transactions and the creation of a new block in the blockchain network. As can be seen in Fig. 7, the same chain is copied by the peers in the network via http://127.0.0.1:5009/nodes/resolve.
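For completeness, the peer registration and chain resolution steps shown in Figs. 6 and 7 can also be driven programmatically rather than through Postman. The sketch below assumes the Flask endpoints mentioned in the text (/nodes/register and /nodes/resolve); the shape of the JSON body is an assumption.

```python
# Register a peer with the main chain (port 5000) and resolve the chain on the peer (port 5009).
import requests

MAIN_CHAIN = "http://127.0.0.1:5000"

def register_peer(peer_address: str) -> dict:
    response = requests.post(f"{MAIN_CHAIN}/nodes/register",
                             json={"nodes": [peer_address]}, timeout=10)
    response.raise_for_status()
    return response.json()

def resolve_chain(peer_port: int = 5009) -> dict:
    # The registered peer copies the main chain after the PoW consensus is applied.
    response = requests.get(f"http://127.0.0.1:{peer_port}/nodes/resolve", timeout=10)
    response.raise_for_status()
    return response.json()

print(register_peer("http://127.0.0.1:5009"))
print(resolve_chain())
```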
6 Transaction Size Analysis in Terms of Storage Cost In this section, we report the results based on implementation of the SMDSB model. The metrics used to compare the blockchain models are given by the formulas in Eqs. (1) and (2) [26]. The meaning of each symbol in Eq. (1) is as follows: Hash
(size of all block headers in blockchain), kHash (IPFS hash size), Tn (number of all transactions in blockchain), and Txk (original data size of the k-th transaction). To compare the storage space requirement, we have studied different storage models like Bitcoin, Ethereum, and Hyperledger.

Compression Ratio = (Hash + kHash × Tn) / (Hash + Σ_{k=1}^{Tn} Txk)    (1)

Storage Space Saving Rate = 1 − Compression Ratio    (2)
The Bitcoin network has a block header size of 80 bytes, 500 transactions per block on average, and each transaction is represented by a 250-byte hash value [1, 11, 12, 17]. Similarly, the Ethereum network has a 508-byte header size, 1196 transactions per block on average, and a 100-byte representation for each transaction [9, 13]. The Hyperledger Fabric has a 72-byte block header, 3500 transactions per block, and a 123-byte representation for each transaction [14–16, 20, 21]. We have compared SMDSB with the existing blockchain storage models (Bitcoin, Ethereum, Hyperledger). Figure 9 shows the difference in hash size of a transaction in the different blockchain storage models. In SMDSB, we have represented each transaction with 46 bytes, thereby reducing the transaction storage size by 80.60% as compared to that of Bitcoin (Fig. 8). In this experiment, we studied the cost of the different storage models. The cost per transaction in Hyperledger is USD 3.86 for 5 KB of storage. Similarly, the cost per transaction for Bitcoin is USD 1.30 for 5 KB of storage. However, this might change based on the underlying Bitcoin cryptocurrency. The cost per transaction in Ethereum is USD 0.25 for 5 KB of storage, and again, it depends on the underlying Ether cryptocurrency [23]. We have computed the storage and its cost per transaction based on the IBM off-chain storage model. The company works for 8 h a day, 240 days
Fig. 8 Comparison of transaction size in bytes
Fig. 9 Storage size reduction by using SMDSB
in a year (standard time taken from the IBM white paper on off-chain storage) in order to manage the storage in Hyperledger for each transaction. We evaluated the existing storage models shown in Fig. 8 to compute the transaction sizes and their per-year cost. The representation of a Bitcoin transaction takes 250 bytes = (250/1024) = 0.245 KB for each transaction (1 transaction per second). Similarly, Ethereum, Hyperledger, and IPFS take 0.098 KB, 0.121 KB, and 0.0449 KB for the representation of each transaction, respectively. We have computed the storage size required in a year for 1 transaction per second (TPS) and its cost (shown in Table 1).
Table 1 Storage size and cost evaluation of different blockchain models

Bitcoin: Storage size = 1 TPS * 0.245 KB * 3600 s/h * 8 h/day * 240 days/year = 296,900 KB = 289.945 GB = 0.2832 TB/year/transaction; Storage cost for 1 TPS (in one year) = 296,900 * USD 0.26 = USD 77,194
Ethereum: Storage size = 1 TPS * 0.098 KB * 3600 s/h * 8 h/day * 240 days/year = 661.5 KB = 0.646 GB = 0.000631 TB/year/transaction; Storage cost for 1 TPS (in one year) = 661.5 * USD 0.05 = USD 33.07
Hyperledger: Storage size = 1 TPS * 0.121 KB * 3600 s/h * 8 h/day * 240 days/year = 810.791 KB = 0.792 GB = 0.000773 TB/year/transaction; Storage cost for 1 TPS (in one year) = 810.791 * USD 0.77 = USD 624.309
SMDSB: Storage size = 1 TPS * 0.0449 KB * 3600 s/h * 8 h/day * 240 days/year = 303.223 KB = 0.2961 GB = 0.000289 TB/year/transaction
Thus, the proposed model (SMDSB) can minimize the storage cost of the existing blockchain models by 99.89% for Bitcoin, 54.15% for Ethereum, and 64.19% for Hyperledger. As shown in Fig. 9, we have applied SMDSB to reduce the storage size of the existing blockchain storage models. We can observe that SMDSB achieves a storage improvement of 81.54% for Bitcoin, 53.86% for Ethereum, and 62.59% for Hyperledger. We conclude that SMDSB is advantageous when the sizes of original transactions vary significantly, which appears to be the case in most real-time scenarios. Thus, it can be observed that, by utilizing the SMDSB model in Bitcoin, Ethereum, and Hyperledger, the storage size of the blockchain network can be significantly reduced.
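The storage-saving figures quoted above can be reproduced directly from Eqs. (1) and (2) with the block-header sizes, average transactions per block and per-transaction sizes given in this section; the short Python check below does so (all sizes in bytes, 46 bytes for the SMDSB hash representation).

```python
# Worked check of Eqs. (1) and (2) for the parameters quoted in this section.
def compression_ratio(header: float, k_hash: float, tn: int, tx: float) -> float:
    # Eq. (1): on-chain size with hashes only, divided by on-chain size with full transactions.
    return (header + k_hash * tn) / (header + tx * tn)

def space_saving(header: float, k_hash: float, tn: int, tx: float) -> float:
    # Eq. (2): storage space saving rate.
    return 1.0 - compression_ratio(header, k_hash, tn, tx)

models = {                      # header size, avg. transactions/block, original transaction size
    "Bitcoin":     (80, 500, 250),
    "Ethereum":    (508, 1196, 100),
    "Hyperledger": (72, 3500, 123),
}
K_HASH = 46                     # bytes used by SMDSB to represent one transaction

for name, (header, tn, tx) in models.items():
    print(f"{name}: saving = {space_saving(header, K_HASH, tn, tx):.2%}")
# Prints roughly 81.5 %, 53.8 % and 62.6 %, close to the figures reported above.
```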
7 Conclusion and Future Enhancement In this paper, we propose a storage model for data sharing in blockchain (SMDSB) using IPFS-based off-chain storage. In SMDSB, we can store structured and unstructured files as transactions in the blockchain network. To reduce the size of transactions in the blockchain ledger, we store the original files in IPFS off-chain storage, and the returned IPFS hash is mapped into the blockchain network. The proposed model is effective not only for Bitcoin transaction types but also for other types of transactions. We have compared our model with the storage models used in Bitcoin, Ethereum, and Hyperledger Composer. It can be seen from the experimental results that our storage model is optimal in terms of transaction storage size, availability (distributed off-chain storage by using IPFS), and synchronization of the nodes (adding new nodes becomes easy owing to the reduced size of transactions). Moreover, in the case of file storage as a transaction, only files of size up to 100 KB are allowed to be uploaded with other models, whereas in SMDSB, we have successfully tested the upload of files of size up to 100 MB into the blockchain network with the help of content-addressed hash mapping. By using SMDSB, the transaction size of Bitcoin can be reduced by 81.54%, Ethereum by 53.86%, and Hyperledger by 62.59%. In addition, the SMDSB model reduces the cost of per-transaction storage (per year) in Bitcoin by 99.89%, Ethereum by 54.15%, and Hyperledger by 64.19%. Furthermore, SMDSB can also be used for real-time applications. In the future, we wish to use the SMDSB model to store different types of bulky data generated by IoT devices and cyber-physical systems.
References 1. A. Tenorio-Fornés, S. Hassan, J. Pavón, Open peer-to-peer systems over blockchain and ipfs: an agent oriented framework, in Proceedings of the 1st Workshop on Cryptocurrencies and Blockchains for Distributed Systems (ACM, 2018), pp. 19–24
2. E. Androulaki, A. Barger, V. Bortnikov, C. Cachin, K. Christidis, A. De Caro, S. Muralidharan, Hyperledger fabric: a distributed operating system for permissioned blockchains, in Proceedings of the Thirteenth EuroSys Conference (ACM, 2018), p. 30 3. J. Benet, Ipfs-content addressed, versioned, p2p file system (2014). arXiv:1407.3561 4. J. Bonneau, A. Miller, J. Clark, A. Narayanan, J.A. Kroll, E.W. Felten, Research perspectives and challenges for bitcoin and cryptocurrencies (extended version). Cryptology ePrint Archive, Report 2015/452 (2015) 5. M. Brandenburger, C. Cachin, R. Kapitza, A. Sorniotti, Blockchain and trusted computing: problems, pitfalls, and a solution for hyperledger fabric (2018). arXiv:1805.08541 6. J.D. Bruce, The mini-blockchain scheme. White paper (2014) 7. M. Dai, S. Zhang, H. Wang, S. Jin, A low storage room requirement framework for distributed ledger in blockchain. IEEE Access 6, 22970–22975 (2018) 8. A. Dorri, S.S. Kanhere, R. Jurdak, P. Gauravaram, Blockchain for IoT security and privacy: the case study of a smart home, in 2017 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom workshops) (IEEE, 2017), pp. 618–623 9. W.G. Ethereum, A secure decentralised generalised transaction ledger. Ethereum Project Yellow Paper 151, 1–32 (2014) 10. J. Gao, B. Li, Z. Li, Blockchain storage analysis and optimization of bitcoin miner node, in International Conference in Communications, Signal Processing, and Systems (Springer, Singapore, 2018), pp. 922–932 11. J. Göbel, A.E. Krzesinski, Increased block size and Bitcoin blockchain dynamics, in 2017 27th International Telecommunication Networks and Applications Conference (ITNAC) (IEEE, 2017), pp. 1–6 12. https://www.ibm.com/downloads/cas/RXOVXAPM 13. J. Jogenfors, Quantum bitcoin: an anonymous, distributed, and secure currency secured by the no-cloning theorem of quantum mechanics, in 2019 IEEE International Conference on Blockchain and Cryptocurrency (ICBC) (IEEE, 2019), pp. 245–252 14. A. Kumar, A review on implementation of digital image watermarking techniques using LSB and DWT, in Information and Communication Technology for Sustainable Development (Springer, Singapore, 2020), pp. 595–602 15. R. Kumar, N. Marchang, R. Tripathi, Distributed off-chain storage of patient diagnostic reports in healthcare system using IPFS and blockchain, in 2020 International Conference on COMmunication Systems & NETworkS (COMSNETS) (IEEE, 2020), pp. 1–5 16. R. Kumar, R. Tripathi, Implementation of distributed file storage and access framework using IPFS and blockchain, in 2019 Fifth International Conference on Image Information Processing (ICIIP) (IEEE, 2019), pp. 246–251 17. A. Kumar, Design of secure image fusion technique using cloud for privacy-preserving and copyright protection. Int. J. Cloud Appl. Comput. (IJCAC) 9(3), 22–36 (2019) 18. Q. Lin, H. Wang, X. Pei, J. Wang, Food safety traceability system based on blockchain and epics. IEEE Access 7, 20698–20707 (2019) 19. S. Matetic, K. Wüst, M. Schneider, K. Kostiainen, G. Karame, S. Capkun, BITE: bitcoin lightweight client privacy using trusted execution. IACR Cryptol. ePrint Archive 2018, 803 (2018) 20. A. Palai, M. Vora, A. Shah, Empowering light nodes in blockchains with block summarization, in 2018 9th IFIP International Conference on New Technologies, Mobility and Security (NTMS) (IEEE, 2018), pp. 1–5 21. E. Palm, O. Schelén, U. 
Bodin, Selective blockchain transaction pruning and state derivability, in 2018 Crypto Valley Conference on Blockchain Technology (CVCBT) (IEEE, 2018), pp. 31–40 22. P. Thakkar, S. Nathan, B. Viswanathan, Performance benchmarking and optimizing hyperledger fabric blockchain platform, in 2018 IEEE 26th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS) (IEEE, 2018), pp. 264–276 23. P. Yuan, X. Xiong, L. Lei, K. Zheng, Design and implementation on hyperledger-based emission trading system. IEEE Access 7, 6109–6116 (2018)
24. S. Zhang, E. Zhou, B. Pi, J. Sun, K. Yamashita, Y. Nomura, A solution for the risk of nondeterministic transactions in hyperledger fabric, in 2019 IEEE International Conference on Blockchain and Cryptocurrency (ICBC) (IEEE, 2019), pp. 253–261 25. Q. Zheng, Y. Li, P. Chen, X. Dong, An innovative IPFS-based storage model for blockchain, in 2018 IEEE/WIC/ACM International Conference on Web Intelligence (WI) (IEEE, 2018), pp. 704–708 26. Z. Zheng, S. Xie, H. Dai, X. Chen, H. Wang, An overview of blockchain technology: architecture, consensus, and future trends, in 2017 IEEE International Congress on Big Data (BigData Congress) (IEEE, 2017), pp. 557–564
Ray Tracing Algorithm for Scene Generation in Simulation of Photonic Mixer Device Sensors Sangita Lade, Milind Kulkarni, and Aniket Patil
Abstract In industrial and automotive environments, speedy and reliable procurement of 3D data has become essential and is one of the main requirements for future development. Photonic mixer device (PMD) sensors, which are based on time of flight (TOF) technology, aim at delivering speedy images with depth data to suit real-time applications. A software simulation of such a sensor is integral to its design and calibration. Expensive prototypes can be avoided for sensor testing and calibration as the sensor characteristics can be fixed by using a software simulator. Ray tracing is crucial for scene generation, which is a key step in sensor simulation. This paper proposes a sequential algorithm for ray tracing for scene generation in the simulation process of PMD sensors. The algorithm is implemented for scene generation using basic shapes like plane, box, sphere and triangle with all possible combinations. It is observed that some shapes like the triangle take more time to compute their 3D information as compared to other shapes like the box, plane and sphere due to their complexity. It is also possible to get 3D information for all possible combinations of basic shapes in the scene generation. The time taken by the system to generate a scene with multiple objects is not affected by the order in which the objects are placed in the scene. Scene generation takes more time if a triangle is involved in the scene. Keywords Photonic mixer device (PMD) · Time of flight (TOF) · Ray tracing · Simulation
S. Lade (B) · M. Kulkarni Computer Engineering, Vishwakarma University, Pune, India e-mail: [email protected] A. Patil ifm Engineering Private Ltd, Pune, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 D. Swain et al. (eds.), Machine Learning and Information Processing, Advances in Intelligent Systems and Computing 1311, https://doi.org/10.1007/978-981-33-4859-2_25
1 Introduction Today, a lot of applications demand 3D images rather than 2D images in order to include additional features by using the depth information. Recently, TOF range imaging has emerged. It effectively generates the depth data of an object at the desired quality for many applications. PMD is one such technology which makes use of the TOF method to produce 3D images. Ray tracing is the process of tracing the path of light through each pixel in an image plane to find the intersection of the rays with a scene consisting of a set of geometric primitives like polygons, spheres, cones and so on. In the simulation of PMD cameras, ray tracing plays a key role in scene generation, which is one of the phases of PMD sensor simulation. Simulation of a PMD sensor is essential for its low-cost hardware design and calibration. The simulator can be used to fix the different sensor characteristics to suit the installation environment. The paper presents an algorithm for ray tracing for scene generation in the simulation process of PMD sensors. The paper is organized into six sections. Section 1 introduces the PMD technology, describing how the TOF method is used by PMD sensors. Section 2 explains the need for simulation of PMD sensors. Section 3 presents a sequential algorithm for ray tracing. Section 4 presents the experimental results on the central processing unit (CPU). Section 5 describes the proposed algorithm, and Sect. 6 concludes with an analysis of the results obtained on the CPU.
2 Simulation Process There are various imaging technologies enabling the perception of depth in images to reproduce three dimensions as experienced through the human eyes. Methods like triangulation, stereo vision and structured light are prominent examples of measuring the distance between the object and the sensor. TOF is another efficient distance measuring method, in which the distance is computed by estimating the elapsed time between the emission of the light and its arrival at the pixel of the sensor. PMD is a technology based on the TOF method which generates 3D images in real time. The key component of PMD sensors is an array of sensor elements. These arrays are capable of measuring the distance to the target, pixel by pixel in parallel. Hence, these cameras have the advantages of fast imaging and high lateral resolution, and they also have the leverage of procuring the depth data of the captured scene [1] (Fig. 1). Simulation of PMD sensors is highly beneficial for easy evaluation and modification of sensor parameters like the sensor frequency, focal length and resolution. CamCube is a PMD sensor having the sensor characteristics given in Table 1. It is designed for mobile robots. Likewise, there are many PMD sensors having application-specific designs and sensor parameters [2].
Table 1 CamCube characteristics
Image size: 200 × 200 × 40
Frame rate: 40–80 fps
Lens: f = 12.8
Range: 0.3–7 m
Field of view: 40
Every application of a PMD sensor demands that it exhibit different characteristics. Building expensive prototypes to suit each requirement can be avoided by an efficient simulation program [3]. The simulation process for PMD sensors consists of many phases, including ray tracing, scene generation, power calculation and CMOS physics, the output of which is later used for manufacturing of the sensor. The steps are illustrated in Fig. 2. Fig. 1 TOF method for depth calculation (Source https://www.mdpi.com/1424-8220/18/6/1679/htm)
Fig. 2 Simulation process of PMD sensor
Fig. 3 Ray tracing
3 Algorithm for Ray Tracing

1. Get all the required parameters for the algorithm:
(a) Sensor grid data.
(b) Object parameters.
(c) Light source coordinates.
2. Allocate the memory for data:
(a) SensorGrid[M × N]: to store the pixel position (X, Y, Z) data.
(b) ObjectList[n]: to store the objects and their corresponding parameters.
(c) RayMatrix[M × N]: to store the ray parameters like ray origin and ray direction.
(d) LightSource: to store the coordinates of the light source in the scene.
3. Populate SensorGrid with the sensor grid data.
4. Populate ObjectList with user-defined data. For every object, get from the user:
(a) Object equation parameters:
i. Plane: [P, Q, R, D] // Plane equation: Px + Qy + Rz + D = 0
ii. Triangle: [P, Q, R] // Triangle vertices
iii. Sphere: [R, C] // Sphere radius R and center coordinates C
iv. Box: [B0, B1] // Bounding box coordinates B0, B1
(b) ObjectReflectivity.
5. Compute the ray direction for every pixel in the SensorGrid:
(a) Ray.orig = focal point of the sensor for every ray.
(b) Ray.dir = SensorGrid[i] − Ray.orig.
6. Compute the magnitude of the direction vector for every ray:
(a) Magnitude = sqrt(Ray.dir.x² + Ray.dir.y² + Ray.dir.z²).
7. Normalize the direction vector for every ray:
(a) Ray.dir = Ray.dir / Magnitude.
8. Allocate memory for outputs:
(a) DistanceMatrix[M × N]
(b) RayObjectIntersection
(c) ObjectReflectivityMatrix[M × N]
(d) ObjectVisibilityMatrix[M × N]
(e) ObjectNormalMatrix[M × N]
9. For every ray r in the RayMatrix, compute:
(a) Intersections (RayObjectIntersection).
(b) DistanceMatrix[r]: Euclidean distance between the SensorGrid and the intersection point.
(c) ObjectReflectivityMatrix[r]: reflectivity at the intersection point.
(d) ObjectNormalMatrix[r]: normal at the intersection point.
(e) ObjectVisibilityMatrix[r]: check visibility of the intersection point.
10. Save all the outputs.
11. Find RaySceneIntersection. For every object in ObjectList:
(a) If object == 'PLANE':
i. Compute the plane's normal as N = [P, Q, R].
ii. Normalize the plane normal N.
iii. Compute the value of t (ray equation parameter): t = −(D + DotProduct(N, Ray.orig)) / DotProduct(N, Ray.dir).
iv. If (t > 0): substitute t in the ray equation to compute the ray–plane intersection point P.
(b) If object == 'TRIANGLE':
i. Compute N = CrossProduct((Q − P), (R − P)).
ii. Normalize the triangle normal N.
iii. D = DotProduct(N, P).
iv. Compute the value of t (ray equation parameter): t = (D − DotProduct(N, Ray.orig)) / DotProduct(N, Ray.dir).
v. If (t > 0): substitute t in the ray equation to compute the intersection point P of the ray with the plane containing the triangle.
vi. Determine whether point P is inside the triangle, using the method given by Möller and Trumbore [4].
(c) If object == 'SPHERE':
i. Substitute the ray's equation in the equation of the sphere; a quadratic equation in t (ray equation parameter) is obtained.
ii. Compute the discriminant Δ = b² − 4ac and the roots t = (−b ± sqrt(Δ)) / (2a).
iii. If (Δ >= 0): substitute t in the ray equation to compute the ray–sphere intersection point P.
iv. Compute the normal at the ray–sphere intersection point: N = P[X, Y, Z] − C[X, Y, Z].
v. Normalize the normal N.
vi. Else, if (Δ < 0): there is no intersection.
(d) If object == 'BOX':
i. Compute t (ray equation parameter) by the method given by Kay and Kajiya [5].
ii. Substitute the value of t in the ray equation to compute the ray–box intersection point P.
iii. Compute the normal N at the intersection point by determining which plane of the axis-aligned box contains the intersection point.
12. Check visibility:
(a) Compute lightVector = RayObjectIntersectionPoint − lightSource[l].
(b) Normalize the lightVector.
(c) Fetch the object normal N at the RayObjectIntersectionPoint.
(d) If (DotProduct(lightVector, N) > 0): // Lambert's cosine law
i. The point is visible, return true.
ii. If the point is not visible through any light source, return false.

The flowchart for the sequential ray-tracing algorithm is shown in Fig. 4. Figure 5 shows the flowchart for ray–triangle intersection, Fig. 6 for ray–plane intersection, Fig. 7 for ray–sphere intersection, and Fig. 8 for ray–box intersection.
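For illustration, a minimal NumPy sketch of two of the intersection tests from step 11 (ray–plane and ray–sphere) is given below; variable names follow the algorithm above, and the example scene values are arbitrary.

```python
# Ray-plane and ray-sphere intersection (step 11, cases (a) and (c)), in NumPy.
import numpy as np

def ray_plane(orig, direction, P, Q, R, D):
    """Plane Px + Qy + Rz + D = 0; returns the intersection point or None."""
    n = np.array([P, Q, R], dtype=float)
    length = np.linalg.norm(n)
    n, d = n / length, D / length          # normalize the whole plane equation
    denom = np.dot(n, direction)
    if abs(denom) < 1e-9:
        return None                        # ray is parallel to the plane
    t = -(d + np.dot(n, orig)) / denom
    return orig + t * direction if t > 0 else None

def ray_sphere(orig, direction, center, radius):
    """Substitute the ray equation into the sphere equation and solve the quadratic in t."""
    oc = orig - center
    a = np.dot(direction, direction)
    b = 2.0 * np.dot(oc, direction)
    c = np.dot(oc, oc) - radius ** 2
    delta = b * b - 4.0 * a * c            # discriminant
    if delta < 0:
        return None                        # no intersection
    t = (-b - np.sqrt(delta)) / (2.0 * a)  # nearer root first
    if t <= 0:
        t = (-b + np.sqrt(delta)) / (2.0 * a)
    return orig + t * direction if t > 0 else None

orig = np.array([0.0, 0.0, 0.0])
direction = np.array([0.0, 0.0, 1.0])
print(ray_plane(orig, direction, 0.0, 0.0, 1.0, -5.0))              # hits the plane z = 5 at (0, 0, 5)
print(ray_sphere(orig, direction, np.array([0.0, 0.0, 3.0]), 1.0))  # hits the sphere at (0, 0, 2)
```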
4 Experimental Results Table 2 shows the time taken by the CPU for each object and for all possible combinations of the objects. It is observed that the time taken by the system to generate a scene with multiple objects is not affected by the order in which the objects are
Fig. 4 Ray tracing algorithm
placed in the scene. It is also observed that scene generation takes more time, if the triangle is involved in scene generation.
Fig. 5 Ray–triangle intersection
5 Proposed Algorithm The proposed algorithm uses the object list as the list of basic shapes like plane, box, sphere and triangle. Figure 3 describes how a ray is traced for an object. Generally, a light source is used for illumination, which shoots many rays toward the sensor grid. The sensor grid is logically placed between the object and the light source. All the rays emanating from the light source hit the sensor grid and then the object. However, some of the rays may not hit the objects. When the rays hit the object, the distance between the light source and the object is calculated.
Fig. 6 Ray–plane intersection
When an object is placed in front of the camera, there is only one ray which is exactly perpendicular to one pixel, that is, the center point of the plane. When the rays shoot from the camera, one ray hits exactly one pixel perpendicularly and the others hit the remaining pixels at some angle. The intersection of each ray with the object is calculated for each pixel, which gives the Z coordinate of the object. The proposed algorithm thus consists of computing the intersection of each ray with the object, illustrated as follows.
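The per-pixel ray set-up described here corresponds to steps 5–7 of the algorithm; the following NumPy sketch shows one way to build the normalized ray directions for a sensor grid. The grid placement and resolution are illustrative assumptions.

```python
# One ray per sensor-grid pixel: origin at the focal point, direction normalized per pixel.
import numpy as np

M, N = 200, 200                                    # sensor resolution (illustrative)
focal_point = np.array([0.0, 0.0, 0.0])            # ray origin for every pixel (step 5a)

# Pixel positions (X, Y, Z) of the sensor grid, here placed on the plane z = 1.
xs, ys = np.meshgrid(np.linspace(-0.5, 0.5, N), np.linspace(-0.5, 0.5, M))
sensor_grid = np.stack([xs, ys, np.ones_like(xs)], axis=-1)   # shape (M, N, 3)

ray_dir = sensor_grid - focal_point                           # step 5b
magnitude = np.linalg.norm(ray_dir, axis=-1, keepdims=True)   # step 6
ray_dir = ray_dir / magnitude                                 # step 7

print(ray_dir.shape)              # (200, 200, 3): one unit direction per pixel
print(ray_dir[M // 2, N // 2])    # the centre pixel's ray is (almost) perpendicular to the grid
```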
Fig. 7 Ray–sphere intersection
Fig. 8 Ray–box intersection
Table 2 Experimental results

S. No.  Object                          Time (s)
1       Plane                           0.128000
2       Sphere                          0.108000
3       Triangle                        0.279000
4       Box                             0.085000
        Plane, triangle                 0.344000
        Plane, sphere                   0.185000
        Plane, box                      0.159000
        Triangle, sphere                0.325000
        Sphere, box                     0.137000
        Plane, triangle, sphere         0.442000
        Plane, triangle, box            0.382000
        Triangle, sphere, box           0.365000
        Plane, sphere, box              0.213000
        Plane, triangle, sphere, box    0.508000
6 Conclusions Scene generation is a fundamental step in the simulation of TOF-based PMD sensors. In this paper, we have proposed an algorithm for ray tracing for scene generation in the simulation process of PMD sensor. The time taken by the system to generate a scene with multiple objects is not affected by the order in which the objects are placed in the scene. Scene generation takes more time, if the triangle is involved in scene generation. Acknowledgements This work is sponsored by IFM Engineering Private Limited, Pune, India. It is a privately held global manufacturer of sensors and controls for industrial automation. It produces more than nine million sensors annually. IFM has more than 70 subsidiaries located in all major countries, including North America, South America, Asia, Europe and Africa. The authors would like to thank Mr. Jitesh Butala, Aniket Patil and their team for their support and guidance.
References 1. I. Thorsten Ringbeck, A 3D time of flight camera for object detection, in Optical 3-D Measurement Techniques 09-12.07.2007 ETH Zürich Plenary Session 1: Range Imaging I 2. G. Alenya, ToF cameras for active vision in robotics. Sens. Actuators A Phys. (Elsevier, 2014) 3. M. Keller, A simulation framework for time-of-flight sensors. In: IEEE International Symposium on Signals, Circuits and Systems (2007) 4. T. Moller, B. Trumbore, Fast, minimum storage ray-triangle intersection. J. Graph. Tools 2(1), 21–28 (1997)
5. T.L. Kay, J.T. Kajiya, Ray tracing complex scenes, in Proceedings of the 13th Annual Conference on Computer Graphics and Interactive Techniques (ACM, New York, NY, USA, SIGGRAPH ’86), pp. 269–278
A Study of Quality Metrics in Agile Software Development Krishna Chakravarty and Jagannath Singh
Abstract Traditional waterfall or iterative waterfall methods are being used in software industries since its birth. In the last decade, time to market and flexibility in accommodating customer requirements became very important aspects in software project delivery. Keeping these two driving factors in mind, agile methodology became very popular, where continuous cycle of development and testing involving high customer interaction is promoted and this practice results in an early time to market. Hence, the quality aspects for these projects become very important in today’s context. Quality metrics play an important role in software engineering as they transform both the customer expectation and operational performance to metrics which can be measured, evaluated and compared. These metrics focus primarily on three aspects: the quality aspects of the product, the process which was followed during its development and some important project parameters. These metrics are influenced by different factors such as functionality, usability, reliability and portability. This research work focuses on literature survey on different quality metrics used in traditional software development, their applicability in agile method and finally searches for the right kind of quality metrics which can be used in agile projects. Keywords Software quality metrics · Agile method · Agile metrics
1 Introduction The waterfall model defines the software development process in a sequential flow. This was the earliest software development life cycle approach. The different life cycle stages in this method are feasibility study, requirement analysis, design, coding, testing and finally implementation. K. Chakravarty (B) · J. Singh School of Computer Engineering, Kalinga Institute of Industrial Technology, Bhubaneswar, Odisha 751024, India e-mail: [email protected] J. Singh e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 D. Swain et al. (eds.), Machine Learning and Information Processing, Advances in Intelligent Systems and Computing 1311, https://doi.org/10.1007/978-981-33-4859-2_26
Each stage starts when a significant portion of the earlier stage is completed. The final delivery takes a longer time, and this process resists adopting frequent requirement changes from the customer. Gradually, this process was enhanced, and the iterative waterfall method became very popular to introduce phase containment of errors. The iterative model includes team reviews and customer feedback. Slowly, the software industry's focus shifted to early time to market and the capability to accommodate continuous changes in customer requirements [1]. This created a major challenge for the sequential process, and hence, the agile method got introduced. The agile method offers a flexible style of software development in iterative stages. The key aspects of the agile process are: (1) people and teamwork get more priority than processes and tools, (2) running features developed in the product are more important than detailed documentation, (3) continuous customer interaction is necessary instead of contract negotiation, (4) instead of following a predefined plan, responding to change is the key [2]. As the agile process became popular, companies started giving more attention to a faster development process and flexibility in adopting change requests [3]. In this changed environment, the quality of the software is at risk of deteriorating. Most companies started using modified versions of the quality metrics of traditional software development. In this paper, we present a consolidated list of available quality metrics and try to investigate the most suitable metrics for agile software development. Organization of the paper: Sect. 2 presents some basic concepts of the agile method. In Sect. 3, we discuss the various quality metrics used in traditional software development and their applicability in the agile method. Section 4 presents the research methodology used in this work, and two research questions are raised. Section 5 presents the literature review details; our findings from the detailed literature review are furnished, and the research questions are addressed in the same section. Section 6 concludes the current research work with the future scope of study.
2 Agile Methodology In this section, basic concepts of agile methodology are discussed. Figure 1 explains the agile process flow and also compares the method with the waterfall process. Scrum is an agile framework in which complex problems are adaptively addressed while software products are delivered on time with creativity and high productivity, bringing the highest possible value. The events are time-boxed, meaning that every event has a maximum duration. The Scrum method consists of multiple iterations known as sprints [3]. A repository is maintained where all requirements are tracked in detail along with further changes. Then, the requirements are prioritized to be selected, developed and completed for each release (sprint). A product backlog is a list of requirements to be done. Requirements or items are ranked with feature descriptions. Items should be broken down into user stories. The sprint backlog is created by selecting a set of requirements from the product backlog which are prioritized for that particular sprint. During each sprint, the top items of the product backlog are selected and
Fig. 1 Comparison of waterfall and agile method
turned into the sprint backlog. Then, the team works on the defined sprint backlog. A daily meeting is conducted to check the issues and progress. At the end of the sprint, the team delivers product functionality. The next set of prioritized items is picked up from the product backlog to be developed in the next sprint. Figure 1 compares the traditional and agile methods in parallel; R1 and R2 denote multiple releases of working software in the agile method. As the agile process evolved over the years, it became very important to map the software quality metrics from the waterfall to the agile method.
3 Software Quality Metrics Software quality metrics provides the measurement of attributes representing the quality of a software. This also includes refinement and evolution of quality aspects in software project delivery. Metrics provides an insight into the software development and hence can be used for better decision making. The quality metrics can be treated as important indicator for measurement of efficiency and effectiveness of software process [2]. The popular quality metrics used in traditional software development (TSD) can be categorized into three types: product metrics, process metrics and project metrics. Product metrics are characteristics of the product like mean time to failure (time between failures), defect density (defects per unit software size) and customer satisfaction (in a scale of one to five). Process metrics are used to improve the development and maintenance process of the software like defect arrival pattern, defect removal efficiency (DRE). Project metrics involve the project execution parameters
like cost, schedule and staffing, etc. The examples of project metrics are schedule variance (difference between actual and estimated scheduled completion), effort variance (difference between actual and planned effort) and size variance (difference between actual and estimated size) and productivity. But the above-mentioned quality parameters are not appropriate for agile software developments (ASD) because agile process gives (a) less importance to documentation (b) high value to time to market (c) high value to flexibility in adjusting customer requirements even in later stages. Also, the agile process is broken into multiple sprints. Hence, the calculations of some basic metrics like productivity, schedule variance need to be redefined again [3, 10]. Hence, a further research work is required to identify suitable measurable metrics in agile method. In this study, we present the importance of quality metrics in agile environment, summarize the research work took place in last five years and also focus on the best possible metrics which can be applied to agile software development.
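As a small, hedged illustration of how some of these traditional metrics are computed, the Python snippet below uses made-up project numbers; the variance definitions (absolute differences) follow the descriptions above.

```python
# Traditional quality metrics with illustrative inputs.
def defect_density(defects: int, size_kloc: float) -> float:
    """Defects per unit software size (here per KLOC)."""
    return defects / size_kloc

def defect_removal_efficiency(defects_removed_pre_release: int, total_defects: int) -> float:
    """Ratio of defects removed before release to all defects found."""
    return defects_removed_pre_release / total_defects

def schedule_variance(actual_days: float, estimated_days: float) -> float:
    """Difference between actual and estimated scheduled completion."""
    return actual_days - estimated_days

def effort_variance(actual_pm: float, planned_pm: float) -> float:
    """Difference between actual and planned effort (person-months)."""
    return actual_pm - planned_pm

print(defect_density(30, 12.5))            # 2.4 defects per KLOC
print(defect_removal_efficiency(45, 50))   # 0.9
print(schedule_variance(110, 100))         # 10 days late
print(effort_variance(26, 24))             # 2 person-months over plan
```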
4 Research Methodology Quality parameters are an integral part of any software development. As we observed, the IT industry is slowly moving toward an agile environment. The research work focused on how the topics of agile and quality metrics have been studied together. The agile philosophy has become very popular in recent times; hence, older research papers will not be of much value, and the developments of the last five years need to be given high importance in this regard. A detailed literature review was performed with the following steps (Fig. 2).
4.1 Determining Research Questions Two questions related to the research topic were outlined in this work: RQ1: What quality metrics are mostly used in agile projects today? RQ2: How can we arrive at a perfect set of balanced quality metric parameters in the agile method? The problem is that most of the traditional metrics are no longer suitable in agile. For example, among project metrics, sprint velocity or the burn down chart should be considered instead of effort and size variance.
Fig. 2 Research method followed
4.2 Search Process The aim of the search was to collect all the research work done so far on agile quality metrics. The databases of Science Direct, IEEE and Google Scholar were searched with the keyword 'Quality metrics in Agile'. As this is the latest trend in the software industry, only the papers published since 2014 have been considered in this study. A total of 95 papers were downloaded. After a preliminary study of all downloaded papers, we have considered 11 papers which are very closely related and appropriate for the current work.
5 Literature Review The following papers were studied in detail. The latest developments in agile methodology give an insight that practitioners are focusing on quality parameters in a different way from the traditional approach. The summary of the literature survey is furnished below. The time-boxed events and the priority given to customer interactions changed the theme of quality aspects in agile. In agile concepts, people interactions are given more importance than documenting the process. One paper focused on teamwork and considered factors like team size and experience, which is a new dimension in quality metrics. Commonly, performance, code and testing metrics are used as quality metrics. The methods used in these papers are mostly surveys with IT organizations or literature surveys of prior works. The research gap lies in the approach to achieve the right balance of metrics for any agile project. Different tools can be used to capture the data for further analysis (Table 1).
5.1 Discussion on Findings A thorough analysis was carried out, and it was found that all the agile quality metrics can be categorized into three major categories—process, product and project [10]. But it is also very important to consider the people aspect, as teamwork remains the heart of the agile concept [7]. Hence, it is suggested that metrics which can define and measure people interaction in a team should be considered with equal importance. The findings in the above papers help to find answers to the research questions. RQ1: What quality metrics are mostly used in agile projects today? Answer: Although various metrics are used in various projects across the globe, the most used metrics are testing metrics, schedule metrics and performance metrics [1] (refer Tables 3 and 4). Table 2 furnishes the product metrics, which are measured when the software product is running in production, after the completion of the full life cycle of software development. Table 3 summarizes very important in-process metrics, which are measured during the software development life cycle, in every sprint. All important project metrics related to schedule, cost and size are listed in Table 4, and these are measured during sprints as well as after each sprint cycle. RQ2: How can we arrive at a perfect set of balanced quality metric parameters in the agile method? Answer: The four sets of metrics proposed include the most commonly used metrics across the globe. A perfect balance among these four categories needs to be achieved. Further research work was carried out in this area to find the approach to achieve the right balance of metrics. Two important approaches have been discussed—(a) the Goal Question Metric (GQM) method and (b) the socially constructed method (evidence, expectations, evaluations).
5.2 GQM Method GQM, as described in [5], consists of three important stages—goal definition, question identification and metric identification. Different goals can be set in different projects. Metrics should be used as per the selected goals (refer Fig. 3). Tools should be selected
Table 1 Literature review S. No.
Author, year
Objective, methods used
Conclusion
1.
Marikar et al. [1]
The purpose of this work was to identify different measuring options in agile methodology and to understand the importance of software metrics
Defect cycle time, defect spillover trends can be used to measure quality A burn down chart is useful for predicting outstanding work
2.
Padmini et al. [9]
This paper seeks important software metrics and their usage in agile projects. Online survey is done with 26 IT organizations
ten useful metrics found used by these practitioners as burn down chart, velocity, test coverage, etc.
3.
Shah et al. [8]
This is a literature survey work to find productivity measures and improvements in the agile software development process
To improve productivity, focus areas should be output quantities, cost, timeliness, autonomy, individual effectiveness
4.
Mishra [3]
This paper discusses processes in system and business process engineering in agile method
This work suggests line of code (LOC), function point (FP), object oriented metrics can be used
5.
Karkli¸na1 et al. [5]
Goal-question-metric (GQM) method is proposed to select the most balanced and appropriate quality metrics for agile projects
Different goals can be set for quality metrics in different projects through discussions with stakeholders. The selected metrics can evolve through iterative reviews along with new parameters added
6.
Avasthy [6]
This work analyzes different agile metrics and examines their impact on development of product lifecycle
The literature promotes the agile culture as high quality development in more collaborative way keeping strong focus in business
7.
Freire et al. [7]
A Bayesian networks-based model is proposed to improve the teamwork quality of agile teams
People aspect is very important no such is existing to define and measure. The proposed model can be implemented in scrum-based teams
8.
Barata et al. [4]
This work focuses on how quality assessment can be refined and improvement in agile method
Three-dimensional approach suggested—evidence (fact-based), expectations—team defined goals, evaluation: continuous refinement
9.
Eisty et al. [2]
To better understand how software It is observed that performance developers use metrics in software and testing and code metrics are development process and how this mostly used metrics can be applied in code complexity
10.
Abdalhamid et al. [10]
This is a literature survey to Three categories of quality explore the quality aspects as metrics observed—product, projects get transformed into agile project and process metrics methods
11.
BUDACU et al. [11]
The paper proposes an architecture of an automated system used to provide real-time metrics for measuring agile team performance
The proposed architecture will be helpful in capturing metrics data for further analysis and improvement
Table 2 Quality metrics used in agile: product metrics
1. Defect count: Total defect count
2. Defect age: Time to fix the defect = the resolution date − the opening date
3. Performance metrics: Applicable for software executing on high-performance computing platforms; the metrics focus on execution time, storage (e.g., RAM or disk space), or scalability (e.g., time vs. CPUs/cores)
4. Open defect severity index: The value of defects which are open in production
5. Customer satisfaction survey: Can vary from very low, low, moderate, high, very high
for measurement of those metrics. The earlier defined goals should be periodically reviewed by stakeholders and this should result in evolution of metrics. The metrics must be selected and evaluated as an informative insight into the project and this can be used for later decision making.
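As a toy illustration of the GQM chain (goal, questions, metrics), the structure below maps one example goal to questions and measurable metrics; the concrete goal and questions are examples, not taken from [5].

```python
# A toy GQM tree: one goal, a few questions, and the metrics that answer them.
gqm = {
    "goal": "Deliver working features on time in every sprint",
    "questions": {
        "Are we on track within the sprint?": ["sprint burn down chart"],
        "How much value is shipped per sprint?": ["velocity", "running tested features"],
        "Is quality acceptable at release?": ["defect count per sprint", "defect removal efficiency"],
    },
}

def metrics_for(goal_tree: dict) -> list:
    """Flatten the GQM tree into the metric set selected for this goal."""
    return sorted({m for metrics in goal_tree["questions"].values() for m in metrics})

print(metrics_for(gqm))
```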
5.3 Socially Constructed Approach The socially constructed approach consists of three major steps—evidence, expectations and evaluations [4]. It is believed that people, project, process and products are deeply interlinked in agile software development projects. A socially constructed set of metrics is proposed in this approach. This should involve stakeholders in early phases to ascertain the metrics. The team also needs to be educated and empowered to focus on and bring metric-related results. Three dimensions are proposed: (a) Evidence—this is based on facts like defect count, compared with the last performance of the team. (b) Expectations—team-defined goals as discussed and agreed with stakeholders (e.g., 3 defects delivered in the product is acceptable). (c) Evaluation—the parameters should be continuously evaluated by the team. For example, goals may change in a complex project or for a difficult customer.
Table 3 Quality metrics used in agile: process metrics
1. Defect spillover trend: Defect spillover measures the defects that do not get fixed during a given iteration or sprint by simply counting the defects remaining at the start of each sprint or iteration. Such defects can accumulate over time when a team ignores them, leading to technical debt, which decreases productivity
2. Testing metrics: (1) Unit test coverage, (2) passed/failed test cases, (3) failed test case priorities
3. Defect count: (1) Defects opened per sprint, (2) defects closed per sprint
4. Running tested features (RTF): The desired software is broken down into features/stories. Each feature/story has one or more automated acceptance tests. When the tests work, the feature/story is implemented as desired. Measure, at every moment in the project, how many features/stories pass all their acceptance tests
5. Percentage of adopted work: Adopted work is work that is brought forward from the product backlog at any point during the sprint because the team has completed their original forecast early
6. Defect removal efficiency: The defect removal efficiency (DRE) gives a measure of the development team's ability to remove defects prior to release. It is calculated as a ratio of defects resolved to total number of defects found
6 Conclusion The objective of this research work was to understand the importance of quality metrics in agile projects, as the software industry has taken a turn toward agile development in a major way. A thorough study suggests that four main categories of quality metrics can be applied to agile projects—process, product, project and people metrics. However, the first three categories are widely used, whereas the people metrics are still at a conceptual level and not much used in a measurable context. The important metrics in these four categories are discussed in detail in this paper (Tables 2, 3, 4 and 5). The most used metrics are performance metrics, testing metrics and schedule metrics (burn down charts). This research work also suggests that a well-defined approach is required to find the perfect balance while applying these categories of metrics in
Fig. 3 GQM method

Table 4 Quality metrics used in agile: project metrics
1. Delivery on time: Focus on committed end date with expected requirements implemented in the end product
2. Sprint burn down chart: A burn down chart is a graphical representation of work left to do versus time. It is a chart of outstanding work
3. Release burn up chart: With each sprint getting completed, the delivered functionality grows, and the release burn up chart depicts this progress
4. Size variance: Estimating the size of the project after each iteration or sprint to get the actual size of the project by using function points and such
5. Effort variance: Estimation of the development effort can be calculated in hours spent or in person-months
6. Velocity: The number of story points delivered on a given sprint
7. Productivity metrics: (1) LOC/day (for team), (2) FP per month (team), (3) lines of code/hour (with a team of 4), (4) resolved issues/month, (5) function points/month (per developer)
8. Lead time and cycle time: These are calculated for stories, defects, tasks
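A small sketch of two of the project metrics listed in Table 4, sprint velocity and the outstanding-work series behind a burn down chart, is given below; the story-point numbers are illustrative.

```python
# Sprint velocity and burn down series with made-up story-point numbers.
completed_story_points = [8, 5, 13, 3]            # stories finished in the sprint
velocity = sum(completed_story_points)            # story points delivered this sprint

committed = 34                                     # points planned at sprint start
done_per_day = [0, 5, 8, 3, 0, 6, 5, 4, 3, 0]      # points completed each working day
burn_down = [committed]
for done in done_per_day:
    burn_down.append(burn_down[-1] - done)         # outstanding work versus time

print(velocity)     # 29
print(burn_down)    # [34, 34, 29, 21, 18, 18, 12, 7, 3, 0, 0]
```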
Table 5 Quality metrics used in agile: people metrics (each assessed on a scale of very low, low, moderate, high, very high)
1. Autonomy: Team is self-sufficient, no external influence
2. Expertise: Having the right kind of technical and domain knowledge
3. Experience: Mean experience, in years, working on software development projects and agile developments
4. Team size: Right set of collaborative people with motivating leader
5. Mode of communication: Face to face is best
6. Daily meetings: Conduct of daily meetings and people joining daily meetings
any project. Two methods have been discussed here—the goal question metric (GQM) method and the socially constructed approach. In GQM, goals should be defined differently for each project, as per the project needs and priorities. The socially constructed approach focuses on three aspects—evidence (the team's past performance), expectation (the stakeholders' expectation and the customer's agreement on acceptance) and evaluation (continuous refinement of metrics). In our future work, we plan to study the applicability of people metrics, and also, the desired weight ratio of the different categories of metrics needs to be well defined.
References 1. F.F. Marikar, P. Tharaneetharan, K.D.S. Thiwanka, M.I.M. Shafni, I. Guruge, A Framework for Software Engineering Metrics for Agile Software Development: A Sri Lankan Perspective, vol. 2, Issue VIII (2014). ISSN 2320-6802 2. N.U. Eisty, G.K. Thiruvathukal, C. Jeffrey, A Survey of Software Metric Use in Research Software Development, Loyola University Chicago Loyola eCommons Computer Science, Faculty Publications, 10-29-2018 3. S.K. Mishra, Advance software engineering and software metrics. J. Basic Appl. Eng. Res. 1(10) (2014), 28–43. ISSN: 2350-0255 4. J. Barata, S. Coyle, Developing socially-constructed quality metrics in agile: a multi-faceted perspective, in Thirty Seventh International Conference on Information Systems (Dublin, 2016) 5. K. Karkli¸na, R. Pirtae, Quality metrics in agile software development projects. Inf. Technol. Manage. Sci. 21, 54–59 (2018). ISSN 2255-9094 (online). ISSN 2255-9086 (print) 6. A. Avasthy, Systematic study on agile software metrics. (IJCSIT) Int. J. Comput. Sci. Inform. Technol. 8(5), 552–555 (2017). ISSN: 0975-9646 7. A. Freire, M. Perkusichb, R. Saraivaa, H. Almeidaa, A. Perkusich, A Bayesian networks-based approach to assess and improve the teamwork quality of agile teams. Inf. Softw. Technol. (2018)
8. S.M.A. Shah, E. Papatheocharous, J. Nyfjord, Measuring Productivity in Agile Software Development Process: A Scoping Study, ICSSP’15, 24–26 Aug 2015 (Tallinn, Estonia c, ACM, 2015) 9. K.V. Jeeva Padmini, H.M.N. Dilum Bandara, I. Perera, Use of Software Metrics in Agile Software Development Process (IEEE, 2015) 10. S. Abdalhamid, A.O.M. Mohammed, A. Mishra, Agile and Quality: A Systematic Mapping Study (IEEE, 2019) 11. E.N. Budacu, P. Pocatilu, Real time agile metrics for measuring team performance. eInformatica Economica 22(4) (2018)
Optimization of Ray-Tracing Algorithm for Simulation of PMD Sensors Sangita Lade, Purva Kulkarni, Prasad Saraf, Purva Nartam, and Aniket Patil
Abstract PMD sensors are based on time of flight (TOF) technology which deliver speedy images with depth data to suit real-time applications. Simulation of ray tracing for real-time image generation is an important step for testing photonic mixer devices (PMD) sensors. These ray-tracing applications for image simulation are embarrassingly parallel in nature. Hence, to speed up this task, GPUs embody the perfect hardware architecture required. This paper proposes various optimization techniques for a ray-tracing algorithm using NVIDIA CUDA-enabled GPUs. The ray-tracing tasks are subjected to per pixel of the PMD sensor. Even simple parallel algorithms of ray tracing have considerable scope of optimization with respect to memory, instructions, control flow and concurrency. In this paper, various optimization approaches are proposed which are generic across all CUDA GPUs along with the merits and demerits. The optimizations are carried out for ray-tracing algorithm for simulation of PMD sensors using NVIDIA GeForce 840 M GPU of the Maxwell architecture family. The speed-up obtained from optimized parallel algorithm is 2.29 times of its serial algorithm and is 1.048 times of its parallel algorithm. Keywords Ray-tracing · Simulation of PMD sensors · Compute unified device architecture (CUDA) · Graphics processing unit (GPU) · Optimization · Concurrent kernels · CUDA streams S. Lade (B) · P. Kulkarni · P. Saraf · P. Nartam Vishwakarma Institute of Technology, Pune, India e-mail: [email protected] P. Kulkarni e-mail: [email protected] P. Saraf e-mail: [email protected] P. Nartam e-mail: [email protected] A. Patil IFM Engineering Private Ltd, Pune, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 D. Swain et al. (eds.), Machine Learning and Information Processing, Advances in Intelligent Systems and Computing 1311, https://doi.org/10.1007/978-981-33-4859-2_27
1 Introduction
Today, 3D imaging is preferred over 2D due to its capability to create an illusion of depth in an image [1]. Acquisition of 3D data is essential in industrial and automotive environments because it is fast and accurate. This requirement is satisfied by cameras or sensors based on the time-of-flight (TOF) principle, which can produce 3D images without any complex electronic circuitry. Photonic mixer device (PMD) sensors are TOF-based sensors that produce such 3D images. Simulation of PMD sensors is important and useful before manufacturing the actual prototype because of the low cost associated with software simulation. Multiple sensor components need to be calibrated against various conditions, and PMD sensors must adapt to the demands of different industries and markets [1]. The time-of-flight camera is a basic component for range measurement, and photonic mixer devices (PMD) use this principle for constructing a 3-dimensional image [9].
The simulation of ray tracing is well suited to parallel, computation-intensive hardware. The graphics processing unit (GPU), hardware designed specifically for parallel processing, is employed using NVIDIA's compute unified device architecture (CUDA), an extension of C/C++ that enables parallel programming on GPUs. In this paper, the algorithm implementation and readings are carried out with an NVIDIA GeForce 840M GPU. The GeForce 840M is a graphics solution for entry-level gaming, photo and video-editing applications [2]. It has base and boost clock speeds of 1029 and 1124 MHz and 2 GB of DDR3 memory with a 64-bit memory interface. Based on the Maxwell architecture, the GeForce 840M is capable of performing near-photo-real lighting calculations in real time. Thus, for the simulation of PMD sensors using a GPU, NVIDIA's GeForce 840M is a suitable candidate.
The paper proposes various optimization algorithms for per-pixel ray-tracing simulation of PMD sensors. The main contributions are: (1) Four approaches for optimization of the ray-tracing algorithm. The first approach uses the shared and constant memory of GPUs for optimization; the remaining three approaches retain the beneficial part of the first approach, namely constant memory, in combination with their own algorithms. (2) Designing an efficient algorithm alone does not guarantee that the actual execution will be better. In the dynamic parallelism kernel approach, a divide-and-conquer algorithm is proposed which is expected to perform better than the other approaches, but the results obtained are the worst of any approach in this paper. (3) The work done in this paper conveys that it is important to design algorithms according to the system on which they are to be executed.
1.1 Related Works
Optimizing a ray-tracing algorithm can be done in two ways, viz., in software and in hardware. In [3] a lot of importance is given to spatial data structures, which can significantly reduce redundant ray–object intersection calculations; a spherical bounding volume hierarchy is used to perform ray tracing at 12 frames per second for 35 triangles. That paper proposes an algorithm-based optimization approach to reduce the comparison and computation of floating-point numbers. Popov et al. [4] have proposed another approach using a spatial data structure called the stackless kd-tree traversal algorithm, which makes use of ropes, or links connecting the neighboring leaf nodes of the tree. In [5], Maik Keller proposes a simulation framework for TOF-based sensors using GPUs; its major focus is on manipulation of camera parameters and the generation of synthetic sensors, and the framework provides a facility to add user-defined parameters to be integrated with sensor characteristics. The optimization work in [6] highlights the speed-up from CPU to GPU ray-tracing implementations with respect to factors such as compiler mode, thread count, resolution and data bandwidth, which limit optimization techniques for parallel implementation with CUDA on GPUs. In [10], optimization is achieved by changing pixel traversal priority, mapping each ray of light to a thread, dedicating a depth function to idle threads, and using optimized functions from the MSDN library; comparisons across compiler modes, thread counts, resolutions and data bandwidths are also performed. TOF-based PMD sensors are capable of capturing three-dimensional images in a 2D matrix at a high frame rate. The literature also discusses the working of PMD sensors based on the TOF principle along with advantages such as registered depth and intensity data at a high frame rate, compact design and reduced power consumption, and shows that PMD sensors are useful in outdoor applications because the camera performs robustly under heavy ambient light.
1.2 PMD Technology
There exist many imaging technologies that give a perception of depth to images. Methods such as triangulation, stereo vision and structured light are prominent examples of measuring the distance between the object and the sensor. TOF is one such technology, used by PMD sensors to capture images in real time. TOF-based sensors calculate distance or depth by measuring the time it takes for emitted infrared light to return to the sensor [1]. Every PMD sensor consists of a chip in standard CMOS technology. The key component of the PMD sensor is an array or line sensor which measures the
Fig. 1 TOF method for depth calculation
distance of the target object for every pixel of the sensor in parallel, without scanning. PMD sensors are made up of smart pixels called photonic mixer devices (PMD). The smart pixels are capable of fast optical sensing, and the incoherent light signal demodulation within one component is also done by the smart pixels [1] (Fig. 1).
2 Parallel Ray-Tracing Algorithm
The ray-tracing algorithm is an important step in the simulation of PMD sensors. It is embarrassingly parallel in nature, so the parallel algorithm for it is given as follows.
1. Create M × N hardware threads to compute ray tracing for all the rays in the matrix in parallel.
2. Allocate memory for inputs on GPU:
   i. RayMatrix
   ii. SensorGrid
   iii. ObjectList
   iv. SensorSettings
3. Allocate memory for outputs on GPU:
   i. Distance Matrix [M × N]
   ii. Intersection Matrix [M × N]
   iii. Reflectivity Matrix [M × N]
   iv. Visibility Matrix [M × N]
   v. Normal Matrix [M × N]
4. Each thread shall run a kernel in SIMT (Single Instruction Multiple Thread) manner.
5. Kernel:
   i. Compute the ray index of the thread in execution:
      a. row = blockIdx.x * blockDim.x + threadIdx.x
      b. col = blockIdx.y * blockDim.y + threadIdx.y
      c. rayIndex = row * gridDim.x + col
   ii. Get the ray properties, viz. ray origin and direction, from the Ray Matrix.
   iii. Intersection[row, col] = findRaySceneIntersection()
   iv. Intersection Matrix[row, col]: fetch the nearest ray–object intersection point.
   v. Distance[row, col]: Euclidean distance between the SensorGrid and the intersection point.
   vi. Reflectivity[row, col]: reflectivity at the intersection point.
   vii. Normal[row, col]: normal at the intersection point.
   viii. Visibility[row, col]: CheckVisibility(intersectionPoint).
6. Copy the output matrices from GPU to CPU.
7. Save the output.
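The sketch below is a minimal Python/Numba analogue of the per-pixel kernel in step 5. The authors' implementation is in CUDA C/C++; here the kernel name, the dummy intersection test and the launch configuration are illustrative assumptions that only show the thread-to-pixel mapping and the per-ray output writes.

```python
# Minimal Python/Numba analogue of the per-pixel ray-tracing kernel (step 5).
# The "intersection" computed here is a dummy stand-in, not a real ray-scene test.
import math
import numpy as np
from numba import cuda

M, N = 224, 172  # PMD sensor resolution used in the paper

@cuda.jit
def ray_trace_kernel(ray_origins, ray_dirs, distance, visibility):
    # Compute the 2D pixel index of this thread (step 5.i).
    row = cuda.blockIdx.x * cuda.blockDim.x + cuda.threadIdx.x
    col = cuda.blockIdx.y * cuda.blockDim.y + cuda.threadIdx.y
    if row < distance.shape[0] and col < distance.shape[1]:
        dx = ray_dirs[row, col, 0]
        dy = ray_dirs[row, col, 1]
        dz = ray_dirs[row, col, 2]
        # Dummy "distance": length of the direction vector (placeholder for
        # the real findRaySceneIntersection() of step 5.iii).
        distance[row, col] = math.sqrt(dx * dx + dy * dy + dz * dz)
        visibility[row, col] = 1.0

# Host-side launch: 16x16 thread blocks, enough blocks to cover every pixel.
origins = cuda.to_device(np.zeros((M, N, 3), dtype=np.float32))
dirs = cuda.to_device(np.ones((M, N, 3), dtype=np.float32))
dist = cuda.device_array((M, N), dtype=np.float32)
vis = cuda.device_array((M, N), dtype=np.float32)
threads = (16, 16)
blocks = ((M + 15) // 16, (N + 15) // 16)
ray_trace_kernel[blocks, threads](origins, dirs, dist, vis)
```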
3 Optimization
The simulation of the PMD sensor grid consists of 224 × 172 pixels. A ray is sent out into the scene from each pixel, and the distance of the object from the sensor grid is calculated in parallel. In the CUDA kernel each ray emerging from a pixel is represented as a thread in the grid of the GPU, executing in parallel with other threads. The hardware specification of the machine on which the programs are implemented and the results are recorded is:
• CPU: Intel Core i7-4510U with clock speed 2.00 GHz (boost clock 2.60 GHz)
• GPU: NVIDIA GeForce 840M with 384 CUDA cores and compute capability 5.0
All the techniques considered in this paper have been simulated for a total of 50 objects (20 triangles, 20 spheres and 10 boxes). The programs are implemented on the Windows 10 operating system in the Visual Studio 2015 IDE in Debug mode (Fig. 2).
Asynchronous Data Transfers
The transfer of data from host to device (H2D) and device to host (D2H) also accounts for part of the total execution time. The data transfers must be packed together so that asynchronous data transfer is achieved. The highest bandwidth for
Fig. 2 In-depth Maxwell hardware architecture (GM204 GPU) [7]
memory transfers between host and device is achieved using page-locked memory. In all approaches, the SensorGrid and output matrices are allocated page-locked memory using the cudaHostAlloc() function. A GPU's capability to support asynchronous data transfer depends on the count of its asynchronous copy engines. The call to cudaMemcpyAsync() works with page-locked host memory and may return before the copy is complete, as it is asynchronous with respect to the host. On implementation, considerable speed-up has been observed (Fig. 3). Runtime API calls such as cudaMemcpyAsync() are issued one after another; however, the memory copies are only guaranteed to have completed at cudaStreamSynchronize(). Organizing the memory copies in this fashion reduces the overall application execution time rather than optimizing the ray-tracing algorithm itself.
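A compact Python/Numba analogue of this pinned-memory plus asynchronous-copy pattern is sketched below. The paper itself uses cudaHostAlloc()/cudaMemcpyAsync()/cudaStreamSynchronize() from the CUDA runtime API; the array names and sizes here are illustrative assumptions.

```python
# Pinned host buffers plus asynchronous copies on a stream (Numba analogue of
# cudaHostAlloc() / cudaMemcpyAsync() / cudaStreamSynchronize()).
import numpy as np
from numba import cuda

M, N = 224, 172

# Page-locked (pinned) host buffers, analogous to cudaHostAlloc().
sensor_grid_host = cuda.pinned_array((M, N, 3), dtype=np.float32)
distance_host = cuda.pinned_array((M, N), dtype=np.float32)
sensor_grid_host[:] = 0.0

stream = cuda.stream()  # analogous to a CUDA stream

# Asynchronous H2D copy; may return before the copy has finished.
sensor_grid_dev = cuda.to_device(sensor_grid_host, stream=stream)
distance_dev = cuda.device_array((M, N), dtype=np.float32, stream=stream)

# ... kernel launches queued on the same stream would go here ...

# Asynchronous D2H copy back into pinned memory.
distance_dev.copy_to_host(distance_host, stream=stream)

# Block until all work queued on the stream has finished
# (analogous to cudaStreamSynchronize()).
stream.synchronize()
```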
Fig. 3 Memory transfer profiling
3.1 Memory Optimization
3.1.1 Using Shared and Constant Memory
Shared memory was introduced with the Fermi architecture and is present in its subsequent architectures as well. Each SM has 64 KB of on-chip memory that can be configured as 48 KB of shared memory with 16 KB of L1 cache, or as 16 KB of shared memory with 48 KB of L1 cache [8]; this also applies to the Maxwell architecture. Optimizing how data is stored in memory during processing can be a predominant factor in gaining speed-up. The goal is to reduce memory latency and redundant reads. Global memory is off-chip and hence requires more time for reads and writes on the GPU. Shared memory and constant memory are on-chip; using them reduces the turnaround time to read from and write to memory. Index generation by default accesses global memory for every memory access; therefore, memory-intensive applications require optimized memory usage.
SensorGrid is a matrix which holds the position of each pixel; its value is cached in shared memory because it is referred to more than two times. Conversely, the five output matrices are not cached as they are accessed only once. A slight reduction in kernel execution time was expected; however, the observed execution time is 610.57 ms, an increase of 50–60 ms. Threads in a warp of the kernel do not access consecutive memory locations, which increases the memory traffic, and caching global memory is immaterial, as computation on the GPU is faster than accessing global memory.
The ray-tracing kernel checks whether all the objects intersect with the ray originating from each pixel; thus, the kernel is a function of the input objects present in the scene. Instead of storing the input objects' data in global memory, they are stored in constant memory, a read-only memory, because the objects' coordinates are not updated during kernel execution. The kernel execution time is reduced to 523.96 ms, a decrease of 20–30 ms. An asynchronous copy to constant memory records an additional speed-up of approximately 5 ms. The reduction in time is smaller than the increase seen in the shared-memory case, and the variation of execution time with block size is negligible. The use of shared memory in this ray-tracing algorithm is inefficient as there is no requirement for it. The SensorGrid
Fig. 4 Execution time using constant memory and shared memory in combination
matrix, when cached in shared memory with a block size of 256, requires a higher execution time, which causes register spilling (Fig. 4).
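The following Python/Numba sketch illustrates the constant-memory idea: read-only scene data is baked into the kernel as a constant array read through the constant cache. The paper's implementation uses __constant__ memory in CUDA C/C++; the sphere array and the intersection test below are hypothetical placeholders.

```python
# Read-only object data placed in constant memory (Numba analogue of
# __constant__ arrays in CUDA C/C++).
import numpy as np
from numba import cuda

# Hypothetical read-only scene data: [cx, cy, cz, radius] per sphere.
spheres_host = np.array([[0.0, 0.0, 5.0, 1.0],
                         [1.0, 2.0, 8.0, 0.5]], dtype=np.float32)

@cuda.jit
def count_hits(ray_dirs, hits):
    i = cuda.grid(1)
    # const.array_like copies the host array into GPU constant memory at
    # compile time; every thread reads it through the constant cache.
    spheres = cuda.const.array_like(spheres_host)
    if i < ray_dirs.shape[0]:
        n = 0
        for s in range(spheres.shape[0]):
            # Dummy test standing in for a real ray-sphere intersection.
            if ray_dirs[i, 2] * spheres[s, 3] > 0.0:
                n += 1
        hits[i] = n

rays = cuda.to_device(np.ones((1024, 3), dtype=np.float32))
hits = cuda.device_array(1024, dtype=np.int32)
count_hits[(1024 + 127) // 128, 128](rays, hits)
```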
3.2 Instruction and Control Flow Optimization
In the kernel, the possibilities of warp divergence are minimized by removing instances of branch conditions on global thread indices. Conditions are replaced with respect to warps rather than thread indices, that is, multiples of 32. Arithmetic instructions such as multiplication and division by powers of 2 are replaced with bitwise instructions wherever possible. However, there is almost no change in execution time, as there is a huge number of threads (approx. 38,000).
3.3 Kernel Construction Algorithms
The PMD sensor grid consists of 224 × 172 pixels, which can be interpreted either as one big rectangular tile or as 224 tapes, each consisting of an array of 172 pixels lined up horizontally. Based on this analogy, three methods are proposed.
3.3.1 Concurrent Kernel Approach
CUDA allows multiple kernels to execute concurrently on different streams. In this approach, concurrent kernels perform ray tracing on a limited set of the input objects; hence, the approach depends on the count of input objects given to each kernel. Each kernel performing ray tracing is independent of the other kernels and is unaware of their existence. But if a kernel is unaware of all the objects present in the scene, then the set of objects traced by a particular kernel cannot cast shadows on the other objects in the scene, and vice versa. Thus, every kernel has knowledge of all objects, irrespective of whether they are to be traced by that particular kernel. While kernels execute concurrently, two or more threads of different kernels can race to update the same global memory location. Hence, updating the distance matrix in global memory is done atomically, which also implies updating the information in the four other matrices, viz., the intersection matrix, normal matrix, reflectivity matrix and visibility matrix. As global memory is cached, this update can cause data value inconsistency among different thread blocks. The __threadfence() function achieves consistency by stalling the current thread until its writes to global memory are visible to all other threads. The count of kernels launched depends on the number of objects, where a step count of objects is given to each kernel. If all objects of a particular shape are exhausted, then no objects of that type are traced by a newly launched kernel. Below is the algorithm of the mentioned approach. Each kernel does ray tracing for the complete grid of the sensor, that is, 172 × 224 pixels.
1. Consider an array obj_counter(3) that stores the count of objects of each type being scheduled on a CUDA kernel for ray tracing. Indices 0, 1 and 2 correspond to the counts of triangles, boxes and spheres.
2. Step := 3 // say
3. obj_counter[i] = 0 for all 0 ≤ i < 3
If mid_x > 0.3 and mid_x < 0.7, then it can be concluded that the object is too close to the particular person; otherwise, it is situated at a safer distance. With this logic, the relative distance of the object from a particular person can be estimated once the object is detected. If the object is too close, then a signal or a warning is issued to the person through the voice generation module.
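A minimal sketch of this proximity check is given below. It assumes the detector returns normalized box coordinates in [ymin, xmin, ymax, xmax] order (as the TensorFlow Object Detection API does); the function name and the score threshold are illustrative, only the mid_x band of 0.3–0.7 is taken from the text.

```python
# Proximity check based on the horizontal centre (mid_x) of a detected box.
def is_too_close(box, score, threshold=0.5):
    if score < threshold:          # ignore weak detections
        return False
    ymin, xmin, ymax, xmax = box
    mid_x = (xmin + xmax) / 2.0    # normalized horizontal centre of the box
    return 0.3 < mid_x < 0.7       # "too close" condition described above

# Example: a centred detection with a high score triggers a warning.
print(is_too_close([0.1, 0.2, 0.9, 0.8], score=0.98))  # True
```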
4.5 Voice Generation Module
After the detection of an object [10], it is of utmost importance to inform the person about the presence of that object in his/her way. For the voice generation module, pyttsx3 plays an important role. Pyttsx3 is a text-to-speech conversion library in Python which works well with both Python 2 and 3. To get a reference to a pyttsx3.Engine instance, the factory function pyttsx3.init() is invoked by the application. This step runs whenever an object has been detected and its approximate distance has been calculated. With the help of the cv2 library and the cv2.putText() function, the detected labels are displayed on the screen. To identify hidden text in an image, Python-tesseract is used for character recognition: OCR detects the text content in images by scanning and analyzing the image and encodes it in a form easily understood by the computer. The text embedded in images is thus recognized and "read" using Python-tesseract. These texts are then passed to the engine instance obtained from pyttsx3.init(). During construction, the engine initializes a driver proxy object which is responsible for loading a speech engine driver from the pyttsx3 drivers module. After construction, the engine object is used by the application to register and unregister event callbacks; produce and stop speech; get and set speech engine properties; and start and stop event loops.
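A minimal sketch of this voice-feedback step is shown below. The announce_object() helper and the message format are illustrative assumptions, not the authors' code; pyttsx3.init(), say() and runAndWait() are the standard pyttsx3 calls described above.

```python
# Voice feedback for a detected object using pyttsx3.
import pyttsx3

engine = pyttsx3.init()          # loads a platform speech driver

def announce_object(label, distance, too_close):
    message = f"{label} detected, approximate distance {distance:.1f} units"
    if too_close:
        message += ". Warning, object is very close"
    engine.say(message)
    engine.runAndWait()          # blocks until speech has finished

# For text embedded in images, pytesseract.image_to_string(image) could be
# used to extract the string before passing it to engine.say().
announce_object("cup", 0.2, too_close=True)
```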
4.6 Testing
Third Party App. A third-party app provides ease and freedom in the field of app development. It brings efficiency and also helps in fast delivery of the output, which helps in the development of good-quality software. The whole detection procedure can be analyzed as follows: (1) The system is set up so that an Android application captures real-time frames from its range of vision. All the real-time images captured by the rear camera are first transferred to the third-party application on the mobile phone, and then those images are sent to a laptop-based networked server where all the computation takes place. (2) The laptop-based server uses a pre-trained SSD detection model trained on the COCO dataset. It runs the frames through the trained weights, and the output class is detected along with its accuracy metrics. (3) The model reached an accuracy of 98% for classes like bed and remote and 99% for cup. (4) Once the output prediction is obtained on the laptop-based system, the distance is calculated with the logic discussed above in Eq. (4), and an approximate value is displayed on the screen. (5) Finally, the object class along with the distance is converted into voice with the help of built-in libraries in Python like pyttsx3 and sent to the blind person through wireless audio support tools.
5 Results 5.1 Cup See Fig. 5.
Fig. 5 Object cup with accuracy 99%
5.2 Remote See Fig. 6.
Fig. 6 Object remote with 98% accuracy
Fig. 7 Object bed with accuracy 96%
5.3 Bed See Fig. 7.
5.4 Chair See Fig. 8.
5.5 TV
See Fig. 9. The proposed methodology was able to distinguish among different varieties of objects in real time with a range of accuracies; at times the accuracy depends upon the configuration of the system on which the experiment is conducted. Detection initially starts with a null value, and once an object is detected, meaning the prediction score rises above 50%, the recognition mechanism starts. Following this, a stable state is reached where the accuracy surpasses all of its previous values.
Fig. 8 Object chair with accuracy 96%
Fig. 9 Object TV with accuracy 96%
5.6 Accuracy of the Objects Detected Above

S. No.  Objects     Accuracy (%)  Final distance (Units)
1       Cup         99            0.2
2       Remote      98            0.8
3       Bed         98            0.9
4       Chair       96            0.7
5       Television  96            0.8
5.7 Console View
The proposed system successfully detects 90 object classes, labels them and also shows the detection accuracy. The model also calculates the distance from the object to the camera and gives voice feedback when the person with the camera is approaching the object. Two different models, SSD MobileNet V1 and SSD Inception V2, were trained on the dataset; however, the SSD MobileNet V1 model showed less latency and was faster in detecting objects (Fig. 10).
6 Conclusion
The proposed system is an initiative to solve the problems of visually impaired people. Many devices, like ultrasonic sensors and the traditional white cane with sensors, are currently being used to aid visually impaired people; the proposed system eliminates many limitations of these old systems. The system gets the real-time image from an application on the mobile and sends it to the model on the laptop, which then detects the object and calculates the distance between the person and the object. The system gives audio feedback and also gives a warning when the person is near the object, thus making the person self-reliant. As future scope, the system can be integrated into one application, and the blind person's essential objects can be pinned, which will make searching easier for the blind.
Fig. 10 Console view
References 1. K. O'Shea, R. Nash, An Introduction to Convolutional Neural Networks (2015) 2. N. Adaloglou, Intuitive explanation of skip connections in deep learning (2020), https://theaisummer.com/skip-connections/. Accessed 30 Apr 2020 3. M.D. Zeiler, R. Fergus, Visualizing and understanding convolutional networks, in European Conference on Computer Vision (Springer, Cham, 2014), pp. 818–833 4. https://www.tensorflow.org/datasets/catalog/coco 5. W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, SSD: single shot multibox detector (2016). arXiv:1512.02325 6. https://developers.arcgis.com/python/guide/how-ssd-works/ 7. R. Khan, X. Zhang, R. Kumar, E. Opoku Aboagye, Evaluating the performance of ResNet model based on image recognition (2018). https://doi.org/10.1145/3194452.3194461 8. A. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, H. Adam, MobileNets: efficient convolutional neural networks for mobile vision applications (2017) 9. J. Howard, Lesson 9: Deep Learning Part 2 2018 - Multi-object detection, https://docs.fast.ai/vision.models.unet.html#Dynamic-U-Net. Accessed 30 Apr 2020 10. K. Matusiak, P. Skulimowski, P. Strumiłło, Object recognition in a mobile phone application for visually impaired users, in 6th International Conference on Human System Interactions (HSI) (Sopot, 2013), pp. 479–484. https://doi.org/10.1109/HSI.2013.6577868 11. P.M. Mather, M. Koch, Computer Processing of Remotely-Sensed Images: An Introduction, 4th edn. (2010) 12. A. Sethi, Build your own object detection model using TensorFlow API (2020), https://www.analyticsvidhya.com/blog/2020/04/build-your-own-object-detection-model-using-tensorflow-api/. Accessed 30 Apr 2020
13. W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, A. Berg, SSD: Single Shot MultiBox Detector, vol. 9905, pp. 21–37 (2016). https://doi.org/10.1007/978-3-319-46448-0_2 14. N. Kehtarnavaz, M. Gamadia, Real-Time Image and Video Processing: From Research to Reality (2006). https://doi.org/10.2200/S00021ED1V01Y200604IVM005
Reinforcement Learning: A Survey Deepali J. Joshi, Ishaan Kale, Sadanand Gandewar, Omkar Korate, Divya Patwari, and Shivkumar Patil
Abstract Reinforcement learning is one of the fields of study in machine learning. It stands to improve the work being done in the AI domain and represents a step toward building autonomous frameworks with a higher level of understanding of the visual world. Reinforcement learning is about taking appropriate actions to maximize reward in a particular situation, and behavioral psychology has had a major influence on it. The main objective is to observe the behavior of an agent interacting with a given environment in order to increase the value of a reward. This survey focuses on the basics of reinforcement learning and is intended to aid understanding of the field. It starts with an introduction to the general field of reinforcement learning, how it differs from other machine learning paradigms, and the types of reinforcement learning. The survey also covers the challenges that come with reinforcement learning. Keywords Reinforcement learning · Machine learning · Agents · AI
D. J. Joshi (B) · I. Kale · S. Gandewar · O. Korate · D. Patwari · S. Patil Department of Information Technology, Vishwakarma Institute of Technology, Pune 411037, India e-mail: [email protected] I. Kale e-mail: [email protected] S. Gandewar e-mail: [email protected] O. Korate e-mail: [email protected] D. Patwari e-mail: [email protected] S. Patil e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 D. Swain et al. (eds.), Machine Learning and Information Processing, Advances in Intelligent Systems and Computing 1311, https://doi.org/10.1007/978-981-33-4859-2_29
1 Introduction
The idea that we learn by interacting with our environment is probably the first that occurs to us when we think about the nature of learning. When a newborn child plays, waves its arms, or looks about, it has no explicit teacher, yet it figures out how to perform these activities through its direct connection with its environment; it learns by sensing and interacting with the environment. This is the approach explored in reinforcement learning. It is far more centered on goal-directed learning from interaction than are other approaches to machine learning. Learning from interaction with the environment is a fundamental idea behind nearly all theories of learning and intelligence, and it is expanded upon by reinforcement learning [1].
Reinforcement learning is, technically, the science of constructing optimal decisions. It helps us formalize the reward-motivated behavior exhibited by living species. The essential idea of reinforcement learning is simply to capture the most important aspects of the real problem facing a learning agent cooperating with its environment to accomplish an objective. Such an agent must be able to sense the state of the environment to some extent and must be able to take actions according to that state. The agent must also have a goal or goals defined in terms of the state of the environment. The definition is intended to include only these three perspectives (sensation, action, and goal) in their simplest possible forms without trivializing any of them.
Reinforcement learning is the training of machine learning models so that they can make a sequence of decisions. The agent learns to accomplish a goal in an uncertain, possibly complex environment. In reinforcement learning, the agent faces a game-like situation and uses trial and error to come up with a solution to the problem. To make the machine do what the programmer wants, the agent receives either rewards or penalties for the actions it performs, and its goal is to maximize the total reward. Figure 1 depicts a simple scenario of reinforcement learning. Some important
Fig. 1 Reinforcement learning environment
terminologies which illustrate the components of a reinforcement learning scenario are:
1. Environment: the physical world in which the agent operates.
2. State: describes the current situation of the agent.
3. Reward: the feedback or response given by the environment to the agent.
4. Policy: a way to map an agent's states to its actions.
5. Value: the benefit or reward which the agent will receive in the future by performing a specific action in a specific state.
6. Agent: the entity that makes all the decisions according to the punishments and rewards of every action or set of actions.
Even though the programmer decides upon a reward policy, that is, the rules of the model, he/she does not provide the model any clues as to how to work out the problem. It is entirely the model's job to discover ways to perform the task so as to maximize the reward. The model uses its capability of searching and conducting plenty of trials, making reinforcement learning one of the most efficient methods to demonstrate the creative capabilities of a machine. Unlike humans, an AI is capable of gathering relevant experience from many parallel trials if it is supported with the right and robust infrastructure.
One of the primary goals of the field of machine learning is to develop completely autonomous agents which will be able to interact intelligently with their environments to learn the most useful or appropriate behaviors, improving over time through trial and error; this can be fulfilled with the use of reinforcement learning. Reinforcement learning could not take off in the past as it was limited by insufficient infrastructure and technology, but the recent mastering of a range of Atari 2600 video games at a superhuman level, directly from image pixels [2], shows the progress made possible by machines with high computational capability. The second standout success in the field of reinforcement learning was the creation of a deep reinforcement learning system, AlphaGo, the first system to defeat the then Go world champion [3]. This achievement drew parallels with IBM's Deep Blue, which defeated the human chess champion [4], and with IBM's Watson DeepQA system, which defeated human players in a game of Jeopardy! [5].
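The loop between agent and environment shown in Fig. 1 can be expressed as a short sketch. The 5-state chain environment and the random policy below are toy stand-ins chosen only to illustrate the state, action and reward terminology; they are not from the paper.

```python
# Toy agent-environment interaction loop (Fig. 1): observe a state, pick an
# action from the policy, receive a reward and the next state from the environment.
import random

N_STATES = 5                       # states 0..4, state 4 is terminal

def environment_step(state, action):
    next_state = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if next_state == N_STATES - 1 else -0.04
    done = next_state == N_STATES - 1
    return next_state, reward, done

def policy(state):
    return random.choice([-1, 1])  # move left or right at random

state, total_reward = 0, 0.0
for _ in range(100):               # one episode
    action = policy(state)
    state, reward, done = environment_step(state, action)
    total_reward += reward
    if done:
        break
print("return for this episode:", total_reward)
```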
2 Key Features
2.1 How is Reinforcement Learning Different from Supervised Learning [6] and Unsupervised Learning [7]
Reinforcement learning differs in the following ways:
1. There isn't any supervisor present, which means there isn't any way for the model to be told the best possible action; instead there is the concept of a reward for each action performed.
2. Time has significant importance in reinforcement learning, as it pays a lot of attention to sequential data, unlike the other paradigms, which receive random inputs.
3. It has the concept of delayed rewards: you may not get a reward at every step of the task, and the reward may be given only after the entire task is completed.
4. The agent's actions affect the next input it receives. For example, if you have the choice of going either left or right, the input at the next step depends on your previous choice of left or right.
2.2 Algorithms Used in Reinforcement Learning The following are the different methods used to plan reinforcement learning algorithms: 1. Value-based: In this method we must always attempt to amplify a value function V(s) [8]. During this approach the agent expects the value function to estimate and return a value by following a policy π [8] depending upon the present states of the agent. 2. Policy-based: In this technique we try to create an innovative policy so that the specific steps performed in every state of the agent will aid us to attain the highest possible reward available in the future. This technique can have two approaches, namely deterministic and stochastic [9]. 3. Model-based: In this approach we need to create an effective design virtually for every environment. Thus, the agent needs to master performing tasks in each of the different environments.
3 Types of Reinforcement Learning Reinforcement learning consists of two types: active and passive. The agent’s policy is fixed in passive reinforcement learning, that is, the algorithm has to be told what tasks to perform and at what states. The main aim of a passive reinforcement learning agent is to implement a consistent order of steps or actions and assess them accordingly. However, in the case of active reinforcement learning, the agent will have to determine what actions to perform as there isn’t any consistent policy in place according to which it can work on. Active RL requires its agent to learn the optimal approach on their own.
3.1 Passive Learning
In passive learning the agent's policy is fixed, and the objective is to evaluate how good that policy is; therefore, the agent needs to learn the expected utility U^π(s) for each state s. This can be done in the following three ways.
3.1.1 Direct Utility Estimation
The agent executes a sequence of states, actions and transitions that continues until the agent has reached a terminal state. The utility of each state is estimated from the sample return observed in each trial, and sample values can be combined as a running average. The main drawback is that this method falsely assumes that state utilities are independent, whereas in fact they are linked through the Markov property; it also converges slowly.
Suppose we have a 4 × 3 grid as the environment in which the agent can move either left, right, up or down (the set of available actions). An example of a sample run is:
(1,1) −0.04→ (1,2) −0.04→ (1,3) −0.04→ (1,2) −0.04→ (1,3) −0.04→ (2,3) −0.04→ (3,3) −0.04→ (4,3) +1
Total reward starting at (1, 1) = 0.72.
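A small sketch of this running-average estimate is shown below, using the single trial above with no discounting; the data structures and helper names are illustrative.

```python
# Direct utility estimation: the return observed from each state in a trial is
# a sample of its utility; samples are averaged over trials (here: one trial).
from collections import defaultdict

trial = [((1, 1), -0.04), ((1, 2), -0.04), ((1, 3), -0.04), ((1, 2), -0.04),
         ((1, 3), -0.04), ((2, 3), -0.04), ((3, 3), -0.04), ((4, 3), +1.0)]

sum_returns = defaultdict(float)
visits = defaultdict(int)

# Return from each position = sum of rewards from that step to the end.
for i, (state, _) in enumerate(trial):
    g = sum(r for _, r in trial[i:])
    sum_returns[state] += g
    visits[state] += 1

utilities = {s: sum_returns[s] / visits[s] for s in sum_returns}
print(utilities[(1, 1)])   # approximately 0.72, matching the value quoted above
```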
3.1.2 Adaptive Dynamic Programming (ADP)
ADP is a smarter method than direct utility estimation. It runs trials to learn a model of the environment and estimates the utility of a state as the reward for being in that state plus the discounted expected utility of the next state [10]:

U^π(s) = R(s) + γ Σ_{s′} P(s′ | s, π(s)) U^π(s′)    (1)

Here R(s) is the reward for being in state s, P(s′ | s, π(s)) is the transition model, γ is the discount factor and U^π(s′) is the utility of the successor state s′. This equation can be solved using the value-iteration algorithm. The algorithm is usually fast enough but can be very expensive to compute for a large state space; it requires a model-based approach, that is, a transition model of the environment.
3.1.3 Temporal Difference Learning (TD)
In TD learning the agent does not need to learn the transition model. Updates occur between successive states, and only the states the agent actually visits are adjusted:

U^π(s) ← U^π(s) + α (R(s) + γ U^π(s′) − U^π(s))    (2)

Here R(s) is the reward for being in state s, γ is the discount factor and U^π(s′) is the utility of the successor state s′. Alpha (α) is the learning rate that determines the convergence to the true utilities. ADP adjusts the utility of s with respect to all of its successor states, whereas TD learning adjusts it with respect to a single observed successor state. The computation in TD is easier, but its convergence is slower.
3.2 Active Learning The active reinforcement learning agent changes its policy as it goes and learns. A model of active learning is illustrated below.
3.2.1 ADP with Exploration Function
For the active agent to learn the right policy, it must learn the utility of each state and update its policy. This can be done using a passive ADP agent together with value or policy iteration to learn the correct actions, but this approach produces greedy agents. Therefore, an exploration function is used that gives additional weight to actions the agent has tried relatively rarely, while keeping a low weight for actions believed to have low utility [10]:
U_{i+1}(s) ← R(s) + γ max_{a∈A} f( Σ_{s′} P(s′ | s, a) U_i(s′), N(s, a) )    (3)

Here R(s) is the reward for being in state s, P(s′ | s, a) is the transition model, γ is the discount factor and U_i(s′) is the current utility estimate of the successor state s′, while N(s, a) counts how many times action a has been tried in state s. The exploration function is

f(u, n) = R⁺ if n < N_e, and u otherwise    (4)

where f(u, n) increases with the value u and decreases with the number of tries n. R⁺ is an optimistic estimate of the best possible reward, and the rule forces the agent to try each action in each state at least N_e times. The exploration function is what turns the passive ADP agent into an active one.
4 Learning Models of Reinforcement There are two main models for reinforcement learning.
4.1 Q-Learning
Q-learning is a TD learning method that does not require the agent to learn the transition model; instead it learns the Q-value function Q(s, a) [11]:

U(s) = max_a Q(s, a)    (5)

Q-values can be updated using the following equation:

Q(s, a) ← Q(s, a) + α (R(s) + γ max_{a′} Q(s′, a′) − Q(s, a))    (6)

Here R(s) is the reward for being in state s and γ is the discount factor. The next action can be selected using the following rule [11]:

a_next = argmax_{a′} f( Q(s′, a′), N(s′, a′) )    (7)
Again, this is easy to compute but slower than ADP.
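A short sketch of tabular Q-learning implementing the update in Eq. (6) is given below. An epsilon-greedy rule stands in for the exploration function f of Eq. (7), and the 5-state chain environment is an illustrative stand-in rather than anything from the paper.

```python
# Tabular Q-learning on a toy 5-state chain, using the update of Eq. (6).
import random
from collections import defaultdict

ACTIONS = [-1, +1]                # move left or right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
N_STATES = 5

def step(state, action):
    next_state = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if next_state == N_STATES - 1 else -0.04
    return next_state, reward, next_state == N_STATES - 1

Q = defaultdict(float)            # Q[(state, action)], defaults to 0.0

for episode in range(500):
    s, done = 0, False
    while not done:
        # Epsilon-greedy action selection (stand-in for the exploration function).
        if random.random() < EPSILON:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2, r, done = step(s, a)
        # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        best_next = max(Q[(s2, act)] for act in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s2

print({s: max(Q[(s, a)] for a in ACTIONS) for s in range(N_STATES)})
```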
4.2 Markov Chain and Markov Process
The Markov property states that the future depends only on the present state and not on the past. A Markov process [12] is a probabilistic model in which the next state depends only on the present state and not on the previous ones. Moving from one state to another is termed a transition, and its probability is termed the transition probability. Any process in which the next state depends solely on the current state can be modeled in this way (Fig. 2).
Markov decision process (MDP): An MDP is an extension of the Markov chain. It provides a mathematical framework for modeling decision-making situations, and most reinforcement learning problems can be modeled as MDPs. An MDP is represented by five elements:
1. A set of states (S) the agent may be in.
2. A set of actions (A) that may be performed by the agent for moving from one state to another.
3. A transition probability P_a(s, s′), the probability of moving from state s to state s′ by performing action a.
4. A reward probability R_a(s, s′), the probability of a reward acquired by the agent for moving from state s to state s′ by performing action a.
Fig. 2 Example of Markov chain
5. A discount factor (γ), which controls the importance of immediate and future rewards; this is discussed in detail in the upcoming sections.
Model: A model (sometimes called the transition model) gives an action's effect in a state. Specifically, T(S, a, S′) defines a transition T where being in state S and taking action a takes us to state S′ (S and S′ may even be the same). For stochastic (noisy, non-deterministic) actions we also define a probability P(S′ | S, a), the probability of reaching state S′ if action a is taken in state S. Note that, by the Markov property, the result of an action taken in a state depends only on that state and not on the prior history.
Rewards: Based on the action the agent performs, it receives a reward. A reward is simply a numerical value, say +1 for a good action and −1 for a bad action. The agent tries to maximize the total amount of reward (the cumulative reward [13]) it receives from the environment rather than the immediate reward; the total amount of reward received from the environment is called the return. We can formulate the total reward as

R(t) = r(t + 1) + r(t + 2) + r(t + 3) + r(t + 4) + ⋯ + r(T) [14]    (8)

where r(t + 1) is the reward received by the agent at time step t, and so on.
Episodic and continuous tasks: Episodic tasks are tasks that have a terminal state (an end). For instance, in a car racing game the end of the race is a terminal state; once the game is over, you begin the next episode by restarting the game, which is an entirely new beginning. In the equation above, r(T) corresponds to the terminal state and the end of the episode. In continuous tasks there is no terminal state; for example, a personal assistant does not have a terminal state.
Discount factor: Since there is no final state for a continuous task, the return defined as R(t) = r(t + 1) + r(t + 2) + r(t + 3) + ⋯ can sum to infinity. That is why we introduce the notion of a discount factor and redefine the return as

R(t) = r(t + 1) + γ r(t + 2) + γ² r(t + 3) + ⋯    (9)
Here γ is the discount factor and r(t + 1) is the reward received by the agent at time step t, and so on. The discount factor decides how much importance we give to future rewards relative to immediate rewards; its value lies between 0 and 1. A very low discount factor gives more importance to immediate rewards, while a high discount factor gives more importance to future rewards. The actual value of the discount factor is application dependent, but it typically lies between 0.2 and 0.8.
The policy function: As discussed previously, this is a function which maps states to actions. It is denoted by π. Basically, a policy function says what action to perform in each state. The final goal lies in finding the optimal policy, which gives the right action to be performed in each state and maximizes the reward.
State value function: The state value function [13] specifies how beneficial it is for an agent to be in a particular state under a policy π. A value function is usually denoted by V(s); it denotes the value of a state when following a policy. The state value function depends on the policy and varies depending on the policy we choose. It can be denoted as
V^π(s) = E_π [ Σ_{k=0}^{∞} γ^k r_{t+k+1} | s_t = s ]    (10)

Here γ is the discount factor and r_{t+k+1} are the rewards received; V^π(s) is the value of state s under policy π.
Table 1 State value table
State    Value
State 1  0.3
State 2  0.9

Table 2 Q table
State    Action    Value
State 1  Action 1  0.03
State 1  Action 2  0.02
State 2  Action 1  0.5
State 2  Action 2  0.9
We can view value functions in a table; the greater the value, the better the state is (Table 1). Here state 2 is the better state.
State-action value function or Q function: The state-action value function is also called the Q function. It specifies how beneficial it is for an agent to perform a specific action in a state under a policy π. The Q function [14] is denoted by Q(s, a); it denotes the value of taking an action in a state when following a policy π. We can define the Q function as follows:

Q^π(s, a) = E_π [ Σ_{k=0}^{∞} γ^k r_{t+k+1} | s_t = s, a_t = a ]    (11)

Here Q(s, a) is the value of taking action a in state s and γ is the discount factor. Basically, the difference between the value function and the Q function is that the former specifies the goodness of a state, while the latter specifies the goodness of an action in a state. Just like the state value table, we can make a Q table which shows the value of all possible state-action pairs. Whenever we say value function V(S) or Q function Q(S, a), it effectively means the value table and the Q table (Table 2).
5 Disadvantages of Reinforcement Learning
In reinforcement learning the agents take actions in an environment to maximize some predefined reward function. In supervised learning an adequate amount of labeled data with the correct input/output is explicitly available; in reinforcement learning such labeled data is not available, and the learning happens online, so the agent must actively interact with its environment many times and learn which actions should be taken to gain the maximum reward.
Reinforcement learning assumes you can observe the relevant variables in the environment and access them at every step. Often, not only do you have access to only partial information, but the information itself can be incorrect or insufficient. The reward functions are predefined, and the reward must be computed for every action taken, yet suitable reward functions are often not obvious. For example, if an agent is set to plan a path for a self-driven car, we may be unable to describe the reward in closed form, and we may be unable to trust that the reward function we have defined is actually useful.
Reinforcement learning allows mistakes during learning, but you cannot afford mistakes all the time: as previous examples of self-driven cars show, a car in testing may crash several times before it can make even a simple skillful move. Still, training in simulated real-life conditions produces great results in practice and should not be abandoned. Because the learning process is online, trials must be executed many times to produce an effective model. This is acceptable when the task is simple, actions are discrete and distinct, and the information is already available; but in many cases the problem is complex and time-consuming, so we must balance the fidelity of the simulator against the training time and real-time performance constraints.
These limitations mean that recent successes in reinforcement learning have happened almost exclusively in controlled environments, and substantial research is still required to overcome them, including the turn toward deep reinforcement learning for efficient real-time agents [15].
6 Conclusion
The key distinctive component of reinforcement learning is the way the agent is trained. Instead of examining given data, the model interacts with its environment, looking for ways to maximize the reward. In the case of deep reinforcement learning, a neural network is entrusted with storing the agent's experiences and thereby improving the way the task is performed. Reinforcement learning is without doubt a cutting-edge technology with the potential to reshape our world, but it need not be used in every single case. Nevertheless, reinforcement learning appears to be the most effective way to make a machine creative, since seeking new, inventive ways to perform its tasks is a form of creativity. We have already seen this happen with DeepMind's now famous AlphaGo [3], which played moves that were first viewed as glitches by human experts, yet in reality achieved victory against the champion among human players, Lee Sedol [3]. Along these lines, reinforcement learning has the potential to be a breakthrough technology and thereby become the next step in artificial intelligence development.
Acknowledgements This project is funded by Department of Science and Technology, New Delhi under SYST Scheme. Ref No: SP/YO/060/2016. The authors are extremely thankful to the authorities for their support.
References 1. W. Qiang, Z. Zhongli, Reinforcement learning model, algorithms and its application, in 2011 International Conference on Mechatronic Science, Electric Engineering and Computer (MEC) (Jilin, 2011), pp. 1143–1146 2. V. Mnih, K. Kavukcuoglu, D. Silver, A.A. Rusu, J. Veness, M.G. Bellemare, A. Graves, M. Riedmiller, A.K. Fidjeland, G. Ostrovski, et al., Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015) 3. D. Silver, A. Huang, C.J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, et al., Mastering the game of go with deep neural networks and tree search. Nature 529(7587), 484–489 (2016) 4. M. Campbell, A. Joseph Hoane, F.-H. Hsu, Deep blue. Artif. Intell. 134(1–2), 57–83 (2002) 5. D. Ferrucci, E. Brown, J. Chu-Carroll, J. Fan, D. Gondek, A.A. Kalyanpur, A. Lally, J. William Murdock, E. Nyberg, J. Prager, et al., Building Watson: an overview of the DeepQA project. AI Mag. 31(3), 59–79 (2010) 6. A. Singh, N. Thakur, A. Sharma, A review of supervised machine learning algorithms, in 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom) (New Delhi, 2016), pp. 1310–1315 7. G. Vachkov, H. Ishihara, Unsupervised learning algorithms for comparison and analysis of images, in 2008 IEEE International Conference on Mechatronics and Automation (Takamatsu, 2008), pp. 415–420 8. R. Lincoln, S. Galloway, B. Stephen, G. Burt, Comparing policy gradient and value function based reinforcement learning methods in simulated electrical power trade. IEEE Trans. Power Syst. 27(1), 373–380 (2012) 9. R. Tedrake, T.W. Zhang, H.S. Seung, Stochastic policy gradient reinforcement learning on a simple 3D biped, in 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (IEEE Cat. No.04CH37566), vol. 3 (Sendai, 2004), pp. 2849–2854 10. D. Poole, A. Mackworth, Artificial Intelligence: Foundations of Computational Agents (Cambridge University Press, 2010) 11. https://arxiv.org/pdf/cs/9605103.pdf 12. M. Abu Alsheikh, D.T. Hoang, D. Niyato, H. Tan, S. Lin, Markov decision processes with applications in wireless sensor networks: a survey. IEEE Commun. Surv. Tutor. 17(3), 1239–1267 (2015) 13. X. Fei, A. Boukerche, R. Yu, An efficient Markov decision process based mobile data gathering protocol for wireless sensor networks, in 2011 IEEE Wireless Communications and Networking Conference (Cancun, Quintana Roo, 2011), pp. 1032–1037 14. C. Courcoubetis, M. Yannakakis, Markov decision processes and regular events. IEEE Trans. Autom. Control 43(10), 1399–1418 (1998) 15. When reinforcement learning should not be used, from kdnuggets.com: https://www.kdnuggets.com/2017/12/when-reinforcement-learning-not-used.html
Data Encryption on Cloud Database Using Quantum Computing for Key Distribution Krishna Keerthi Chennam, Rajanikanth Aluvalu, and V. Uma Maheswari
Abstract Data encryption is growing fast and has become popular as basic schemes are merged into more sophisticated ones. Various data encryption techniques introduce distinct methods for big data, networks, and the web. The important hindrances to access are providing security and efficiency for these data encryption methods. The manuscript recommends a data encryption framework that merges quantum cryptography with a fine-grained access-control policy, where authorization is validated for every single access; this achieves our aim of optimizing the performance of secret-key sharing for data encryption and decryption. Finally, the proposed method gives secure data sharing across heterogeneous cloud service providers. Keywords Privacy preserving · Data encryption · Quantum computing · Cryptography · Fine-grained · Cloud database
1 Introduction
Recently, cloud computing and cloud databases have become emerging topics, and they will remain so in the coming decades. Both are transforming the way we think about and use computing in many areas. Because of restrictions on storage and access, organizations are adopting databases offered by various cloud service providers. Many organizations opt for this because of flexibility, no maintenance, no storage location or equipment of their own, and low cost. The cloud provides an open and easy storage platform
309
310
K. K. Chennam et al.
for personal as well as for organizations of various cloud service providers. When number of users are increasing, security problems started from malicious users, administrators and cloud providers. Because of this, the importance of security has increased for storing our data in cloud database by any cloud service provider. From [1–3], we have various methods to maintain the privacy of the data stored in cloud. As mentioned earlier, those methods are based on the security problems of a single data owner. However, in few applications, the problems arise because of multiple data owners are interested to share data more securely when they are in a group or sending to a group of users in a secured way. Simple encryption is not sufficient to store data on cloud. So secret key can be used for data encryption based on the problems. We discuss different key sharing methods and our proposed model is one of the best ones to share secret keys between the user and the organization, or between two users, or between group of users who want to retrieve the same data. A key agreement protocol is used to produce a basic conference key for various participants to confirm the security of the future problems, and this protocol is used in cloud computing to give assistance for security and capable of improving the data security. Hence, the Diffie-Hellman protocol [4], the key agreement protocol, is an important protocol for cryptographic problems. The basic Diffie-Hellman protocol gives a better solution to the problem of producing a common secret key between two parties to exchange the key for encryption and decryption of data when they are sharing to cloud database. In cryptography [5] shows the various options to produce keys. However, it is not an option for any authentication services, which is vulnerable to man-in-the-middle attacks. This can be transmitted by providing basic authentication methods to the protocol as shown in [6]. The Diffie-Hellman keys are accepted between the two parties. Similarly, delay or crushing the communications between two participants. There is another mechanism which is more popular in security compared to all encryption techniques called quantum mechanics, also called as quantum cryptography. The characteristics of the quantum computer are more than the existing public key encryptions such as RSA [7, 8] and elliptic curve cryptography (ECC) [9, 10]. In the universe, quantum mechanics is made up of tiny particles like atoms. Classical mechanics did not succeed in tiny cases, and the fact is that it failed. At this point of time, quantum particles are identified. The Heisenberg’s principle strictly states the position and less defined the momentum, and vice versa, at this point of time. At certain moment, to identify the position of an electron which is rotating around the nuclear atom, where we cannot identify the electrons velocity, we may not identify the position accurately. Practically, this can be implemented on photos where photons are a light which is small in measure. The photon spins practically in various directions like horizontal, vertical or diagonal at a time similar like east, west, north, south, up and down at a time. Quantum cryptography is immature as of now. But we cannot neglect the objections that carry us to the current situation in the network space. Integer factorization problem and discrete logarithm problems works effectively elucidate in given polynomial time which is proposed by Shor [11]. 
As we know clearly that cryptography and security in network are the important mechanisms to assure the data system
with proper security [12]. Quantum cryptography is a crucial area of cryptography that combines quantum mechanics with classical cryptography. The final ambition of research in quantum cryptography is to implement cryptographic algorithms and protocols that withstand quantum attacks. As discussed earlier, analyzing quantum cryptography protocols is a mandatory part of securing upcoming networks.
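A toy sketch of the Diffie-Hellman key agreement described in the introduction is shown below. The tiny prime modulus is purely illustrative; real deployments use large safe primes (or elliptic curves) and add authentication to resist the man-in-the-middle attacks mentioned above.

```python
# Toy finite-field Diffie-Hellman key agreement (illustration only).
import secrets

p = 0xFFFFFFFB   # small illustrative prime modulus (far too small for real use)
g = 5            # public generator

a = secrets.randbelow(p - 2) + 1      # Alice's private exponent
b = secrets.randbelow(p - 2) + 1      # Bob's private exponent

A = pow(g, a, p)                      # Alice's public value, sent to Bob
B = pow(g, b, p)                      # Bob's public value, sent to Alice

shared_alice = pow(B, a, p)           # Alice computes g^(ab) mod p
shared_bob = pow(A, b, p)             # Bob computes g^(ab) mod p
assert shared_alice == shared_bob     # both sides now hold the same secret
```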
2 Literature Survey
2.1 Quantum Key Distribution (QKD)
Since the discovery of quantum cryptography, QKD has attracted widespread technical interest. QKD meets security requirements by applying quantum properties to exchange private data, such as the secret keys required to encrypt data before storing it in a cloud database provided by a cloud service provider. The security of QKD depends on basic physical laws, so it cannot be defeated by improvements in computational power or by quantum computers. QKD is effective in facing the challenges to classical encryption techniques and in building methods for high-security domains in the era of quantum computers, and it offers strong protection against eavesdropping. The working procedure of QKD relies on basic properties of quantum mechanics: if an eavesdropper attempts to intercept a quantum exchange, the legitimate users are alerted and can discard the affected key material while receiving secret keys or other secret data from another user or from the organization. A QKD implementation includes the following basic components:
• A free-space quantum channel to transfer states of light between the transmitter (Alice) and the receiver (Bob). This channel does not need to be secured.
• A public but authenticated communication link between the two parties to perform post-processing steps and distil a correct and secret key.
• A key exchange protocol that exploits quantum properties to assure security by identifying eavesdroppers, or problems, by analyzing the amount of data that has been intercepted (Fig. 1).
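The classical simulation below illustrates the basis-sifting step underlying such a key exchange. The text describes QKD generically; a BB84-style protocol is assumed here purely for the sketch, and the measurement is modeled classically.

```python
# Classical simulation of BB84-style basis sifting (illustration only).
import secrets

n = 32
alice_bits  = [secrets.randbelow(2) for _ in range(n)]
alice_bases = [secrets.randbelow(2) for _ in range(n)]   # 0 = rectilinear, 1 = diagonal
bob_bases   = [secrets.randbelow(2) for _ in range(n)]

# If Bob measures in the same basis he recovers Alice's bit; otherwise the
# outcome is random (this models the quantum measurement classically).
bob_bits = [alice_bits[i] if alice_bases[i] == bob_bases[i] else secrets.randbelow(2)
            for i in range(n)]

# Over the authenticated public channel the two parties compare bases (not
# bits) and keep only the positions where the bases matched: the sifted key.
sifted_key = [bob_bits[i] for i in range(n) if alice_bases[i] == bob_bases[i]]
print(len(sifted_key), "sifted key bits")
```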
2.2 Elliptic Curve Cryptography (ECC)
ECC [7, 8, 13] is a public key encryption technique based on elliptic curve theory that helps to create faster, smaller and more efficient keys. ECC derives keys from the properties of an elliptic curve equation: the curve is defined over a finite field of prime order, and points generated on the curve are used to construct the keys. The basic shape of an elliptic curve is shown in Fig. 2.
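For illustration, ECC key generation can be sketched on a toy curve; the textbook-sized curve y² = x³ + 2x + 2 over GF(17) and the base point below are assumptions made for readability, not parameters from this paper, and are far too small to be secure.

```python
# Toy curve y^2 = x^3 + 2x + 2 over GF(17) with base point G = (5, 1).
P_MOD, A, B = 17, 2, 2
G = (5, 1)
INF = None  # point at infinity

def point_add(p, q):
    """Add two points on the curve (handles doubling and inverses)."""
    if p is INF:
        return q
    if q is INF:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return INF
    if p == q:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P_MOD) % P_MOD
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)

def scalar_mult(k, point):
    """Double-and-add scalar multiplication k * point."""
    result, addend = INF, point
    while k:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result

private_key = 7                          # chosen at random in a real system
public_key = scalar_mult(private_key, G)  # the curve point shared publicly
```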
Fig. 1 Flowchart of stages of QKD
Fig. 2 Elliptic curve
3 Problems Identified
Storing data on the cloud has become the standard for many organizations because of the low cost and large-scale storage offered by cloud service providers. Despite these advantages, cloud service providers operate in an untrusted environment, so additional security and privacy measures are required when storing data on the cloud. The most important issues are the authentication of users within the organization and of public users from outside the organization, and preserving data confidentiality against malicious administrator behavior, since administrators can view the data of regular users who rely on the cloud for data security. Data privacy is achieved by applying encryption techniques before transferring the data to the cloud of any cloud service provider; otherwise an administrator can delete, alter, insert or copy data from the database without the knowledge of the data owner, which destroys data integrity. A malicious administrator can also create a new account as a regular user in the database in various ways. The advantage of this setting is that the data is stored in the cloud and administration is handled by the third-party cloud service provider. The disadvantage is that the data owner loses control over the data, while the administrator has full power over it, and hence there is the possibility of attacks by a malicious administrator or an outsider.
4 Proposed Model
Storing and processing sensitive data on a cloud database increases the exposure to unauthorized access, for a single user or for an organization, when the administrator or another user eavesdrops. This work proposes data encryption on the cloud using quantum key distribution for key generation, which provides confidentiality of the data. The suggested model includes two components: first, generating the secret key using quantum cryptography instead of a random key generator, ECC, Diffie-Hellman or other methods; second, encrypting the data with the secret key before storing it on a cloud database provided by any cloud service provider.
• Initially, the user registers with the organization, establishes authentication with the proxy server and requests a key for encrypting data or processing queries.
• Using QKD, secret keys are generated and shared with the registered user after checking the user's authorization against the access control policy.
• Lastly, the data owner lets the trusted proxy server apply AES on the data records before storing them in the cloud database.
In the end, encrypted queries locate the encrypted data in the cloud and the data is decrypted using the secret keys. High confidentiality is achieved because QKD delivers keys with high security to legitimate users, while the access control policy identifies unauthorized users. The encrypted data is stored in the cloud database provided by any cloud service provider.
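To illustrate the second component (encrypting records before upload), a minimal sketch using the Python cryptography package's Fernet recipe (AES-based authenticated encryption) is shown below. In the paper's model the key would come from the QKD step, so the locally generated key here is only a stand-in, and upload_to_cloud is a hypothetical placeholder for the cloud-database write.

```python
from cryptography.fernet import Fernet

def upload_to_cloud(record_id: str, ciphertext: bytes) -> None:
    """Hypothetical placeholder for the cloud-database write."""
    print(f"storing {record_id}: {len(ciphertext)} encrypted bytes")

# Stand-in for the secret key agreed via QKD between user and proxy server.
secret_key = Fernet.generate_key()
cipher = Fernet(secret_key)

record = b"patient-id=42;reason=checkup;medicines=none"
token = cipher.encrypt(record)          # encrypt before leaving the proxy
upload_to_cloud("patient-42", token)    # only ciphertext reaches the cloud

# A legitimate user holding the QKD-derived key can decrypt the result.
assert cipher.decrypt(token) == record
```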
4.1 Security Model
Consider a pharma company (P) containing a set of users (Ur) made up of patients (Pa), researchers (Rr) and doctors (Dr), where Ur = {Pa, Rr, Dr}. Figure 3 presents the proposed model of data encryption on a cloud database using QKD. Authentication is verified when the user logs in; if authentication fails, the user is rejected. Once the user has logged in to the company or organization, key pairs are generated using the QKD protocol, which are then used to encrypt the data, encrypt the queries and retrieve data from the cloud database. Once the key pairs are generated, the organization separates users into a personal domain and public users [14]. Based on the domain, access to the data is given to the user according to the confidentiality of the data assigned by the data owner.
Fig. 3 Proposed model
The encrypted data is stored in the cloud database, and an encrypted query retrieves data from the cloud database only in encrypted form, which is then sent to the user. At the user end, the encrypted data is decrypted with the help of the key pairs generated via QKD. The proposed model has several components: attribute license (AL), data owner (owner), proxy server (PS) and cloud database (CDB). The components interact with one another, directly or indirectly, to complete tasks on the cloud database. The owner defines access policies on the attributes (AT). At sign-up time, the trusted AL is generated using the AT and the CDB. The AL, together with the PS, is used to grant the respective users access to information. A trusted PS encrypts the information before sending the data to the cloud database provided by the cloud service provider. The information is stored in the cloud database, where every attribute value of a user has a unique key, the search key, used to retrieve data from the cloud database. An index is built on every search-key value of frequently accessed users and is shuffled within its corresponding block, which ensures data confidentiality as well as fast querying. Consider a system with K users and N attributes. The attribute authority derives a set of attribute IDs and keys (equal to the number of columns {C1, C2, …, Cn} containing the basic information of a patient). The attribute authority then uploads the generated attribute IDs {ALid1, ALid2, …, ALidn} and secret keys {Kid1, Kid2, …, Kidn} to the proxy server via an SSL channel. During user registration, the user supplies basic information such as user ID, reason, medicines, age and contact details, which is uploaded to the proxy server via the data owner. The proxy server encrypts the health records [15, 16] of the patient using a symmetric encryption method, in which the encryption of plaintext and the decryption of ciphertext use the same secret key. Finally, it sends the encrypted information and the encrypted index to the cloud server (via SSL).
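The search-key index described above can be sketched as follows. This is an illustration under stated assumptions (a keyed HMAC of the search-key value acts as an opaque index token, and record IDs are shuffled within each bucket); it is not the exact indexing scheme of the paper, and all names in it are hypothetical.

```python
import hmac, hashlib, random
from collections import defaultdict

def index_token(index_key: bytes, search_value: str) -> str:
    """Deterministic, keyed token so equal values map to the same bucket
    without revealing the plaintext value to the cloud."""
    return hmac.new(index_key, search_value.encode(), hashlib.sha256).hexdigest()

def build_encrypted_index(records, index_key):
    """records: list of (record_id, search_value) pairs."""
    index = defaultdict(list)
    for record_id, value in records:
        index[index_token(index_key, value)].append(record_id)
    for bucket in index.values():
        random.shuffle(bucket)   # shuffle within the corresponding block
    return dict(index)

index_key = b"proxy-side secret used only for indexing"   # assumed to stay at the proxy
records = [("r1", "diabetes"), ("r2", "cardiac"), ("r3", "diabetes")]
encrypted_index = build_encrypted_index(records, index_key)

# Query: the proxy recomputes the token for the requested search-key value.
matches = encrypted_index.get(index_token(index_key, "diabetes"), [])
```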
5 Results
A comparison is performed on the time taken to generate secret key pairs between classical encryption methods and QKD when used for data encryption before storing data in the cloud database. Assuming that all the key distribution channels, both classical and quantum, are secure, the key distribution time differs based on the key size. Table 1 shows the comparison for key sizes of 128, 256, 512, 1024 and 2048 bits; the key distribution time is measured in seconds for all the classical and quantum key distribution algorithms. For AES, the key distribution time starts at 0.059 s and rises to 0.131 s. For RSA, it starts at 0.063 s and rises to 0.218 s. For ECC, the key distribution time is higher than for AES and RSA, starting at 0.8 s and reaching 1.4 s. For QKD, the key distribution time is much lower than for ECC and differs only slightly from AES and RSA, starting at 0.048 s and reaching 0.189 s. The graph in Fig. 4 shows the difference between the classical methods and the QKD protocol, with key distribution time in seconds on the y-axis and key size in bits on the x-axis.

Table 1 Comparison of key distribution time in seconds

Key size (bits) | AES (s) | RSA (s) | ECC (s) | QKD (s)
128             | 0.059   | 0.063   | 0.8     | 0.048
256             | 0.071   | 0.096   | 0.18    | 0.079
512             | 0.096   | 0.135   | 0.27    | 0.114
1024            | 0.113   | 0.16    | 0.64    | 0.147
2048            | 0.131   | 0.218   | 1.4     | 0.189

Fig. 4 Comparison graph of key distribution time in seconds
6 Conclusion
From the laws of quantum mechanics, the QKD protocol is provably secure and consumes less key distribution time than classical encryption techniques; grounded in these laws, it works more effectively, securely and quickly than the other techniques. Using the QKD protocol, secure key pairs are generated, and based on the generated key pairs the data is encrypted and stored in the cloud database. This paper highlights the significance of the QKD protocol for generating secret keys when storing data in a cloud database. In addition, an access control policy is included in the authentication of the user for more demanding security applications.
Acknowledgements The authors would like to thank the team members who helped to carry out the research and write this manuscript. Lastly, the authors would like to thank the Vardhaman College of Engineering for giving this opportunity to publish our manuscript.
References 1. L. Zhou, V. Varadharajan, M. Hitchens, Cryptographic role based access control for secure cloud data storage systems. Inf. Forensics Sec. IEEE Trans. 10(11), 2381–2395 (2015) 2. F. Chen, T. Xiang, Y. Yang, S.S.M. Chow, Secure cloud storage meets with secure network coding, in IEEE INFOCOM (2014), pp. 673–681 3. D. He, S. Zeadally, L. Wu, Certificate less public auditing scheme for cloud-assisted wireless body area networks. IEEE Syst. J. 1–10 (2015) 4. W. Diffie, M.E. Hellman, New directions in cryptography. IEEE Trans. Inf. Theory 22(6), 644–654 (1976) 5. J. Shen, H. Tan, S. Moh, I. Chung, J. Wang, An efficient rfid authentication protocol providing strong privacy and security. J. Internet Technol. 17(3), 2 (2016) 6. L. Law, A. Menezes, M. Qu, J. Solinas, S. Vanstone, An efficient protocol for authenticated key agreement. Des. Codes Crypt. 28(2), 119–134 (2010) 7. R.L. Rivest, A. Shamir, L. Adleman, A method for obtaining digital signatures and public-key cryptosystems. Commun. ACM 21(2), 120–126 (1978) 8. J. Shen, T. Zhou, X. Chen, J. Li, W. Susilo, Anonymous and traceable group data sharing in cloud computing. IEEE Trans. Inf. Forensics Sec. 13(4), 912–925 (2018) 9. T. ElGamal, A public key cryptosystem and a signature scheme based on discrete logarithms. IEEE Trans. Inf. Theory 31(4), 469–472 (1985) 10. Y.-M. Tseng, An efficient two-party identity-based key exchange protocol. Informatica 18(1), 125–136 (2007) 11. P.W. Shor, Algorithms for quantum computation: discrete logarithms and factoring, in Proceedings of the 35th Annual Symposium on Foundations of Computer Science (SFCS ’94) (IEEE, 1994), pp. 124–134 12. J. Shen, T. Zhou, F. Wei, X. Sun, Y. Xiang, Privacy preserving and lightweight key agreement protocol for V2G in the social internet of things. IEEE IoT J. 1–1 13. M. Amara, A. Siad, Elliptic curve cryptography and its applications, in International Workshop on Systems, Signal Processing and their Applications, WOSSPA (Tipaza, 2011), pp. 247–250. https://doi.org/10.1109/WOSSPA.2011.5931464 14. K.K. Chennam, L. Muddana, R.K. Aluvalu, Performance analysis of various encryption algorithms for usage in multistage encryption for securing data in cloud, in 2017 2nd IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology (RTEICT) (IEEE, 2017), pp. 2030–2033
15. M.A. Jabbar, B.L. Deekshatulu, P. Chandra, Classification of heart disease using artificial neural network and feature subset selection. Global J. Comput. Sci. Technol. Neural Artif. Intel. 13(3), 4–8 (2013) 16. M.A. Jabbar, B.L. Deekshatulu, P. Chandra, Intelligent heart disease prediction system using random forest and evolutionary approach. J. Netw. Innov. Comput. 4, 174–184 (2016)
Prediction and Prevention of Addiction to Social Media Using Machine Learning Maheep Mahat
Abstract In today's world, social media websites have become a huge part of people's lives. Although social media provides the power of communication and content sharing, it has the potential to harm its users to a degree greater than its power to benefit them. The negative effects of social media have become more and more prevalent, and there do not seem to be any precautionary actions being carried out by the creators of social media platforms. These platforms have great potential to cause harm, and in a way that is not entirely obvious: they tend to harm a person's mind by making them feel insecure, exposed and even exploited through the misuse of their personal information. The main objective of this paper is to find out how social media users in the age group of 17–23 years are being affected by these platforms. For this purpose, data has been collected by reaching out to people in the above-mentioned age group and asking them questions related to their social media usage. This paper also suggests mitigation strategies for the negative effects of social media faced by a user, by creating an application that evaluates their current social media usage and gives back responses based on the usage data of other users. Keywords Machine learning · Linear regression · Mean square method · Social network platform
M. Mahat (B) Department of Computer Engineering, Vishwakarma Institute of Technology, Pune, India e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 D. Swain et al. (eds.), Machine Learning and Information Processing, Advances in Intelligent Systems and Computing 1311, https://doi.org/10.1007/978-981-33-4859-2_31

1 Introduction
Social media has taken over people's lives in a way that can only be described as unbelievable. Social media platforms have come a long way in the past 15 years. Back then, very few people were using social media platforms, but now the situation is completely the opposite. With the increase in the use of social media, we can also see its negative effects growing in larger numbers. There have been quite a
few studies that have asked people whether using social media has made them mentally weaker, and the response has been "yes". Looking at the use of one of the most dominant social media platforms, Facebook, it can be observed that while only 1.5% of the total population was using Facebook in 2008, by 2018 30% of the total population was using it. According to the study published by Datareportal in January 2020, the number of people actively using social media has surpassed 3.8 billion, a 9% increase over the previous year. With this drastic increase in users, people are bound to experience numerous negative effects because of these social media platforms. A majority of these users are non-adults who are not yet able to make good decisions for themselves and who use these platforms completely unsupervised. Given the need for a supervising system to make sure that people do not experience the negative effects of social media platforms, this paper aims to provide a practical solution to these problems using the linear regression machine learning algorithm.
2 Literature Survey
There have been quite a few works done in this domain. Most of them are studies and analyses of those studies, but not much has been done to provide a solution to the problems encountered. There has been a study of the impact of social networking sites on young people by Khurana et al. [1]. A. Sediyono et al. discussed the software requirements specification for an intelligent system that monitors user behavior for the purpose of preventing smartphone addiction [2]. Arora et al. focused on social media applications and explored the topic of addiction [3]. Addiction to social media platforms like Facebook and YouTube was explored by Moghavvemi et al. in a paper published in 2017, by analyzing the usage patterns of Malaysian students [4]. A study of how phone carriers, after providing their customers with internet access at significantly lower cost, have affected the usage of social media platforms was presented in a 2017 paper by Singh et al. [5]. Although quite a bit of work has been done on the topic of social media addiction, a working practical solution to this problem has not been proposed. This paper aims to provide a practical solution to this problem while providing an analysis of the current situation of social media usage among people in the age group of 17–23 years.
3 Dataset
The dataset was constructed by manually surveying 288 people in the age group of 13–23 years, from which the responses of people of the age group 17–23 years were used. The users were asked multiple choice questions on their social media platform usage behavior. The questions asked were regarding the amount of inappropriate
content they have experienced and similar questions that ask them to quantitatively describe their experience with social media platforms. The users were asked a total of seven questions. Their responses were collected and converted into numerical format for easier interpretation.
4 Method
For the purpose of predicting a user's usage behavior, the data from other users is observed. Based on the usage patterns and behaviors of other users, a prediction is made about what the user might experience if they were to continue on their current path. For this purpose, different machine learning algorithms were explored, such as naïve Bayes and SVM, but linear regression with single and multiple variables was found to give the desired results with the least complexity, so linear regression is used to predict what the user's experience will be in the future. First, the user answers a set of seven questions; based on these answers, a prediction is made by comparing the user's data with the previously collected data (Fig. 1).
1. The data is read from the dataset and plotted to find the line for linear regression.
2. Three graphs are plotted with four lines each; each line represents a different usage-behavior scenario.
3. Input is taken from the new user and plotted in each of the three graphs.
4. The distance from the plotted point to each of the lines is calculated. Whichever line has the shortest distance from the point best describes the user's social media usage behavior.
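A minimal sketch of this nearest-line step is given below, using numpy's polyfit for the per-category line fits; the category labels and sample values are made up for illustration, and the perpendicular point-to-line distance is one reasonable reading of the distance computation described above.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares fit y = m*x + c for one usage category."""
    m, c = np.polyfit(x, y, deg=1)
    return m, c

def distance_to_line(point, m, c):
    """Perpendicular distance from (x0, y0) to the line y = m*x + c."""
    x0, y0 = point
    return abs(m * x0 - y0 + c) / np.hypot(m, 1.0)

# Illustrative per-category samples: hours on social media vs. hours of
# physical activity (not the paper's real survey data).
categories = {
    "least usage":   ([1, 2, 3], [3.0, 2.8, 2.5]),
    "ideal usage":   ([2, 3, 4], [2.0, 1.8, 1.6]),
    "above average": ([4, 5, 6], [1.0, 0.8, 0.6]),
    "worst usage":   ([6, 7, 8], [0.4, 0.3, 0.2]),
}
lines = {name: fit_line(np.array(x), np.array(y)) for name, (x, y) in categories.items()}

new_user = (5.5, 0.7)   # a new user's answers converted to numeric form
best = min(lines, key=lambda name: distance_to_line(new_user, *lines[name]))
print("closest usage profile:", best)
```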
4.1 Proposed Approach
The linear regression algorithm used for prediction takes the quantitative results collected from the users and tries to find a relation between them by plotting them against each other. Using linear regression, the best line that fits our data is found. Figure 2 shows how the line fits the data; it is for illustrative purposes and is not based on real data. Here, the X-axis represents the independent variable and the Y-axis the dependent variable, x̄ (x-bar) is the mean of all X-axis values and ȳ (y-bar) is the mean of all Y-axis values. We calculate (x − x̄), (y − ȳ), (x − x̄)² and finally (x − x̄) * (y − ȳ), and then compute the slope m by the following formula:

m = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²   (1)
Fig. 1 Proposed system flow
Fig. 2 Linear regression line
Using the slope that we found above we can insert the value of that slope into the equation of a straight line shown in Eq. 2 and obtain the desired line. y =m∗x +c
(2)
We also need to find the y-intercept, represented by c. For this, we use the mean x̄ of all x-axis values and the mean ȳ of all y-axis values. Using the m calculated from Eq. 1, we insert the values of x̄, ȳ and m into Eq. 2 and obtain c. Finally, we can enter x-axis values and obtain the corresponding y values, which gives us the regression line. We find the best-fitting line using the mean square error method, which measures the distance between the actual value and the predicted value; the value of m for which this difference is minimum gives the best-fit line. Three graphs are plotted:
Graph 1: Time spent on social media versus time spent on physical activities.
Graph 2: Age versus level of inappropriate content.
Graph 3: Time spent on social media versus level of inappropriate content.
The graphs created are as follows.
Graph 1: Figure 3 is just for illustrative purposes and is not based on real data. The first line accounts for the top 10% of the graph, the second line for 10–60%, the third line for 60–80% and the fourth line for 80–100%.
Graph 2: Figure 4 is just for illustrative purposes and is not based on real data. The first line accounts for the top 20% of the graph, the second line for 30–60%, the third line for 60–80% and the fourth line for 80–100%.
Graph 3: Figure 5 is just for illustrative purposes and is not based on real data. The first line accounts for the top 10% of the graph, the second line for 10–60%, the third line for 60–80% and the fourth line for 80–100%.
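The slope and intercept of Eqs. 1 and 2 and the mean square error can be computed directly, as in the short illustration below; the numbers are made up for the example and are not the survey data.

```python
def fit_simple_regression(xs, ys):
    """Compute m and c of y = m*x + c from Eqs. (1) and (2)."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    m = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) \
        / sum((x - x_bar) ** 2 for x in xs)
    c = y_bar - m * x_bar          # from y_bar = m * x_bar + c
    return m, c

def mean_square_error(xs, ys, m, c):
    """Average squared distance between actual and predicted values."""
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys)) / len(xs)

hours_online = [1, 2, 3, 4, 5]            # made-up example values
hours_active = [3.1, 2.6, 2.2, 1.5, 1.1]
m, c = fit_simple_regression(hours_online, hours_active)
print(m, c, mean_square_error(hours_online, hours_active, m, c))
```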
5 Experiment and Results
After analyzing the data collected, the percentages of users that have faced cyber-crime problems are as follows.
Fig. 3 Time spent on social media versus time spent on physical activities. X parameter = Time spent on social media, Y parameter = Time spent on physical activities, First line = Generated using least usage data, Second line = Generated using the ideal usage data, Third line = Generated using above-average usage data, Fourth line = Generated using worst usage data
Fig. 4 Age versus inappropriate content. X parameter = Age, Y parameter = Level of inappropriate content, First line = Generated using data of worst scenarios, Second line = Generated using data of bad scenarios, Third line = Generated using data of average scenarios, Fourth line = Generated using ideal data
Fig. 5 Time spent on social media versus level of inappropriate content. X parameter = Time spent on social media, Y parameter = Level of inappropriate content, First line = Generated using data of worst scenarios, Second line = Generated using data of bad scenarios, Third line = Generated using data of average scenarios, Fourth line = Generated using ideal data
Table 1 Percentage of users that have faced the mentioned cyber-crimes

Cyber-crime          | WhatsApp | Facebook | Instagram | Twitter | YouTube
Hacking              | 5.42     | 3.05     | 4.75      | 1.36    | 3.39
Photos being misused | 2.03     | 1.36     | 1.36      | 0.0     | 1.36
Fake profiles        | 10.85    | 5.76     | 9.4       | 1.69    | 7.76
The results seen in Table 1 are obtained from the dataset created; the dataset is completely untouched and contains no bias whatsoever. Any unusual result in Table 1, such as the percentage of users who have experienced their photos being misused on Twitter, arises because the dataset contains no examples of misused photos on Twitter. This problem will subside as the number of users of this system grows, since their data will be stored for further analysis. Every time the dataset grows by 10 new users' data, the linear regression algorithm is applied again to keep the model updated. Once a new user enters their data by answering the above-mentioned questions, their data is plotted on the three graphs and a conclusion is drawn. For the experiment, the following questions were asked to the users.
1. What is your age?
2. Which social platform do you use the most?
3. How much time do you spend on social media in a day?
4. How much time do you spend on physical activities?
5. Have you ever been a victim of any of these cyber-crimes?
6. Which type of communication do you prefer?
Here are the responses collected (Figs. 6, 7, 8, 9, 10, 11, 12).
Fig. 6 Pie chart of responses to question 1
Fig. 7 Bar graph of responses to question 2
Fig. 8 Pie chart of responses to question 3
Fig. 9 Pie chart of responses to question 4
Fig. 10 Bar graph of responses to question 5
Fig. 11 Bar graph of responses to question 6
Fig. 12 Bar graph of responses to question 7
6 Evaluation
After the user enters their data by answering the questions, their point is plotted in the three graphs. Whichever line is closest to their point describes their social media usage behavior. If their usage is reckless and they are exposed to the far end of the negative side of social media, they are advised to use social media less and are shown the appropriate usage behavior, based on the ideal section of the graph plotted from the data being accumulated. If their data is moderate and they are not exposed to many of the negative effects of social media, they are told to be careful with their usage and what their usage might lead to if it were to increase. If the user's usage is moderate, they are told to keep using the platforms as they are and, as in the previous case, what their usage might lead to if it were to increase. Lastly, if their social media usage is very minimal, they are told to increase their usage a little, in the hope of bringing them out of their shell and making them less introverted. The algorithm used here gets the job done with minimal complications. Once the database reaches a certain size, deep learning could be used to achieve better prediction results, but for now this machine learning algorithm works well.
7 Conclusion
Using data manually collected from people between the ages of 17 and 23, social media platform usage behaviors were successfully analyzed and a system to monitor users' usage of these platforms was successfully constructed. Using this system, anyone can see how their usage compares to others and get suggestions by comparing their data with others'. As the number of users grows, the system becomes more and more accurate and the results obtained from it become more helpful. This system can help people who have unusually high social media usage as well as those who have unusually low usage. It is a self-improving system that will keep getting better with time.
8 Future Work
For future work, once more users use this system and the dataset reaches a large number of examples, different approaches could be used to improve the prediction of the model. More diverse data could also be collected and added to the current dataset to further broaden the scope of this application, and more types of suggestions could be added as the dataset becomes more complex and diverse.
References 1. N. Khurana, The impact of social networking sites on the youth. J. Mass Commun. J. 5, 285 (2015) 2. A. Sediyono, A. Ariwibowo, Software requirement specification of intelligent system for monitoring and preventing smartphone addiction, in 2017 International Conference on Smart Cities, Constructive Automation and Intelligent Computing Systems (ICON-SONICS) (Yogyakarta, 2017), pp. 54–58 3. S. Arora, D. Okunbor, Social media applications: are the youth addicted?, in 2015 International Conference on Cyberspace (CYBERAbuja) (Abuja, 2015), pp. 229–235 4. S. Moghavvemi, A.B. Sulaiman, N.I.B. Jaafar, N. Kasem, Facebook and YouTube addiction: the usage pattern of Malaysian students, in 2017 International Conference on Research and Innovation in Information Systems (ICRIIS) (Langkawi, 2017), pp. 1–6 5. R. Singh, Impact of Reliance Jio on Indian telecom industry: an empirical study. Int. J. Sci. Res. Manage. (IJSRM) 5(07), 6469–6474 (2017) 6. https://datareportal.com/reports/digital-2020-global-digital-overview 7. TeleGeography—The Jio Effect: How the Newcomer Made an Impact in India, https://blog.telegeography.com/the-jio-effect-howthe-newcomer-made-an-impact-in-india 8. Internet World Stats–Usage and Population Statistics, https://www.internetworldstats.com/asia/in.htm 9. Vancouver Island Free Daily—Social media addiction having deadly results among youth, https://www.vancouverislandfreedaily.com/news/social-media-addiction-having-deadly-results-among-youth/ 10. Bridging Science and Life—How can you recover from social media addiction?, https://bridgingscienceandlife.com/how-can-you-recoverfrom-social-media-addiction/ 11. https://ourworldindata.org/rise-of-social-media#:~:text=The%20percentage%20of%20US%20adults,to%20around%2030%25%20in%202018
Analysis of Block Matching Algorithms for Motion Estimation in Video Data Awanish Kumar Mishra and Narendra Kohli
Abstract Demand for video data is increasing exponentially, and to cope with this growing demand various block matching techniques have been designed. Video compression has long been in demand in the research community. Various methods have been studied for video compression, but block matching algorithms are very popular and have been very useful for this purpose. This article reviews various block matching motion estimation techniques for video compression and evaluates the performance of different block matching motion estimation algorithms using MAD, in terms of PSNR and execution time. These block matching algorithms use different search patterns and differ from one another in search pattern and approach, and hence evaluate different numbers of points when searching for matching blocks. Motion estimation of blocks and objects has been very important and significant in every video standard: all international standards for video storage and compression, from H.261 to H.264 (Advanced Video Coding) and the latest H.265 (High Efficiency Video Coding), use one of the block matching algorithms for the reduction of temporal redundancy. Keywords Block-based · Video coding · Motion estimation · Video sequence · Motion compensation and redundancy
A. K. Mishra (B) · N. Kohli Harcourt Butler Technical University, Kanpur, India e-mail: [email protected]
N. Kohli e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 D. Swain et al. (eds.), Machine Learning and Information Processing, Advances in Intelligent Systems and Computing 1311, https://doi.org/10.1007/978-981-33-4859-2_32

1 Introduction
Owing to the exponential growth in video communication, many standards have been designed and developed to compress video data. Digital video coding that can achieve
good data compression rates without significantly degrading video quality first emerged in the 1990s with MPEG and H.26X interframe coding [1–3]. Video encoding is a technique to transform video data from one format to another under the conditions that its quality and contents be maintained and that its storage requirement be reduced significantly. The cost of video coding is also a very important factor in choosing an efficient video encoding technique [4–6]. Efficient video encoding techniques [7–9] help in better utilization of available resources, such as internet bandwidth and storage capacity. Redundancies in the frames of a video sequence can be minimized by several techniques. Block matching algorithms for motion estimation divide the current frame into equal-sized blocks and examine the movement of these blocks in a reference frame to estimate motion vectors. After the motion vectors are found, the frame is coded using them, producing a compressed frame. Motion estimation is used to find the motion vector, and in this block matching process a block matching criterion is also used. Basic block matching criteria include MAD, SC-MAE, the scaled value criterion [10] and many more. Video coding is also used for the security [11] of video data. Object detection [12] and block matching collectively are very important for video coding. Blocks and objects involved in a sequence move from one location to another, and this movement of objects in the video sequence is observed through motion estimation. Successive frames differ from each other only because of the movements of the objects involved in the video sequence, in the form of rotation, translation, shift and scale. The process of estimating the movement of blocks and objects in video data yields the motion vector, and a correct motion vector is the key to the success of block matching motion estimation algorithms.
2 Block Matching Algorithms
Block matching algorithms aim to estimate the motion of the blocks of the current frame from the blocks of a reference frame. A correct estimate of the motion of the blocks is essential to ensure the quality of the encoded video sequence. In block matching algorithms the current frame is first divided into equal-sized blocks, generally of size 16 × 16. Every block in the current frame is then searched for in the previous frames using a block matching algorithm with a suitably chosen matching criterion. Generally, the adjacent previous frame is selected as the reference frame. This process of finding the block of the current frame in the reference frame is called motion estimation. Motion estimation stores the information about the block of the current frame relative to the reference frame in the form of a motion vector, the displacement vector that describes the motion of the block from the current frame to the reference frame. In the process of finding the motion vector, only a few neighboring blocks are searched. The search parameter p decides the size of the search window; generally p is taken as 7, so the search window covers (2p + 1) × (2p + 1) candidate positions. One block of the current frame is matched with various blocks of the reference frame, and the block having the minimum cost-function value is taken as the matching block (Figs. 1, 2, 3).
Fig. 1 Block matching motion estimation and compensation
Fig. 2 Performance comparison of motion estimation algorithms in terms of PSNR
2.1 Full-Search Motion Estimation Algorithm
In full-search motion estimation, all possible candidates for the matched block are checked to find the best match. The full-search approach computes the cost, using any suitable matching criterion, at every possible location of the search window while looking for the matching block; it evaluates all candidate blocks in the search window of the reference frame for every block of the current frame. In this exhaustive search, blocks of size N × N from the current frame are considered and a search for the matching block is performed in the reference frame within a search window of ±w in both directions, so the total number of search positions considered is exactly (2w + 1) × (2w + 1). The block with minimum error under the matching criterion is selected as the matching block, and the motion vector is determined for the block of the current frame. By exhaustively searching every position in the search window, full search finds the best matched block, but this approach demands the maximum number of computations and hence imposes a substantial computational load.
Fig. 3 Performance comparison of motion estimation algorithms in terms of execution time
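A compact sketch of exhaustive block matching with the MAD criterion is given below; it assumes grayscale frames stored as 2-D numpy arrays and is meant only to make the search loop concrete, not to reproduce the authors' implementation.

```python
import numpy as np

def mad(block_a, block_b):
    """Mean absolute difference between two equally sized blocks."""
    return np.mean(np.abs(block_a.astype(np.int32) - block_b.astype(np.int32)))

def full_search(current, reference, bx, by, n=16, p=7):
    """Exhaustively search a +/-p window in the reference frame for the
    n x n block whose top-left corner is (bx, by) in the current frame."""
    block = current[by:by + n, bx:bx + n]
    best_cost, best_mv = np.inf, (0, 0)
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + n > reference.shape[1] or y + n > reference.shape[0]:
                continue                      # candidate falls outside the frame
            cost = mad(block, reference[y:y + n, x:x + n])
            if cost < best_cost:
                best_cost, best_mv = cost, (dx, dy)
    return best_mv, best_cost
```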
2.2 Three-Step Search Motion Estimation Algorithm
Three-step search (TSS) [2] is very popular because of its simplicity and accuracy in finding the right matching block of the reference frame for a block of the current frame. The algorithm checks at most 25 search points. TSS proceeds as follows:
Step 1: TSS starts by choosing an initial step size; eight search points at a distance of the step size from the center are evaluated along with the center location, so nine search points are tested in total. The search point with minimum distortion among the tested points is chosen as the center for the next step.
Step 2: The step size is halved (step size = step size / 2), and the point chosen at the end of the first step becomes the new center. With the new step size, eight search points around the center are tested; again, the point with minimum distortion among the eight tested points and the center is chosen as the center for the next step.
Step 3: The step size is halved again and the center is moved to the point chosen in the previous step. With the new step size, eight search points are tested; the point with minimum distortion among the tested points and the center is chosen as the final point, from which the motion vector is derived.
TSS finds a good matching block in most cases, but it sometimes fails to determine the correct matching block because of its uniform search pattern. This problem of the three-step search is addressed in other motion estimation algorithms; many dynamic and adaptive algorithms have evolved to overcome this disadvantage of TSS.
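A sketch of the three-step search is shown here; it reuses the mad helper and frame conventions from the full-search sketch above, and the initial step size of 4 is a common textbook choice assumed for illustration.

```python
import numpy as np  # mad() is taken from the full-search sketch above

def three_step_search(current, reference, bx, by, n=16, step=4):
    """Three-step search: evaluate the center and its 8 neighbours at the
    current step size, re-center on the best point, then halve the step."""
    block = current[by:by + n, bx:bx + n]

    def cost_at(dx, dy):
        x, y = bx + dx, by + dy
        if x < 0 or y < 0 or x + n > reference.shape[1] or y + n > reference.shape[0]:
            return np.inf
        return mad(block, reference[y:y + n, x:x + n])

    cx, cy = 0, 0                      # displacement of the current center
    while step >= 1:
        candidates = [(cx + i * step, cy + j * step)
                      for i in (-1, 0, 1) for j in (-1, 0, 1)]
        cx, cy = min(candidates, key=lambda d: cost_at(*d))
        step //= 2
    return (cx, cy), cost_at(cx, cy)
```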
2.3 New Three-Step Search Motion Estimation Algorithm
The TSS approach was modified into the new three-step search algorithm (NTSS) [4] in 1994 to reduce the number of search points below 25 when computing the motion vector, and hence to reduce the time taken. NTSS is adaptive in nature, so the search may terminate at any of the three steps. It maintains the simplicity of the three-step search but improves its performance by reducing the number of points needed to find the correct block, exploiting the real nature of video content. NTSS proceeds as follows:
Step 1: After fixing the initial step size, a total of 17 search points are evaluated. If the cost is minimum at the center point, the process terminates and (0, 0) is taken as the motion vector. If the cost is minimum at one of the points neighboring the center, go to step 2; otherwise go to step 3. Step 3 is required when the minimum cost is measured at one of the outer points of the search window.
Step 2: Move the center to the point with minimum cost and evaluate the cost at the eight neighbors of the new center; the minimum-cost point gives the motion vector and the search stops. In this step only three or five additional points need to be evaluated.
Step 3: Move the center to the point with minimum cost, halve the step size (step size = step size / 2) and follow the procedure defined in the three-step search for further processing.
The new three-step search performs better than the three-step search both in the number of computations and in the quality of the encoded frame.
2.4 Four-Step Search Motion Estimation Algorithm
The four-step search (4SS) [5] finds the correctly matched block in the reference frame with a small number of computations, and it is easy to understand and robust. The 4SS algorithm proceeds as follows:
Step 1: In the first step, the center point is taken at the center of the block of the current frame and nine points are evaluated with a step size of two. If the best matched block is at the center of the nine checked points, go to step 4; otherwise go to step 2.
Step 2: If the best matched point from step 1 is at a corner of the checked pattern, take it as the new center and evaluate five more points so that all points at step size two around it have been computed; otherwise take the best matched point as the new center and evaluate three more points, again completing the step-size-two pattern.
Step 3: Repeat the actions of step 2, then go to step 4.
Step 4: In this final step, the center is shifted to the best matched point of the previous step and eight more cost computations are performed at the neighboring pixels; the best matched point is used to determine the motion vector.
Four-step search is a very efficient algorithm for finding the best matched block in the reference frame; its performance is far better than three-step search and about the same as new three-step search. In the worst case, four-step search needs only two more computations than new three-step search.
2.5 Diamond Search Motion Estimation Algorithm
The diamond search (DS) algorithm [6] estimates the motion of a block of the current frame in the reference frame using two diamond-shaped search patterns. The first, the large diamond search pattern (LDSP), checks nine search points: the center point and eight surrounding points arranged in a diamond shape. The second, the small diamond search pattern (SDSP), checks five search points: the center point and four surrounding points forming a small diamond. In the diamond search approach, the search proceeds with the LDSP until the best matched block occurs at the center. The algorithm can be described in detail as follows:
Step 1: The best matched block is searched for using the LDSP. If the best matched block is at the center, go to step 3; otherwise go to step 2.
Step 2: The best matched point found in the previous step becomes the new center, and the LDSP search of step 1 is repeated recursively until the best matched block is found at the center of the LDSP.
Step 3: When the best matched block is at the center, four further points around it are evaluated using the SDSP. The best matched point under the SDSP determines the motion vector.
Diamond search finds the best matched block efficiently, and its performance improves further for video sequences with more motion of blocks or objects. It requires nearly 25% fewer computations than NTSS.
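The two patterns can be expressed as fixed offset sets, as in the sketch below, which reuses the mad helper and frame conventions of the earlier full-search sketch; it is an illustrative reading of the LDSP/SDSP loop, not the authors' code.

```python
import numpy as np  # mad() is taken from the full-search sketch above

LDSP = [(0, 0), (0, -2), (0, 2), (-2, 0), (2, 0), (-1, -1), (1, -1), (-1, 1), (1, 1)]
SDSP = [(0, 0), (0, -1), (0, 1), (-1, 0), (1, 0)]

def diamond_search(current, reference, bx, by, n=16):
    block = current[by:by + n, bx:bx + n]

    def cost_at(dx, dy):
        x, y = bx + dx, by + dy
        if x < 0 or y < 0 or x + n > reference.shape[1] or y + n > reference.shape[0]:
            return np.inf
        return mad(block, reference[y:y + n, x:x + n])

    cx, cy = 0, 0
    while True:  # repeat LDSP until the best point is the pattern center
        best = min(((cx + dx, cy + dy) for dx, dy in LDSP), key=lambda d: cost_at(*d))
        if best == (cx, cy):
            break
        cx, cy = best
    # final refinement with the small diamond pattern
    cx, cy = min(((cx + dx, cy + dy) for dx, dy in SDSP), key=lambda d: cost_at(*d))
    return (cx, cy), cost_at(cx, cy)
```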
2.6 Genetic Rhombus Pattern Search Motion Estimation
Searching for the best matched block in the reference frame through motion estimation is very effective, especially when motion estimation uses pattern-based block matching. Algorithms for finding the best matched block are based on a weighting function, and some algorithms that are less dependent on the weighting function use only a few weight values; a weighting function can also be targeted using information about previously processed blocks. Pattern-based algorithms for finding the best matched block typically have two important stages: (1) an initial search stage that is coarse and (2) a final search stage that is very fine. The coarse stage defines the rough location of the best matched block, and the fine stage defines its precise location. In the coarse stage, the best search path starts from the center point where the search begins and ends at the point that defines the motion vector. The genetic rhombus pattern search has four main stages: a starting stage, mutation, competition and a termination stage. GRPS [8] performs about 20% better than other algorithms while maintaining comparable quality.
In terms of average search points, GRPS is 27% better than other block matching algorithms, 50% better than EHS, 123% better than DS, 163% better than FSS and 138 times better than exhaustive search (FS).
2.7 Efficient Motion Estimation Optimization
Efficient motion estimation [13] for video coding is a very effective motion estimation algorithm. It is nearly 50% faster than other algorithms of its time while maintaining the same subjective quality of the video sequence. The algorithm combines several concepts, such as the coded block pattern (CBP), rate-distortion (RD) cost and initial search point (ISP), to decide the best matching block. It improves the performance of the conventional block matching algorithms by 40 to 60% with negligible degradation of the subjective quality of the video sequence.
2.8 Fast Motion Estimation Based on Content Property
Fast motion estimation based on content property [14] considers the motion vectors of neighboring blocks when selecting a key block. The algorithm first checks whether any neighboring block already has a determined motion vector; if so, the motion vector for the current block is initially searched at the location indicated by that neighbor's motion vector. This works because of the consistency of object content between the current frame and the reference frame. The performance of this algorithm depends on the number of neighboring blocks with already-decided motion vectors: if all neighbors have predetermined motion vectors, the cost of determining the current block's motion vector is minimal, and the cost increases gradually as the number of such neighbors decreases.
2.9 Adaptive Pattern for Diamond Search Algorithm
In the adaptive pattern selection strategy for the diamond search motion estimation algorithm [15], the conventional diamond search is modified to find the best match using the SDSP and LDSP. In this modified algorithm, the SDSP is used first to evaluate five locations; if the cost at the center is the minimum over all locations, the center is the output. If not, the center is shifted to the point with minimum cost and the cost is again computed at four locations based on the SDSP; if the cost at the center is now the minimum over all locations, the center is the output. Otherwise, the conventional diamond search is used to select the best matched block of the reference frame.
3 Experiment and Results
Various block matching algorithms are tested for their performance using the mean absolute difference (MAD). In the experimental setup, the current frame is partitioned into blocks of equal size, with block size 16 × 16. The maximum movement of a block between adjacent frames is 7 pixels, i.e. a block can move at most 7 pixels in each direction along rows and columns, hence the window in which a block is searched is of size (2*7 + 1) × (2*7 + 1). Table 1 shows the performance of the various algorithms on three video sequences, using the scaled value criterion for cost computation, in terms of PSNR, and Table 2 shows the execution time. The search parameter for this experimental setup is 7; in the case of exhaustive search, the total number of search points for this setup is 225.

Table 1 Performance comparison of motion estimation algorithms in terms of PSNR

Algorithm | Foreman | Football | Flower garden
TSS       | 32.4    | 22.3     | 24.8
NTSS      | 32.7    | 22.4     | 25.4
4SS       | 29.6    | 21.7     | 21.2
DS        | 26      | 21.8     | 18.5
GRPS      | 26.5    | 21.7     | 20.6
EMS       | 24.6    | 23.4     | 21.2
FME       | 28.3    | 23.8     | 22.7
APDS      | 28      | 24.5     | 23.4

Table 2 Performance comparison of motion estimation algorithms in terms of execution time

Algorithm | Foreman | Football | Flower garden
TSS       | 11.85   | 17.43    | 13.63
NTSS      | 10.5    | 15.34    | 11.45
4SS       | 8.56    | 14.73    | 10.47
DS        | 8       | 12.8     | 9.5
GRPS      | 7.5     | 12.2     | 10
EMS       | 2.5     | 6.4      | 2.32
FME       | 2.1     | 12.2     | 2.44
APDS      | 4.45    | 8.5      | 3.3

For evaluation, only the first 30
frames of three video sequences are taken into consideration, and these are Foreman, Football and Flower garden.
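PSNR, the quality measure used in the comparison above, can be computed as below for 8-bit frames; this is a standard helper added for illustration, not code from the paper.

```python
import numpy as np

def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two 8-bit frames."""
    mse = np.mean((original.astype(np.float64) - reconstructed.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")          # identical frames
    return 10.0 * np.log10(peak ** 2 / mse)
```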
4 Conclusion
This paper reviewed many block matching algorithms and presented a comparative study of their performance. These algorithms are very helpful in video compression, since compressed video allows better utilization of memory and improves the uploading and downloading speed of data over the network. Many conventional algorithms exist for block matching motion estimation, but recently genetic algorithms have also been performing very well in this area in terms of the quality of the encoded video and the PSNR value. This article covers many algorithms that are used by video coding schemes such as AVC/HEVC, and the versatile video coding (VVC) standard is expected to be launched in 2020. Motion estimation in video is very important for improving video encoding performance. The performance of video encoding using block matching motion estimation will certainly be improved in the near future using concepts from neural networks; researchers are trying to find very effective motion estimation techniques using neural networks and machine learning.
References 1. J.R. Jain, A.K. Jain, Displacement measurement and its application in interframe image coding. IEEE Trans. Commun. COM-29, 1799–1808 (1981) 2. T. Koga, K. Iinuma, A. Hirano, Y. Iijima, T. Ishiguro, Motion-compensated interframe coding for video conferencing, in Proceedings NTC81 (New Orleans, LA, November 1981), pp. C9.6.1–9.6.5 3. ISO/IEC 11172-2 (MPEG-1 Video), Information technology-coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s: Video (1993) 4. R. Li, B. Zeng, M.L. Liou, A new three-step search algorithm for block motion estimation. IEEE Trans. Circuits Syst. Video Technol. 4, 438–442 (1994) 5. L.M. Po, W.C. Ma, A novel four-step search algorithm for fast block motion estimation. IEEE Trans. Circuits Syst. Video Technol. 6, 313–317 (1996) 6. S. Zhu, K.-K. Ma, A new diamond search algorithm for fast block-matching motion estimation, in Proceedings International Conference on Information Communications and Signal Processing (ICICS '97), Sep. 9–12, vol. 1 (1997), pp. 292–296 7. T. Wiegand, G.J. Sullivan, G. Bjontegaard, A. Luthur, Overview of the H.264/AVC video coding standard. IEEE Trans. Circuits Syst. Video Technol. 13(7), 560–576 (2003) 8. J.J. Tsai, H.-M. Hang, A genetic rhombus pattern search for block motion estimation, in Proceedings IEEE International Symposium on Circuits and Systems (ISCAS '07) (New Orleans, LA, May 2007), pp. 3655–3658 9. Advanced Video Coding for Generic Audiovisual Services, ITU-T Document H.264 and ISO/IEC Standard 14496-10 (2013)
10. A.K. Mishra, R.K. Purwar, Performance analysis of block matching criterion in video data on embedded processor using VHDL 2009. in Proceeding of International Conference on Methods and Models in Computer Science (ICM2CS), (2009) 11. A. Kumar, Design of secure image fusion technique using cloud for privacy-preserving and copyright protection. Int. J. Cloud Appl. Computi (IJCAC) 9(3), 22–36 (2019) 12. A. Kumar, Object detection system based on convolution neural networks using single shot multi-box detector. Proc. Comput. Sci. 171, 2610–2617 (2020) 13. Z. Pan, Y. Zhang, S. Kwong, Efficient motion and disparity estimation optimization for low complexity multiview video coding. IEEE Trans. Broadcast. 61(2), 166–176 (2015) 14. S.-H. Park, E.S. Jang, Fast motion estimation based on content property for low complexity H.265/HEVC encoder. Broadcast. IEEE Trans.63(4), 740–742 (2017) 15. Z. Pan, R. Zhang, W. Ku, Adaptive pattern selection strategy for diamond search algorithm in fast motion estimation. Multimed. Tools Appl. 78, 2447–2464 (2019)
Information Retrieval Based on Telugu Cross-Language Transliteration Swapna Narla, Vijaya Kumar Koppula, and G. SuryaNarayana
Abstract In the current scenario, information access over cyberspace has increased steadily. A huge amount of information in different languages is available on cyberspace, and most people want to retrieve information in their native languages. This paper describes an English-Telugu cross-lingual information retrieval (CLIR) system, which fetches Telugu documents in response to a given English (or Telugu) query. In the Indian subcontinent, Telugu is the most widely spoken language in the states of Telangana and Andhra Pradesh and the Union Territories of Andaman and Nicobar and Yanam. A cross-lingual information retrieval process requires a parallel corpus or a bilingual dictionary, which is used to translate texts from one language to another; for in-house translation, the proposed approach uses a bilingual dictionary. The scheme also supports the monolingual information retrieval method. This study also presents the effect of using language-dependent and language-independent stemmers in a monolingual information retrieval system. Keywords CLIR · Monolingual IR · Stemmer
S. Narla · V. K. Koppula · G. SuryaNarayana (B) CMR College of Engineering & Technology, Vardhaman College of Engineering, Hyderabad, India e-mail: [email protected]
S. Narla e-mail: [email protected]
V. K. Koppula e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 D. Swain et al. (eds.), Machine Learning and Information Processing, Advances in Intelligent Systems and Computing 1311, https://doi.org/10.1007/978-981-33-4859-2_34

1 Introduction
Retrieving pertinent documents for a user query is the task of an information retrieval (IR) system, where the query is a collection of root words. Cross-lingual information retrieval (CLIR) means fetching the required documents in a language other than the query language. Most of the documents are placed on the internet in English
language only. To fetch documents in the user's native language, a CLIR system becomes essential. Both the document and the query languages have to be handled by CLIR, whose retrieval performance is somewhat reduced by this translation when compared with a monolingual IR system. Missing common terms, specialized vocabulary and incorrect word translation due to ambiguity [1] are the key causes of this reduced performance. Query translation is the most appropriate approach for CLIR. Query translation approaches include machine translation, dictionary-based, ontology-based and corpora-based methods [4]. Machine translation and corpora-based methods need parallel corpora, which are only minimally available for highly inflectional languages. The fundamental idea of machine translation is to replace each term in the query in one language with a suitable term (or group of terms) from the lexicon of the other; the translator must analyze and interpret each term and know the word's influence in both languages. The machine translation approach is used in our proposed work. India is a multilingual country, and South Indian languages in particular are rich in morphology compared with other Indian languages. Telugu, which belongs to the Dravidian family, is one of the most widely spoken languages in the states of Telangana and Andhra Pradesh. It is a highly inflectional and morphologically rich language. It consists of 52 alphabets, with 16 vowels and 36 consonants; each Telugu alphabet is called an akshara. Each Telugu word is formed by combinations of consonants and vowels, and the formation of an akshara follows some complex rules; the canonical structure is (C(C))CV, where C is a consonant and V is a vowel. Recently, the Governments of Telangana and Andhra Pradesh have been maintaining their data and websites in the native (official) language, so the volume of electronic Telugu data has increased greatly. The cross-lingual information retrieval task helps to retrieve Telugu documents in response to an English query, while the monolingual IR task fetches Telugu documents in response to a Telugu query. This manuscript is structured as follows: related work on CLIR and monolingual information retrieval systems is discussed in Sect. 2; resources used for CLIR approaches from an Indian-language perspective are discussed in Sect. 3; Sect. 4 explains our prototype for the CLIR and monolingual information retrieval system, while Sect. 5 draws the conclusion.
2 Related Works In this section, cross-language information retrieval (CLIR) studies are discussed. CLIR accepts queries (or information needs) in one language and retrieves documents in another language. Various studies have been reported on CLIR for Chinese-English [7], European language pairs like German-English [5] and French-English [2], and Arabic-English [11]. In India, CLIR is still in a nascent state because of the complexity of Indian languages. The initial major works were reported for Hindi [10]. The CLIR study on Hindi was done during the TIDES Surprise Language
Exercise [10]. The basic intention of this work was to fetch Hindi documents in response to English queries. Comparable work has been described for other Indian languages such as Bengali [4] and Tamil [14]. Chinnakotla et al. proposed Hindi-to-English and Marathi-to-English CLIR systems [20]. Chaware et al. proposed an easy and efficient approach to translate local-language keywords into English, tested on query processing in three local languages, namely Marathi, Gujarati and Hindi [17]. A similar experiment on CLIR using WX-notation was conducted by the CALTS group, HCU, Hyderabad. The concepts of bilingual information retrieval, CLIR and multilingual information retrieval in Indian languages were discussed by Swapna et al. [13]. Seetha et al. [3] introduced a dictionary-based query translation approach to assess an English-Hindi cross-language information retrieval system. In this scheme, each term of the query is replaced by its lexical equivalent. From an Indian language perspective, a CLIR approach was discussed by Bajpai et al. [15].
3 Resources Used Various resources are used in our proposed approach to process English/Telugu queries and to fetch documents in the Telugu language. The Telugu data set was collected from online newspapers and Wikipedia. In the preprocessing stage, an English stop word list and the stemmer developed by Robert Krovetz, known as KSTEM [16], or a corpus-based statistical approach [12] was used. Other resources, such as an English/Telugu bilingual dictionary, a Telugu root word dictionary developed by the University of Hyderabad (UOH), language-dependent and language-independent stemmers [18] and a list of Telugu stop words (which had to be developed in-house), were used for processing Telugu queries and documents. The vector space model (VSM) was also used for document retrieval and ranking [6, 8, 9].
4 Prototype for CLIR and Monolingual IR A prototype of our proposed system is shown in Fig. 1. It supports both a cross-lingual information retrieval (English/Telugu) and a monolingual information retrieval (Telugu/Telugu) system. Each individual module is explained in the subsequent sections. The overall process of the proposed CLIR/monolingual information retrieval system is illustrated in Fig. 1 and can be divided into several segments. The preliminary segment focuses on the query preprocessor, the next on the document preprocessor and the final one on the vector creator and processor. Query Preprocessor Initially, a query is accepted either in English or in Telugu. The prototype of our proposed system is implemented to accept a query in natural language, fetch the keywords from that query and submit them to the information retrieval (IR) system. This query is
Fig. 1 Prototype of proposed system
passed through preprocessing steps such as tokenization, stop word removal and stemming. The output is a bag of weighted terms; the most relevant terms receive the highest weights. If the input is given in English, then in addition to the above steps the query is also passed through a translation process. A bilingual English/Telugu dictionary is used for this purpose, turning the weighted query terms into analogous Telugu words. The string tokenizer class was applied for the tokenizing process. Most stop words are language-dependent; UMass's stop word list was used for English and a new stop word list was developed in-house for the Telugu language. Many tokens may carry identical information (e.g. tokenization and tokenizing). By reducing all tokens to their base forms using stemming, this redundancy can be avoided. The Robert Krovetz KSTEM stemming algorithm was used for English and a variety of language-dependent and language-independent root word identification approaches [18] were used for the Telugu language. Finally, either the monolingual or the cross-lingual information retrieval approach of the proposed system is applied. Document Preprocessor Document preprocessing is a multifaceted method that leads to the representation of each document by a selected set of index terms. The documents from the document corpora are subjected to processes that include lexical analysis such as tokenizing, stop word elimination, stemming and index term selection. This process helps in reducing the number of terms to be stored. These terms are indexed and stored in a hash table called the inverted index file. This file holds an index of each term and a postings list for every term. The postings list contains document ids and the frequency of occurrence of the term (tf), that is,
the number of times that term appears in each document of the collection, while the document frequency (df) is the number of documents in which the term appears. Stemming is essential for document preprocessing and query preprocessing and is applied through language-dependent and language-independent stemming models for Telugu. The various language-dependent models are Vibhaktulu-based stemming, suffix removal stemming and rule-based stemming with suffix replacement. In a language-dependent model the user needs linguistic knowledge for root word identification, whereas a language-independent model does not need any linguistic familiarity and is an easier approach to stemming. The pseudo-syllable N-gram model is a language-independent model. It finds the root word by stripping the end part of the word, first considering a minimum stripping length of one and increasing it up to a maximum stripping length (which depends on the word length). The performance of the information retrieval system was measured for each stemming model [19]. Vector Creator and Processor The vector space model is needed for document ranking and retrieval, representing a document as a set of terms. These terms are the words left over after processes like stop word removal and stemming. The collection of all these sets of terms, each representing a document, is called the "document space" of the corpus. Each distinct term in the set corresponds to one dimension in the document space. The performance of a vector space model can be enhanced by an appropriate term-weighting scheme. After stemming, each remaining term is looked up in the bilingual dictionary, which is an English-to-Telugu dictionary. A set of multiple Telugu/English meanings would be obtained for a given English/Telugu query term. Several terms may not be found in the bilingual dictionary when language-dependent stemmers are used, and if a term is a proper name or a valid Telugu noun it may not occur in the dictionary at all. In numerous cases, the dictionary search for a term might not succeed because of incorrect stemming. Most Indian languages are highly complex; in particular, Telugu is highly agglutinative, which requires a very high-quality stemming algorithm. The accuracy of language-independent stemmers is higher than that of language-dependent stemmers [18]. More terms are found in the bilingual dictionary with independent stemmers. The retrieval performance is therefore increased in both the CLIR and the monolingual information retrieval system with language-independent stemmers. A small illustrative sketch of this term-weighting and ranking step is given below.
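As an illustration of the term-weighting and ranking idea described above (a minimal sketch, not the authors' actual implementation), the following Python code builds TF-IDF vectors over a tiny toy corpus and ranks documents against a translated query by cosine similarity; the documents, query and variable names are all hypothetical.

```python
import math
from collections import Counter

# Toy pre-processed (tokenized, stop-word-removed, stemmed) documents -- illustrative only
docs = [["cricket", "match", "score"],
        ["film", "actor", "award"],
        ["market", "share", "price"]]
query = ["cricket", "score"]          # assumed to be already translated into the document language

def tf_idf_vectors(docs):
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))          # document frequency of each term
    vocab = sorted(df)
    idf = {t: math.log(n / df[t]) for t in vocab}          # inverse document frequency
    vecs = []
    for d in docs:
        tf = Counter(d)
        vecs.append([tf[t] / len(d) * idf[t] for t in vocab])  # normalized tf * idf weight
    return vocab, idf, vecs

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

vocab, idf, doc_vecs = tf_idf_vectors(docs)
q_tf = Counter(query)
q_vec = [q_tf[t] / len(query) * idf[t] for t in vocab]     # weight the query in the same term space
ranking = sorted(enumerate(cosine(q_vec, v) for v in doc_vecs),
                 key=lambda p: p[1], reverse=True)
print(ranking)                                             # most similar document index first
```

Cosine similarity is one standard VSM similarity measure; the actual system may use a different ranking function.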
5 Evaluation of Retrieval Performance In the experiment, the performance of English/Telugu CLIR and Telugu/Telugu monolingual information retrieval system was evaluated based on precision and recall instead of accuracy. Accuracy was not an appropriate measure for information retrieval problem. The measures of precision and recall are used to concentrate the evaluation on what percentages of the relevant documents have been found and how
Table 1 The performance of CLIR and monolingual IR using stemming models

IR                               Stemming models                Precision (%)   Recall (%)
English/Telugu (CLIR)            Language-dependent models      55.88           38.15
                                 Language-independent models    72.28           61.14
Telugu/Telugu (Monolingual IR)   Language-dependent models      64.1            45.8
                                 Language-independent models    76.54           70.85
Fig. 2 The performance of CLIR and monolingual IR using stemming models (bar chart of the precision and recall values of Table 1 for CLIR and monolingual IR with language-dependent and language-independent stemmers)
many non-relevant documents have also been returned. This is one of the most important advantages of precision and recall. The performance of CLIR and monolingual IR using stemming models is shown in Table 1. We send a query term in the source language (i.e., either English or Telugu) and retrieve the documents in Telugu. The word-level precision and recall were defined as follows:

Precision (P) = Number of correctly extracted words / Number of extracted words    (1)

Recall (R) = Number of correctly extracted words / Number of correct words    (2)
From the experiments we found that the performance of both CLIR and monolingual IR increased significantly with the language-independent stemmers compared with the language-dependent stemmers, as shown in Fig. 2.
6 Conclusion In this work, English/Telugu CLIR using bilingual dictionary and Telugu/Telugu monolingual information retrieval system was developed. The performance of this
information retrieval system was measured with various existing stemming models. In our proposed method, the cross-lingual information retrieval performance increased with language-independent stemmers compared with language-dependent stemmers. Precision and recall increased from 55.88 to 72.28% and from 38.15 to 61.14%, respectively, with language-independent stemmers. In future, a hybrid model can also be used to improve the CLIR performance.
References 1. A.R. Diekema, Translation events in cross-language information retrieval. ACM SIGIR Forum 3(1), (2004) 2. A. O’Gorman, I. Gabby, Sutcliffe, in French in the Cross-language Task. CLEF (2003) 3. A. Seetha, S. Das, M. Kumar, Evaluation of the English-Hindi cross language information retrieval system based on dictionary based query translation method. in 10th International Conference on Information Technology (IEEE Computer Society, 2007), pp. 56–61 4. D. Mandal, S. Dandapat, M. Gupta, P. Banerje, S. Sakar, in Bengali and Hindi to English Cross-languagew Text Retrieval under Limited Resources. CLEF (2007). Available at http:// www.clefcampaign.org/2007/working_notes/mandal CLEF2007.pdf 5. D. Neumann, in: A Cross-language question answering system for German and English. CLEF(2003) 6. D.L. Lee, H. Chuang, K. Seamons, Document ranking and the vector-spacing model. IEEE Software43 14(2), 67–75 (1997) 7. F. Yu, D. Zheng, T. Zhao, S. Li, H. Yu, Chinese-English Cross-lingual information retrieval based on domain ontology knowledge. in: International Conference on Computational Intelligence and Security, 3–6 Nov. 2006, vol. 2 (2006), pp. 1460–1463 8. G. Salton, A. Wong, C.S. Yang, A vector space model for automatic indexing. Commun. ACM 18(11), 613–620 (1975) 9. J. Becker, Topic based VSM, in: Business Information Systems, Proceedings of BIS 2003, (Colorado Springs, USA, 2003) 10. Leah S. Larkey, Margaret E. Connell, Nasreen Abdulijaleel, Hindi CLIR in thirty days. ACM Trans. Asian Lang. Info. Process. (TALIP) 2(2), 130–142 (2003) 11. M. Aljlay, O. Frieder, D. Grossman, On Arabic-English crossLanguage information retrieval: a machine translation approach. in: Proceedings of the International Conference on Information Technology: Coding and Computing (ITCC’02) 8–10 April (2002), pp. 2–7 12. M. Santhosh, K. Narayani, in: Corpus Based Statistical Approach for Stemmer. ILRIL-2007, (New-Delhi, 2007) 13. N. Swapna, N. Hareen Kumar, B. Padmaja Rani, Information retrieval in Indian languages: a case study on cross-lingual and multi-lingual. Int. J. Res. Comput. Commun. Technol. IJRCCT 1(4). ISSN 2278–5841 14. P. Pingali, J. Jagarlamudi, V. Varma, Webkhoj: Indian language IR from multiple character encodings. in: International World Wide Web Conference May 23–26, (2006) 15. P. Bajpai, P. Verma, Cross language information retrieval: in indian language perspective. Int. J. Res. Eng. Technol. 03(10), (2014). eISSN: 2319–1163| pISSN: 2321-7308 16. R. Krovets, Viewing morphology as an inference process. in: Proceedings of the 16th Annual Int. ACM SIGIR Conference on Research and Development in Information Retrieval, (1993), pp. 191–202 17. S.M. Chaware, S. Rao, Information retrieval in multilingual environment. in: Second International Conference on Emerging Trends in Engineering and Technology, ICETET-09, (IEEE Computer Society, 2009), pp. 648–652
18. S. Narala, B.P. Rani, K. Ramakrishna, Experiments in telugu language using language dependent and independent models. Int. J. Comput. Sci. Technol. (IJCST) 7(4), (2016). ISSN: 0976–8491 (Online)| ISSN: 2229-4333 (Print), Oct-Dec 2016 19. S. Narala, B.P. Rani, Analysis of pseudo N-Gram model on telugu document classification. Int. J. Emerg. Technol. Adv. Eng. (IJETAE) 6(10), (2016). (ISSN 2250–2459, ISO 9001:2008) 20. T. Xu, D.W. Oard, Maryland: English-Hindi CLIR” FIRE-2008 21. P.L. Nikesh, S. Mary Idicula, D. Peters, in: English-Malayalam CLIR. An Experience. (IEEE, 2008)
Predicting the Risk of Patients from Corona Virus in India Using Machine Learning Ayush Jha, M. Venkatesh, Tanushree Agarwal, and Saurabh Bilgaiyan
Abstract Toward the end of December 2019, the Government of China revealed a flare-up of a pneumonia-like ailment among individuals with an obscure wellspring of inception in Wuhan, Hubei Province with the capacity to spread from individual to individual with no physical contact. COVID-19 was identified as a possible cause for this disease. The World Health Organization realized the severity of this disease and declared it a public health emergency. The first case of COVID-19 in India was reported in late January 2020 and has been increasing ever since, with many casualties. Considering the severity of the disease and India being a developing country with a huge population, the government might face difficulty in testing if citizens get affected by this deadly virus. Our work aims at giving an approximate idea of whether a person is COVID positive or negative using the current statistics and integrating them through various machine learning models. Keywords Machine learning · COVID-19 · SARS-CoV-2 · Curve-fitting · Gaussian naive bayes
1 Introduction The novel corona virus or SARS-CoV-2 is a group of a very severe RNA virus which, if enters the bloodstream of mammals or birds, can cause serious health issues and A. Jha · M. Venkatesh · T. Agarwal School of Electronics Engineering, KIIT Deemed to Be University, Bhubaneshwar 751024, India e-mail: [email protected] M. Venkatesh e-mail: [email protected] T. Agarwal e-mail: [email protected] S. Bilgaiyan (B) School of Computer Engineering, KIIT, Deemed to Be University, Bhubaneshwar 751024, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 D. Swain et al. (eds.), Machine Learning and Information Processing, Advances in Intelligent Systems and Computing 1311, https://doi.org/10.1007/978-981-33-4859-2_35
can even lead to the death of an individual by causing infection in the respiratory tract of the victim. The common symptoms include common cold, fever, and so on. In early December, a large number of people in Wuhan started to fall sick and showed symptoms of a pneumonia-like disease. This was reported by the Chinese government to the World Health Organization in late December 2019 [1]. The virus was highly communicable and was traced back to a novel strain of corona virus by the WHO, which further declared it a worldwide pandemic [2, 3] and named it 2019-nCoV. This was later renamed SARS-CoV-2 by the ICTV. This dangerous virus can spread from one person to another without any physical contact. It can be transmitted through air and water and is too small to be observed by the human eye. The cure for this virus hasn't been discovered yet; hence, in order to protect oneself from getting affected, a person must maintain social distancing, use masks, avoid touching their face or eyes, and wash hands at regular intervals using soap or sanitizer [4]. The first virus-infected person in India was reported in January 2020, having arrived from China. India being a developing country with a population of 1.38 billion, it is very difficult for the government to treat people due to the lack of medical facilities. An estimate of the chances of being infected by the virus can serve as a very helpful tool in warning citizens as soon as they come in contact with an infected person, so that they can quarantine themselves and avoid getting in contact with other people [5]. The virus has many potential natural hosts, as shown in Fig. 1. The statistics observed currently show that the number of cases in the world is 23,390,690, the recovered cases are 15,912,888, and the unfortunate deaths are 808,783. However, WHO has recently declared that living with the virus is the new normal [6]. This paper proposes a machine learning model to help predict whether a patient will test COVID-19 negative or positive based on their ongoing symptoms and travel history [2]. Thus, this paper is structured as follows: Sect. 2 discusses the related works on COVID-19 and the risks to the patient. Section 3 analyzes the present condition of India till 5 May 2020 and visualizes the COVID-19 cases and the situation of patients. Section 4 elaborates on how an appropriate model was chosen along with its accuracy score and prediction. Section 5 presents the conclusion.
Fig. 1 Transmission of corona virus
2 Related Works The spread of the corona virus in the last 3–4 months has been exhilarating, striking fear in the lives of humanity. This virus which started in China has traveled its way almost to the entire globe and continues to affect millions with each passing day. Though the virus is not fatal, it has been successful in wiping out an entire generation in some countries like Italy and Spain. This virus triggers the immune system of a human being and corrodes it from within until the person succumbs. If not controlled, the virus is feared to take down humanity at a snail’s rate. Many related works are available in the literature and authors have tried to identify the most relevant papers in the literature, and the description is given as follows: Ardabili et al. [7] predicted the outbreak of COVID-19 using machine learning; the paper demonstrates the potential of machine learning and further suggests that prediction can be realized through the SEIR model. However, the accuracy is low for long-term prediction. Tuli et al. [8] using machine learning and cloud computing predicted the growth and trend of COVID-19. The study uses a case study to demonstrate the spread of the virus; the study provides a better prediction. Rustom et al. [9] forecasted the future of corona virus using supervised machine learning, where they used two models, namely LR and Lasso, which produced a good result; but SVM showed poor results; overall, the prediction was productive. Reddy and Zhang [10] forecasted the corona virus time series transmission using the LSTM network in Canada. The pattern revealed from the data showed that the approaches taken by the authorities showed a positive impact as compared to other countries and the result helped to monitor the situation and prevent transmission with high accuracy. Chintalapudi et al. [11] predicted the recovered cases from the registered cases of corona virus in Italy after 60 days of lockdown through a model-driven approach, where they estimated that the size of registered and ongoing cases can decrease if the present lockdown continues for another two months. Mandal et al. [12] predicted and developed a way to control COVID-19 through a model-based study on dynamics; they targeted to forecast the cases for a short term. The model also provides a better perception of the dynamics of control and spread. Since it is found that many works have been proposed using different methods in various countries, therefore, this paper proposes the prediction of whether a person is affected by the corona virus or not.
3 Data Analysis This section contains details of data which was obtained, observing it using different graphs.
3.1 Reading the Data Set Two data sets were prepared; the first data set is represented in Table 1, which shows the statistics of cases for the different states and union territories till 5 May 2020. The table has features such as state and union territory names, the total number of confirmed cases, which includes both Indian nationals and foreign nationals, and also the cured and death cases [13]. The second data set has features such as age, gender, region, detected state, nationality, travel history, disease history, symptom and a label which indicates whether the patient was found positive or not. A snapshot of this data set is presented in Fig. 2. As the data set contains a large amount of data, only some rows are presented in the figure [14].
3.2 Analyzing COVID-19 Cases in India According to the data sets, the total number of cases till 5 May 2020 was 90,644. Next, the state/union territory-wise cases are analyzed. In Fig. 3, the total, cured and death counts are represented for each state/union territory in different shades of red: a darker shade indicates a larger number of cases and hence more danger, while a lighter shade indicates fewer cases and less danger. The total number of active cases, that is, the number obtained after removing the cured and death cases from the total cases of each state/UT, is calculated and represented in descending order in Fig. 4. The top 5 states/union territories with the maximum number of active cases are Maharashtra, Tamil Nadu, Gujrat, Delhi and Madhya Pradesh. A small illustrative sketch of this active-case computation is given below.
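The active-case computation described above can be illustrated with a short pandas sketch (not the authors' code; the column names are assumptions, and the five rows are taken from Table 1):

```python
import pandas as pd

# A few rows mirroring Table 1 (state, confirmed Indian national, confirmed foreign national, cured, death)
df = pd.DataFrame(
    [["Maharashtra",    30703, 3, 7088, 1135],
     ["Tamil Nadu",     10579, 6, 3538,   74],
     ["Gujrat",         10987, 1, 4308,  625],
     ["Delhi",           9332, 1, 3926,  129],
     ["Madhya Pradesh",  4782, 7, 2315,  243]],
    columns=["State", "ConfirmedIndian", "ConfirmedForeign", "Cured", "Death"])

# Active cases = total confirmed (Indian + foreign) minus cured and death cases, as described in Sect. 3.2
df["Active"] = df["ConfirmedIndian"] + df["ConfirmedForeign"] - df["Cured"] - df["Death"]

# States ordered by active cases, most affected first (cf. Fig. 4)
print(df.sort_values("Active", ascending=False)[["State", "Active"]])
```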
3.3 Visualizing COVID-19 Cases The data sets have been read and analyzed. From the data set represented in Fig. 2, we visualize the pattern or symptoms which affect the patient and accordingly classify them as positive or negative. In Fig. 5, patients have been characterized based on their sex. The graph in the figure shows that a larger number of males have been tested positive when compared to the females. Next, we consider which age group is most affected by the virus. From Fig. 6, it is clearly visible that the age group within the range of 40–60 is the most affected. Now, Fig. 7 shows the most common symptom seen in the positive patients. Dry cough is the most common symptom. After understanding the situation of corona virus—active, cured and death cases for each state/union territory—the analysis for the number of cases in India as a whole was done, by summing up all the cases. So, a new data set was formed; the data set groups confirmed, death and cured cases of India date wise, till 5 May 2020. The whole data set is being represented as a graph in Fig. 8.
Table 1 COVID cases for each state and union territory

S. no.  Name of state/UT               Total confirmed cases   Total confirmed cases   Cured   Death
                                       (Indian national)       (foreign national)
1       Andhra Pradesh                 2355                    0                       1353    49
2       Bihar                          1179                    0                       453     7
3       Chhattisgarh                   66                      1                       56      0
4       Delhi                          9332                    1                       3926    129
5       Gujrat                         10987                   1                       4308    625
6       Haryana                        873                     14                      514     13
7       Himachal Pradesh               78                      0                       43      3
8       Karnataka                      1090                    2                       496     36
9       Kerala                         579                     8                       495     4
10      Madhya Pradesh                 4782                    7                       2315    243
11      Maharashtra                    30703                   3                       7088    1135
12      Manipur                        7                       0                       2       0
13      Mizoram                        1                       0                       1       0
14      Odisha                         737                     0                       196     3
15      Pondicherry                    13                      0                       9       1
16      Punjab                         1944                    0                       1257    32
17      Rajasthan                      4954                    2                       2839    126
18      Tamil Nadu                     10579                   6                       3538    74
19      Telangana                      1499                    10                      971     34
20      Chandigarh                     191                     0                       51      3
21      J&K                            1121                    0                       542     12
22      Ladakh                         43                      0                       0       0
23      Uttar Pradesh                  4257                    1                       2441    104
24      Uttarakhand                    87                      1                       51      1
25      West Bengal                    2576                    0                       872     232
26      Tripura                        167                     0                       64      0
27      Meghalaya                      13                      0                       11      1
28      Ladakh                         13                      0                       22      0
29      Jharkhand                      217                     0                       113     3
30      Goa                            17                      0                       7       0
31      Dadar Nagar Haveli             1                       0                       0       0
32      Assam                          92                      0                       41      2
33      Arunachal Pradesh              1                       0                       1       0
34      Andaman And Nicobar Islands    33                      0                       33      0
Fig. 2 Snapshot of the data set of patients with different features governing whether the patient is found to be COVID positive or negative
Three colors, namely blue, yellow and red, indicate the confirmed, recovered and death cases, respectively. This visualization shows that the number of COVID-19 positive patients is increasing daily, while many patients are recovering and the death rate is low.
Fig. 3 Result of the analysis for observing the states/union territories with maximum risk from corona virus, i.e., those where more patients are affected
4 Applying Best-Fit Algorithm This section shows the comparison between different types of models and why we used these models.
4.1 Choosing Appropriate Model The data set in Fig. 2 had many features, but not all of them were relevant for prediction, so after data wrangling the data set shown in Fig. 9 was obtained with the following features: age, gender, travel history, disease history, symptom and the state/UT to which the patient belonged. The cleaned data set had many rows of data, so only a snapshot of some part of it is represented in Fig. 9.
Fig. 4 Representation of state/union territory with COVID-19 cases in descending order
Regression is used when we have to study the relationship between a dependent and an independent variable. Our data set contains independent variables which are the travel history, disease history and the symptoms, and then using them we predict the dependent variable. Hence, regression is the most appropriate method to be used for the data set. Label encoding was also performed for data wrangling; It is the transforming of labels into numeric form so that a machine is able to understand it. Algorithms then used will be better able to process the data. It is an important part of feature engineering in supervised learning of machine algorithms. Logistic regression, in statistics, is a logistic model used to model the probability of a particular class or event, like pass/fail, win/lose, alive/dead or healthy/sick.
Fig. 5 Graph represents the patients categorized on the basis of sex
Fig. 6 Graph represents the patients categorized on the basis of age group
This can be extended to model several classes of events, like determining whether a picture contains a cat, dog, lion, and so on. Each object detected in the image would be assigned a probability between 0 and 1, with a sum of one. Logistic regression is a predictive analysis. A linear regression line can be written as Eq. (1): for some input value (x) we get a predicted output (y); for each input value the linear equation assigns a coefficient (c1), and for a two-dimensional plot we use one more coefficient for the degree of freedom, which is referred to as the intercept (c2).

y = c1 * x + c2    (1)

where
c1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²
x̄ = mean of the input values
ȳ = mean of the output values
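As an illustration of the label-encoding and logistic-regression step described above (a minimal sketch, not the authors' code; the toy feature values and column names are hypothetical), the categorical patient attributes can be encoded to integers and fed to scikit-learn's LogisticRegression:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Tiny hypothetical slice of the cleaned data set of Fig. 9
data = pd.DataFrame({
    "gender":         ["M", "F", "M", "F"],
    "travel_history": ["yes", "no", "yes", "no"],
    "symptom":        ["dry cough", "none", "fever", "none"],
    "label":          [1, 0, 1, 0],          # 1 = positive, 0 = negative
})

# Label encoding: transform each categorical column into numeric form
encoders = {}
for col in ["gender", "travel_history", "symptom"]:
    encoders[col] = LabelEncoder()
    data[col] = encoders[col].fit_transform(data[col])

X, y = data.drop(columns="label"), data["label"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Logistic regression models the probability of the positive class
clf = LogisticRegression().fit(X_train, y_train)
print(clf.predict(X_test), clf.predict_proba(X_test))
```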
Fig. 7 Affected patients on the basis of their symptom
Fig. 8 Graph represents the static rate of confirmed, recovered and death cases in India
Fig. 9 The data set obtained after cleaning the large data set, used for the further prediction process
Naive Bayes classifiers are a set of classification algorithms supported by the Bayes’ theorem. It is not one algorithm, but a family of algorithms where all of them share a standard principle, that is, every pair of features being classified is independent of each other. Naive Bayes classifiers are the easiest to work with in comparison to other classifiers as we need to estimate only the mean and the standard deviation from the training data. The naive Bayes classifier is based on Bayes theorem with the assumption that features are not dependent on each other. As the assumption is quite preliminary, it is termed as a naive assumption. The mathematical expression of naive Bayes theorem is shown in Eq. (2). P( A|B) = P(B|A)∗ P( A)/P(B)
(2)
where P(A|B) = the probability of event A occurring given that B is true, P(B|A) = the probability of event B occurring given that A is true, and P(A) and P(B) are the probabilities of events A and B occurring, respectively. Let X be the group of input variables and y the output variable. Then

P(y|X) = P(X|y) * P(y)/P(X)    (3)

where P(y|X) = the probability of output y occurring given that the features X are observed, P(X|y) = the probability of observing the features X given that the output is y, and P(y) and P(X) are the probabilities of y and X occurring, respectively. Equation (3) solves for the probability of y given the input features X. Because of the assumption that the variables are independent, we can rewrite P(X|y) as follows:

P(X|y) = P(x1|y) * P(x2|y) * ... * P(xn|y)    (4)

where x1, x2, x3, ..., xn are the input variables and y is the output variable. In Eq. (4), we apply the naive Bayes assumption to the input variables x1, x2, x3, ..., xn and the output variable y.
Also, since we are solving for the output variable y, P(X) is a constant, which means that we can remove it from the equation and introduce proportionality. From Eqs. (3) and (4) we get:

P(y|X) ∝ P(y) * ∏(i = 1 to n) P(xi|y)    (5)

where n is the number of input variables. In Eq. (5), the constant P(X) has been dropped and a proportionality sign has been introduced. The aim of naive Bayes is to choose the class y with the maximum probability. Argmax is an operation that finds the argument that gives the maximum value of a target function. In this case, we need the value of y with the maximum probability. Therefore, from Eq. (5) we get:

y* = argmax over y of P(y) * ∏(i = 1 to n) P(xi|y)    (6)
where n is the number of input variables. Based on the value of y that maximizes Eq. (6), the algorithm predicts the result. The two models, logistic regression and naive Bayes, were used to predict the result, and both produced good results, but Gaussian naive Bayes performed much better. The Gaussian naive Bayes performance is presented in Fig. 10. By using the classification report, Fig. 11 was obtained, wherein the confusion matrix is graphically represented. The confusion matrix summarizes the model predictions; it shows how the proposed model got confused with the predictions and the types of errors it made. Table 2 shows the representation of the confusion matrix. The accuracy score of any model can be computed from its confusion matrix using Eq. (7):

Accuracy = (TP + TN)/(TP + FN + TN + FP)    (7)
Gaussian naive Bayes performed very well in predicting the patient risk from corona virus. The accuracy score was 96.58%.
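The Gaussian naive Bayes training and evaluation step described above can be sketched as follows (a minimal illustration, not the authors' code; the feature matrix and labels below are synthetic stand-ins for the label-encoded data set of Fig. 9):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Synthetic stand-in for the label-encoded patient features and labels (illustrative only)
rng = np.random.default_rng(42)
X = rng.integers(0, 5, size=(200, 5))            # e.g. age group, gender, travel, disease, symptom codes
y = (X[:, 2] + X[:, 4] > 4).astype(int)          # hypothetical rule linking travel/symptom codes to positivity

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

gnb = GaussianNB()                               # Gaussian likelihood per feature, Bayes rule as in Eqs. (2)-(6)
gnb.fit(X_train, y_train)
y_pred = gnb.predict(X_test)

print(confusion_matrix(y_test, y_pred))          # [[TN, FP], [FN, TP]] as in Table 2
print(classification_report(y_test, y_pred))     # precision/recall/f1, cf. the report in Fig. 10
print("Accuracy:", accuracy_score(y_test, y_pred))   # Eq. (7)
```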
Fig. 10 Gaussian naive Bayes classification report
Fig. 11 Naive Bayes performance through confusion matrix
Table 2 Representation of confusion matrix

                    Class 1 predicted        Class 2 predicted
Class 1 actual      True negative (TN)       False positive (FP)
Class 2 actual      False negative (FN)      True positive (TP)
4.2 Predicting the Patient Risk By using the naive Bayes model, the graph in Fig. 12 was plotted representing whether a patient is positive or negative: a high prediction (1) implies that the patient is positive, and a low prediction (0) means that the patient is diagnosed as negative. Fig. 12 Graph represents the predicted output as the patient being positive (high) or negative (low)
5 Conclusion In this paper, we proposed a model to predict whether a patient will be COVID positive or not using machine learning techniques; the paper shows the comparison between different models. The analysis of different models showed that the Gaussian naive Bayes model is the best fit for the data, as we obtained an accuracy of 96.58%. Predictions showed that the number of cases will keep increasing even after months of lockdown, and also that the number of people recovering will be higher, along with an increase in the mortality rate. Kindly take all the necessary precautions during this pandemic so that the world beats this virus.
References 1. P. Samui, J. Mondal, S. Khajanchi, A mathematical model for COVID-19 transmission dynamics with a case study of India. vol. 140 (2020) 2. B.H. Fredj, F. Cheriff, Novel Corona virus disease infection in Tunisia: mathematical model and the impact of the quarantine strategy. vol. 138 (2020) 3. E.M. Wilson, H.L. Chen, Travellers give wings to novel coronavirus (2019-nCoV), vol. 27 (2019) 4. WHO: Coronavirus. https://www.who.int/health-topics/coronavirus#tab=tab_2 5. L. Wang, Y. Wang, D. Ye, Q. Liu, Review of the 2019 novel coronavirus (SARS-CoV-2) based on current evidence. Int. J. Antimicrobial, Agents, (2020) 6. C. Huang, Y. Wang, X. Li, B. Cao et al., Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. Lancet 2020, 497–506 (2019) 7. S.F. Ardabili, A. Mosavi, P. Ghamisi, F. Ferdinand, A.R. Varkonyi-Koczy, U. Reuter, T. Rabczuk, P.M. Atkinson, COVID-19 outbreak prediction with machine learning. Preprints 2020. 2020040311 (https://doi.org/10.20944/preprints202004.0311.v1) 8. S. Tuli, R. Tuli, S. Tuli, S. Gill, Predicting the growth and trend of covid-19 pandemic using machine learning and cloud computing. https://doi.org/10.1101/2020.05.06.2009/900 9. F. Rustam et al., COVID-19 future forecasting using supervised machine learning models. IEEE Access 8, 101489–101499 (2020) 10. V. Reddy, L. Zhang, Time series forecasting of COVID-19 transmission in Canada using LSTM networks. vol 135 (2020) 11. N. Chintalapudi, G. Battineni, F. Amenta, COVID-19 virus outbreak forecasting of registered and recovered cases after sixty day lockdown in Italy: a data driven model approach. vol 53. (2020), pp. 396–403 12. M. Mandal, S. Jana, S.K. Nandi, A. Khatua, S. Adak, T.K. Kar, A model based study on the dynamics of COVID-19: Prediction Control. 136 (2020) 13. Mygov.in.: COVID 19 statewise status. https://www.mygov.in/corona-data/covid19-statewisestatus/ 14. A. Kyatham, COVID-19 India. https://www.kaggle.com/adityakyatham/covid19
Parallel Implementation of Marathi Text News Categorization Using GPU Sangita Lade, Gayatri Bhosale, Aishwarya Sonavane, and Tanvi Gaikwad
Abstract Nowadays, handling the immense amount of online-generated data has become the most challenging task for a data analyst. Like many other activities, reading news articles online is a new trend. India has a total of 22 major regional languages, and the publishers of regional-language newspapers have hosted their own websites for daily news. Manual categorization of these news articles is a difficult and time-consuming task; hence, automatic categorization of news articles using the latest machine learning techniques can be used. The K-NN machine learning classification algorithm together with the most popular term-weighting scheme, TF-IDF, is used here to categorize Marathi news articles into one of the pre-defined categories (sports, economy and entertainment). As this work is compute-intensive, a parallel algorithm is proposed and implemented on the Nvidia Tesla T4 GPU. This paper presents the categorization of Marathi news into categories like sports, entertainment and economy with the help of a GPU. Keywords TF-IDF · K-NN · GPU · CUDA C/C ++ · ML
1 Introduction With the rapid growth of online information, text categorization has become one of the key techniques for handling and organizing text data [1]. Automatic text S. Lade (B) · G. Bhosale · A. Sonavane (B) · T. Gaikwad (B) Department of Computer Engineering, Vishwakarma Institute of Technology, Upper Indira Nagar, Pune 411 037, Maharashtra, India e-mail: [email protected] A. Sonavane e-mail: [email protected] T. Gaikwad e-mail: [email protected] G. Bhosale e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 D. Swain et al. (eds.), Machine Learning and Information Processing, Advances in Intelligent Systems and Computing 1311, https://doi.org/10.1007/978-981-33-4859-2_36
categorization is a process of automatically assigning text documents to different categories. With the increased use of the internet, the generation of huge online documents in different languages has also increased. This increased use of computers has led to great research and advancement in machine learning. Till now, there are a number of researches that have been done on English text documents as it is a widely spoken language. The proposed work presents the automatic text categorization of Marathi documents, mainly news articles available in Marathi using machine learning algorithms. The system will categorize the news articles based on different categories of news like sports, economy, entertainment by using GPU. From the literature survey, it has been found that digitization is at its peak nowadays. Digitization means converting text, photos, or voice sound into a digital form that can be processed by a computer. Hence, huge online data are being generated every day. Different news publishers have also moved toward digitization and hosted their own news websites where people can read the news on a daily basis online with the help of internet connectivity.
2 Related Works In this section, related work of research is included, which shows the various analyses and researches made in text categorization or classification using different machine learning algorithms. The researchers focus their work on the regional languages of India. But it can be said that there is not much work been done for the Marathi language in this area of research. Rakholia et al. [2] have used Naïve Bayes (NB) classifier for classifying Gujarati text documents according to six pre-defined categories, which are sports, health, entertainment, business, astrology, and spiritual. Their findings from this research are that the experimental results show that the accuracy of the NB classifier without and using features selection was 75.74 and 88.96%, respectively. Sijun Qin et al. [3] proposed an approach of Chinese text feature selection based on contribution value (CV), part of speech (POS) filter, and synonym merge. They carried out experiments over corpus-TanCorpV1.0 and found that the proposed method performs better than the traditional ones. Swapna Narala et al. [4] used the K-nearest neighbor algorithm in their work on the categorization of the Telugu text document. They have used Telugu text document corpus for their research. Hanumanthappa and Narayana Swamy [5] in their work by extracting keywords from different Indian languages (Tamil, Kannada, Telugu, etc.) text documents did text summarization and text categorization based on language. Keyword extraction is done using TF-IDF calculations. And categorization is done using K-NN, j48, and NB classifiers applied on a pre-processed text. With their research, they have proved that NB classifier has greater accuracy, that is, 97.666% compared to K-NN and C4.5(J48) classifiers. Pooja Bolaj et al. [6] research work is a survey on all text categorization techniques used for different Indian regional languages and their
comparison. They concluded from their work that for regional languages like Bangla, Urdu, Telugu and Punjabi, the Naïve Bayes and K-nearest neighbor algorithms are more suitable, whereas in the case of the Marathi language LINGO works better. Vijayan et al. [7] presented a detailed survey of the text classification process and the various algorithms used in this field.
3 Proposed Methodology The proposed work introduces a system called Marathi news categorization as one of the solutions for categorizing Marathi news articles. The system mainly includes gathering of various online Marathi news from news websites, forming the training corpus, pre-processing of the data, TFIDF calculations on a training/testing dataset to form feature vector space, and finally using K-NN on test data for classifying test news articles correctly. At the final stage of the system, the machine learning algorithm (K-NN) is applied to the feature vectors of test data and training data. The test news document is classified by using Euclidean distance similarity measure on train and test vectors. The Euclidean distance is calculated between the TF-IDF vector of the test file and the TF-IDF vector of training files. Then the Euclidean distance array obtained from the previous step is sorted and the first 10 items (since K = 10) are taken into consideration for the classification of the test file. In the work, CUDA C/C ++ is used for implementing the K-NN algorithm on GPU. This further leads to increased speed in classifying the news articles. The news file dataset is created using crawler and all the files are of the extension.txt. The website used for crawling news is esakal.com. The dataset, which is mounted on Google Drive, includes 1000 training files of sports, entertainment, and economy news files each. This is the training dataset of 3000 files totally. In testing dataset, there are a total of 300 files, which include all the types, that is, sports, entertainment, and economy. There are two programs, one for pre-processing the data and other for classification of the testing data. In the work, Google Colaboratory is used as a platform which supports the GPUNvidia Tesla T4, which is one of the products that Nvidia has developed. Nvidia Tesla T4 is based on Turing architecture and consists of one TU104-895-A1 chip. The number of CUDA cores inside Nvidia Tesla T4 is 2560 [8]. Google Colaboratory was chosen as a platform since it supports CUDA C/C ++, GPU as a hardware accelerator, a RAM of 12 GB, and also NVCC plugin can be loaded efficiently [9]. All these require a few runtime environment configurations [10]. A GPU can be used for parallel computing when you have a huge number of tasks or computations to be performed. If a CPU is used for compute-intensive tasks, for example, an ML or DL application where such huge datasets are used, it would take more time and become an overhead for time critical applications. The proposed work exercises the use of Euclidean distance calculation, which is done repeatedly for all the words in the dataset. This portion of work, when handed over to Nvidia
Tesla T4, is parallelized using parallel threads on multiple cores of the GPU. Thus, the computation takes less time than a CPU. With an increase in the number of data, the type of GPU can also be advanced to provide the same benefits.
3.1 Implementation The dataset for training has been crawled from esakal.com for Marathi news. A dictionary of Marathi verbs, adverbs, nouns, and adjectives is used. The text files are pre-processed as follows:
1. Tokenization: In the process of tokenization, sentences of news paragraphs are broken into separate words using whitespace as the delimiter. These words are then known as tokens.
2. Stop word removal: Stop words are those words which do not participate in the task of classification, being irrelevant, very frequent and less important. It is always a good practice to remove them to increase the speed of document processing.
Term frequency (TF): Term frequency, also known as TF, measures the number of times a term (word) occurs in a document. The frequency of terms is undoubtedly higher in a large document than in a small document. Hence, it is necessary to normalize for the document size, so the term frequency is divided by the total number of terms. For example, if in a document the term game occurs two times and the total number of terms in the document is 10, then the normalized term frequency is 2/10 = 0.2. Following is the formula:

TF(t) = (Number of times term t appears in a document) / (Total number of terms in the document)    (1)
Inverse document frequency (IDF): In the first step all terms are considered equally important. Actually, terms that occur very frequently are considered to have less power in determining relevance, so a method to weigh down the effect of these terms is required. Conversely, terms that occur less frequently in the collection are supposedly of higher relevance, so a method to weigh up their effect is required. Here, logarithms are of great help in solving this problem:

IDF(t) = log(Total number of documents / Number of documents with term t in it)    (2)
TF-IDF: Each word or term has its respective TF and IDF score. The product of the TF (term frequency) and IDF (inverse document frequency) scores of a term is known as the TF-IDF weight of that term. The rarer a term is across the corpus, the higher its TF-IDF weight. Based on the number of times a particular keyword appears in the document, the TF-IDF algorithm weighs the keyword in any content and assigns an importance/relevance to it. More importantly, it checks how relevant the keyword is throughout the corpus. Hence:

TF-IDF(t) = TF(t) × IDF(t)    (3)
Now, the TF-IDF weight is attached to every word and a file called the universal vector space is created, which contains all words of all files of all categories along with their weights. This forms the training set of files. In the testing phase, the test news articles are tested using the classification model that was built. As in the training phase, pre-processing is applied to all test documents and the feature vectors are formed using term-weighting. The recognition is computed using those feature vectors.
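For reference, the TF-IDF weighting and K-NN classification pipeline described here can be reproduced compactly on the CPU with scikit-learn (an illustrative sketch, not the authors' CUDA implementation; the toy documents and labels are hypothetical):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Toy pre-processed training documents (already tokenized/stemmed and joined back into strings)
train_docs = ["cricket match score win", "film actor award song", "market share price bank"]
train_labels = ["sports", "entertainment", "economy"]

# TfidfVectorizer builds the "universal vector space"; K-NN classifies by Euclidean distance in it.
# The paper uses K = 10 with 3000 training files; K = 1 here only because the toy corpus is tiny.
model = make_pipeline(TfidfVectorizer(),
                      KNeighborsClassifier(n_neighbors=1, metric="euclidean"))
model.fit(train_docs, train_labels)

test_doc = ["cricket win score"]
print(model.predict(test_doc))   # expected: ['sports']
```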
3.2 Parallelization of K-NN Algorithm After the training and testing data are ready, we need to find the Euclidean distance. The length of the path connecting two points is known as the Euclidean distance between them [11]:

Euclidean Distance(X, Y) = sqrt( Σ(i = 1 to k) (Xi − Yi)² )    (4)
This calculation of Euclidean distance is parallelized using the GPU. The GPU used for this work is the Nvidia Tesla T4, readily available on Google Colaboratory. The language for GPU computing is CUDA C/C++. Terminology: Host—the CPU and its memory (host memory); Device—the GPU and its memory (device memory). The kernel launches used in this code have a single-block, multiple-threads launch configuration. The size of 1 block is 3000 threads. The following steps are implemented on CUDA:
1. Memory (in terms of 1D arrays) is allocated on the CPU using malloc().
2. Memory (in terms of 1D arrays) is allocated on the GPU using cudaMalloc(), e.g. cudaMalloc((void**)&dev_a, size); where dev_a is the array to be sent to the GPU.
3. The host arrays are assigned the indexes of files, words and their corresponding TF-IDF weights.
4. The host arrays are copied to the GPU using cudaMemcpy() as follows: cudaMemcpy(dev_a, &a, size, cudaMemcpyHostToDevice);
5. The kernel function is called for calculating the Euclidean distance.
6. The distance array is copied back to the CPU using cudaMemcpy(as, &dev_as, size, cudaMemcpyDeviceToHost);
7. The kernel function is called for sorting the Euclidean distances.
8. The sorted distance array is copied back to the CPU using cudaMemcpy(as, &dev_as, size, cudaMemcpyDeviceToHost);
9. The news is classified according to the indices using the value K = 10.
The elapsed time, recorded by a cudaEvent_t object, is 15.488640 s. The same code, when run on the CPU, faces a buffer overflow problem for the same dataset because of its heavy size.
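The authors' kernels are written in CUDA C/C++; as an alternative illustration of the same host-to-device copy, distance computation and sort (an assumption, not the authors' code), the following Python sketch uses CuPy to compute the Euclidean distances between a test TF-IDF vector and all training vectors on the GPU:

```python
import numpy as np
import cupy as cp   # assumes a CUDA-capable GPU with CuPy installed

# Hypothetical TF-IDF matrices: 3000 training vectors and one test vector
train = np.random.rand(3000, 5000).astype(np.float32)
test = np.random.rand(5000).astype(np.float32)

d_train = cp.asarray(train)          # host -> device copy (cf. cudaMemcpyHostToDevice)
d_test = cp.asarray(test)

# Euclidean distance of the test vector to every training vector, computed on the GPU
d_dist = cp.sqrt(cp.sum((d_train - d_test) ** 2, axis=1))

order = cp.argsort(d_dist)           # sorted on the device (cf. the sorting kernel)
top_k = cp.asnumpy(order[:10])       # device -> host copy of the 10 nearest indices (K = 10)
print(top_k)                         # indices then used to vote on the news category
```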
4 Observations The theoretical speedup is 875% for the K-NN algorithm implementation on GPU. The speed is 875% more than CPU while implementing the K-NN algorithm for the same size and type of dataset. The implementation time on GPU is around 15 s. The speedup is calculated using Amdahl’s law for speedup, which has the following formula: Speedup ( f, n) = 1/((1 − f )/n + f )
(5)
where f = serial fraction, (1 − f) = parallel fraction and n = number of parallel processors (cores) of the GPU. Amdahl's law states that the overall performance improvement gained by optimizing one part of a system is limited by the fraction of time that the improved part is actually used. In parallel computing, Amdahl's law is especially used to predict the theoretical maximum speedup of program processing using multiple processors. The graph in Fig. 1 depicts that Amdahl's law is followed by this work. It shows the number of processors on the x-axis and the corresponding speedup (in percentage) on the y-axis. At first, the speedup increases with the number of processors and then remains almost constant even if the number of processors is increased further. For example, as the number of processors increases from 8 to 240, the speedup percentage increases from 445 to 853%, and after that it is almost constant, that is, ≈ 875% for our algorithm. As the project is implemented on the Nvidia Tesla T4 GPU, the speedup is around 875%, which is constant after a certain number of processors, as discussed above. A small numerical check of Eq. (5) is given below.
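As a small numerical check of Eq. (5) (a sketch only; the serial-fraction value below is an assumption chosen so that the curve saturates near the 875% figure quoted above):

```python
def amdahl_speedup(f_serial, n):
    """Speedup(f, n) = 1 / ((1 - f)/n + f), Eq. (5), with f the serial fraction."""
    return 1.0 / ((1.0 - f_serial) / n + f_serial)

# Assumed serial fraction of about 0.114 gives a limiting speedup of roughly 8.75x (875%);
# it also approximately reproduces the 445% (n = 8) and 853% (n = 240) values quoted in the text.
f = 1.0 / 8.75
for n in (8, 240, 2560):   # 2560 = number of CUDA cores of the Tesla T4
    print(n, round(amdahl_speedup(f, n) * 100), "%")
```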
5 Conclusion The objective of the work entitled “Parallel Implementation of Marathi Text News Categorization” is accomplished. After adding a greater number of files to the training
Fig. 1 Graph depicting Amdahl’s law
and testing dataset, higher accuracies and efficiency of classified news articles can be obtained. We can increase the number of categories to be extracted by only making minor changes in the program. There exists a scope of improvement to increase the accuracy and speedup using different GPU architectures. Furthermore, more speedup can be achieved if we parallelize TF-IDF calculations. When the dataset increases further, different kernel launch configurations, like multiple blocks multiple threads, multiple blocks single thread, and so on, can be used. Thus, this work is well-scaled in memory as well as in time. Also, this work can be extended to OpenCL/GL framework to increase the scope which is currently limited to CUDA C/C++. The proposed work is of great use to media industries to classify various types of news as well as to the websites to classify news information. Acknowledgements This research was partially supported by Prof. Vivek Deshpande, Head of the Department, Dept. of Computer Engineering, Vishwakarma Institute of Technology, Pune. The authors express their gratitude towards him. Also, they are thankful to Prof. Smt. Sangita Lade for supporting this research and providing immense expertise and guidance in the field of GPU architecture that greatly assisted this research. The authors are grateful to her for assistance with CUDA C/C ++ and also for moderating this paper which helped in improving the manuscript.
References 1. https://link.springer.com/chapter/10.1007/978-3-642-25188-7_9 2. R.M. Rakholia et al., Classification of Gujarati documents using naïve bayes classifier. Indian J. Sci. Technol. 10(5), (2017). https://doi.org/10.17485/ijst/2017/v10i5/103233 3. S. Qin, J. Song, P. Zhang, Y. Tan. Feature selection for text classification based on part of speech filter and synonym merge. in: 2015 12th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD) (2015). https://doi.org/10.1109/fskd.2015.7382024 4. S. Narala et.al., telugu text categorization using language models. Global J. Comput. Sci. Technol. 16(4), (2016)
5. Hanumanthappa et al., Indian language text documents categorization and keyword extraction. IJCTA 9(3), 37–45 (2016) 6. P. Bolaj et al., A survey on text categorization techniques for indian regional languages. IJCSIT 7(2), 480–483 (2016) 7. V.K. Vijayan, K.R. Bindu, L. Parameswaran, A comprehensive study of text classification algorithms. in: 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI) (2017). https://doi.org/10.1109/icacci.2017.8125990 8. https://en.wikipedia.org/wiki/Nvidia_Tesla 9. https://medium.com/@iphoenix179/running-cuda-c-c-in-jupyter-or-how-to-run-nvcc-ingoogle-colab-663d33f53772 10. https://medium.com/@oribarel/getting-the-most-out-of-your-google-colab-2b0585f82403 11. https://towardsdatascience.com/text-classification-using-k-nearest-neighbors-46fa8a77acc5
Real-Time Emotion Detection and Song Recommendation Using CNN Architecture Adarsh Kumar Singh, Rajsonal Kaur, Devraj Sahu, and Saurabh Bilgaiyan
Abstract It is said that health is wealth. Here, health refers to both physical health and mental health. People take various measures to take care of their physical health but ignore their mental health which can lead to depression and even diseases like diabetes mellitus and so on. Emotion detection can help us to diagnose our mental health status. Therefore, this paper proposes a theory for emotion detection and then a recommendation of a song to enhance the user’s mood by using the features provided by deep learning and image processing. Here, convolutional neural network-based (CNN) LeNet architecture has been used for emotion detection. The KDEF dataset is used for feeding input to the CNN model and then training it. The model has been trained for detecting the emotion. After training the model, a training accuracy of 98.03% and a validation accuracy of 97.96% have been achieved for correctly recognizing the seven different emotions, that is, sad, disgust, happy, afraid, neutral, angry and surprise through facial expressions. Keywords Machine learning · Deep learning · Emotion detection · Real-time emotion detection · Image processing · Music recommendation · Convolutional neural network
A. K. Singh · R. Kaur · D. Sahu School of Electronics Engineering, KIIT Deemed to Be University, Bhubaneswar, Odisha, India e-mail: [email protected] R. Kaur e-mail: [email protected] D. Sahu e-mail: [email protected] S. Bilgaiyan (B) School of Computer Engineering, KIIT Deemed to Be University, Bhubaneswar, Odisha, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 D. Swain et al. (eds.), Machine Learning and Information Processing, Advances in Intelligent Systems and Computing 1311, https://doi.org/10.1007/978-981-33-4859-2_37
1 Introduction There is no perfect definition for emotions but it can be best defined as a reflection of our feelings caused by the situation that you are in or the people you are with. Teasing out the feelings and emotions that people have and learning why they have them is very important for mental health. Emotions and expressions are those conditions of a normal person that is expressed by him in various forms and then the person responds to these emotions by following various different patterns. Several researches have been conducted in recent times that give us the evidence that emotional intelligence in a person is much more important than IQ. It has been proved that with the help of emotional intelligence we were able to predict approximately 53–54% success in case of relationships, living quality and health [1]. The quality of our emotions determines the instructions our heart sends to our brain [2]. Therefore, it is important to know about our emotions in order to have a healthy life and listening to songs is always considered to be a mood enhancer which makes the person feel better. So in this paper, a method for real-time facial emotion detection and song recommendation as per the emotion detected using various deep learning and machine learning algorithms has been proposed. Today, recognition of emotions and studying human emotions from a video is challenging task that has sought the attention of researchers that use them for analyzing general human behavior [3]. Many different methods are used to detect or recognize emotions for carrying out studies and diagnosing the issues related to it. There are many advancements in the field of facial emotion detection, and now with the help of deep learning techniques and other algorithms, it is possible to attain high accuracies for facial emotion detection. In this paper, the authors have proposed a real-time emotion detection achieving an accuracy of 98.03% on training the model and a validation accuracy of 97.96%, which is higher than the accuracies achieved in other papers relating to emotion detection and also recommend a song based on this real-time emotion detection. CNNs are special type of ANN architecture that use perceptrons for supervised learning and for data analysis [4]. These are generally used for image processing and for handling heavy datasets. It was proposed by Lecun in 1988 [5]. CNN performs convolution on the input data to give the output. Convolution is basically a mathematical operation between two functions that gives an output which shows how one function affects the size of the other. A CNN consists of various hidden layers which are basically pooling layers, convolution layers, fully connected layers, and so on. Here, instead of using normal activation functions, Conv2D and pooling layers are used as activation functions [6].
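As an illustration of the CNN LeNet-style architecture mentioned above, the following Keras sketch builds a small convolutional network for seven-class facial emotion classification; the input size, filter counts and other hyperparameters are assumptions and not the authors' exact configuration:

```python
from tensorflow.keras import layers, models

NUM_EMOTIONS = 7  # sad, disgust, happy, afraid, neutral, angry, surprise

def build_lenet_style_cnn(input_shape=(48, 48, 1)):
    """A minimal LeNet-style CNN sketch: Conv -> Pool blocks followed by dense layers."""
    model = models.Sequential([
        layers.Conv2D(20, (5, 5), padding="same", activation="relu", input_shape=input_shape),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(50, (5, 5), padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dense(500, activation="relu"),
        layers.Dense(NUM_EMOTIONS, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_lenet_style_cnn()
model.summary()
# Training would then use the preprocessed KDEF images, e.g.:
# model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=..., batch_size=...)
```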
2 Related Works Very little work has been done on the particular topic proposed by the authors here; most existing work is associated with emotion analysis. Some of the related works in this field are as follows: Priya et al. [7] used a CNN model along with deep learning algorithms to recognize the frame of mind of the user with an accuracy of 90.23% and then recommended a song according to the identified mood. The face region is extracted from the facial image using histogram analysis, and facial components are then extracted using a virtual face model (VFM) and the histogram analysis method. They take an image of the user as input and use machine learning algorithms to identify the user's mood. Owing to the growth of the music industry, music recommendation should be made more user-friendly, but current recommendation approaches are based entirely on the user's proclivity for music. So, Kuo et al. [8] proposed music recommendation on the basis of emotions; they modified the affinity graph for association discovery between emotions and music features and achieved an accuracy of 85% on average. Dhavalikar et al. [9] presented a facial gesture recognition system with phases of face detection, followed by feature extraction and then expression recognition. Face detection is done with the YCbCr color model, and the operations needed for preserving the necessary facial features, such as the eyes and mouth, are performed by applying the active appearance model (AAM) method. The Euclidean distance method is used to decide the output image expression. Deebika et al. [10] used a CNN for emotion detection, and according to the detected mood a song is suggested. They advised that the sluggish performance of real-time approaches could be improved by refining the methods and visualizing the vague features, thereby enhancing the accuracy and computational speed. Zafar et al. [11] have done similar work using deep learning-based algorithms. In their work they created a cross-platform music player that recommends songs based on the current emotion of the user. They achieved an accuracy of 90.23% in recognizing the current emotion by using their music classification module to classify songs into four emotion-based classes, and the module then suggests songs to the user by mapping the current emotion to the respective classes. Viral et al. [12] suggested an emotion-based song recommendation system; they created an Android application which detects the current emotion of the user using the Viola–Jones algorithm and the Fisherfaces classifier for the song recommendation part. Tang et al. [13] proposed a new method called InnerMove to enhance the images and increase the number of training samples, which has been observed to perform better than other comparative methods for image classification.
3 Methodology The following sections describe the modus operandi of this work. The KDEF dataset was selected for this model. The images of this dataset were preprocessed and then fed to the CNN LeNet architecture to train the model, detect the real-time facial emotion and ultimately recommend a song based on the emotion identified.
3.1 Dataset Used for Facial Emotion Recognition There are many datasets available for facial expressions. The proposed work uses the KDEF dataset, which was developed by Lundqvist, Flykt and Ohman in 1998 and consists of 4900 images of people with different facial emotions [4]. The dataset covers 70 subjects, of which 35 are male and 35 are female, and contains images of seven facial emotions [4]. Each expression is viewed from five different angles [6]. In this paper, we have excluded the extreme right and extreme left viewing angles. Some examples of the images of the KDEF dataset are shown in Fig. 1.
3.2 Image Processing The dataset used consists of an equal number of images for all seven expressions. At first, the KDEF images were converted from BGR format to grayscale, as it is much easier to identify features in grayscale images and it also reduces the image size. The images were then resized to a resolution of 256 × 256 pixels to decrease the computational time of the neural network during training. Some of the processed images are shown in Fig. 2.
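A minimal sketch of this kind of preprocessing with OpenCV is shown below; the folder layout and file names are assumptions for illustration and are not taken from the paper.

```python
import os
import cv2

def preprocess_image(path, size=(256, 256)):
    """Read a BGR image, convert it to grayscale and resize it (256 x 256 as described)."""
    bgr = cv2.imread(path)                        # OpenCV loads images in BGR order
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)  # grayscale eases feature extraction and shrinks the data
    return cv2.resize(gray, size)

def load_dataset(root="kdef"):
    """Hypothetical layout: one sub-folder per emotion label containing the raw images."""
    images, labels = [], []
    for label in sorted(os.listdir(root)):
        for name in os.listdir(os.path.join(root, label)):
            images.append(preprocess_image(os.path.join(root, label, name)))
            labels.append(label)
    return images, labels
```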
Fig. 1 Sample images of the KDEF dataset
Fig. 2 Processed images of KDEF dataset
3.3 Convolutional Neural Network Architecture In this research work, the use of a CNN for creating the model is proposed, as it is the most suitable choice when it comes to images and has a faster learning rate than other algorithms for image classification. Here, a CNN LeNet architecture has been used which consists of three Conv2D layers, each with a kernel size of 3 × 3 and with the number of filters in the first layer being 128. Figure 3 shows the LeNet architecture of the model. After every Conv2D layer there is a batch normalization layer, which stabilizes the network by normalizing the output of the previous layer, an activation layer with the "relu" activation function, a max-pooling layer with a kernel size of 2 × 2 and a dropout layer with a rate of 0.5. After flattening, three dense layers were introduced, with a batch normalization layer after each of the first two dense layers, succeeded by an activation layer with the "relu" activation function and a dropout layer with a rate of 0.5. The last layer of the network is an activation layer with the "softmax" activation function, which has the same number of nodes as the output, that is, the seven emotion classes to be determined.
Fig. 3 LeNet architecture of the CNN model
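A hedged Keras sketch of a LeNet-style network of this shape follows; the filter counts after the first layer, the dense-layer widths and the input size are assumptions, since only the 3 × 3 kernels, the 128 filters of the first layer, the 2 × 2 pooling, the 0.5 dropout and the seven-class softmax output are stated in the text.

```python
from tensorflow.keras import layers, models

def build_model(input_shape=(256, 256, 1), num_classes=7):
    model = models.Sequential()
    # Three Conv2D blocks with 3x3 kernels; filter counts after the first block are assumptions
    for i, filters in enumerate([128, 64, 32]):
        if i == 0:
            model.add(layers.Conv2D(filters, (3, 3), padding="same", input_shape=input_shape))
        else:
            model.add(layers.Conv2D(filters, (3, 3), padding="same"))
        model.add(layers.BatchNormalization())    # stabilizes training by normalizing activations
        model.add(layers.Activation("relu"))
        model.add(layers.MaxPooling2D((2, 2)))
        model.add(layers.Dropout(0.5))
    model.add(layers.Flatten())
    # Three dense layers; batch normalization follows the first two, as described in the text
    for units in [256, 128]:
        model.add(layers.Dense(units))
        model.add(layers.BatchNormalization())
        model.add(layers.Activation("relu"))
        model.add(layers.Dropout(0.5))
    model.add(layers.Dense(num_classes))
    model.add(layers.Activation("softmax"))       # seven emotion classes
    return model
```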
3.4 Training the Model From the KDEF dataset, 2937 images were provided as input to the model for training and 588 images were used for validation. The remaining images were randomly selected as testing images. The Adam optimizer was used while training the model, with a learning rate of 0.001.
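Continuing the hypothetical sketch above, the training setup could look as follows; only the Adam optimizer, the 0.001 learning rate and the 2937/588 split come from the text, while the loss, epoch count, batch size and the placeholder arrays are assumptions.

```python
import numpy as np
from tensorflow.keras.optimizers import Adam

model = build_model()                               # from the earlier sketch
model.compile(optimizer=Adam(learning_rate=0.001),  # learning rate 0.001 as stated
              loss="categorical_crossentropy",      # assumes one-hot encoded emotion labels
              metrics=["accuracy"])

# Placeholder arrays standing in for the preprocessed KDEF splits (2937 train / 588 validation)
x_train = np.random.rand(2937, 256, 256, 1).astype("float32")
y_train = np.eye(7)[np.random.randint(0, 7, 2937)]
x_val = np.random.rand(588, 256, 256, 1).astype("float32")
y_val = np.eye(7)[np.random.randint(0, 7, 588)]

history = model.fit(x_train, y_train, validation_data=(x_val, y_val),
                    epochs=50, batch_size=32)       # epoch and batch-size values are assumptions
```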
3.5 Real-Time Emotion Detection After training the model, it was tested in real time. Faces were detected in live frames with the help of a Haar cascade classifier. A webcam or a mobile camera was used to capture real-time video, from which 30 facial images were taken per second. These images were then converted to grayscale and fed to the model, which predicted the class to which they belong. The model thus predicted the facial emotion and displayed it on the same window from which the facial images of the live video were taken. Figure 4 shows a sample screenshot of how the predicted emotion was shown.
Fig. 4 Sample screenshot showing how the Haar cascade classifier, using the webcam or phone camera, identifies the emotion
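A sketch of such a real-time loop is given below; it assumes the trained model from the earlier sketches and OpenCV's bundled frontal-face Haar cascade, and the label order and input size are assumptions.

```python
import cv2
import numpy as np

EMOTIONS = ["afraid", "angry", "disgust", "happy", "neutral", "sad", "surprise"]  # assumed order
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture(0)                       # webcam; a phone camera stream URL could be used instead
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.3, 5):
        face = cv2.resize(gray[y:y + h, x:x + w], (256, 256)) / 255.0
        pred = model.predict(face.reshape(1, 256, 256, 1), verbose=0)
        label = EMOTIONS[int(np.argmax(pred))]
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(frame, label, (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 255, 0), 2)
    cv2.imshow("emotion", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```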
Fig. 5 Flowchart showing the methodology followed by the authors during their research work
3.6 Music Recommendation Real-time face recognition was done using a webcam or a mobile camera. Thirty frames per second were captured and fed as input to the CNN LeNet architecture. The model extracted features from the input images, and the facial emotion was recognized with an accuracy of 97–98.5%, which shows that emotion can be identified efficiently even under poor lighting conditions and with variations in the face due to environmental changes. A list of songs was assigned to each of the emotions so that it would help to enhance the user's mood, and the system therefore recommends songs based on the facial expression detected. For example, if the system recognized "sad" as the facial emotion, it would recommend a list of songs that would help the user feel better. Figure 5 shows the methodology followed in this paper.
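A small sketch of the recommendation step follows; the playlist contents are purely hypothetical, since the paper only states that a list of songs was assigned to each emotion.

```python
import random

# Hypothetical playlists keyed by the detected emotion
PLAYLISTS = {
    "sad":      ["Here Comes the Sun", "Three Little Birds"],
    "angry":    ["Weightless", "Clair de Lune"],
    "happy":    ["Happy", "Walking on Sunshine"],
    "neutral":  ["Viva la Vida"],
    "afraid":   ["Somewhere Over the Rainbow"],
    "disgust":  ["Lovely Day"],
    "surprise": ["Good Vibrations"],
}

def recommend(emotion, k=3):
    """Return up to k songs mapped to the detected emotion."""
    songs = PLAYLISTS.get(emotion, [])
    return random.sample(songs, min(k, len(songs)))

print(recommend("sad"))
```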
4 Results and Discussion The implementation of the proposed method was done on Google Colab. Some predefined libraries such as Keras and sklearn were used with the TensorFlow backend. In this paper we have opted for a CNN over other machine learning algorithms such as SVM and logistic regression for emotion detection, because a convolutional neural network (CNN) works much better on image data. A convolutional neural network is
Fig. 6 Accuracy and loss curves of the CNN model that has been trained; the blue curve showing the increase in accuracy and the orange curve showing the decrease in loss
much more preferable when it comes to large image datasets with a large number of labels, whereas SVM is preferred in cases where the dataset is small and the number of labels is not large. A CNN is also preferable when the structure of the data points can be exploited well by the CNN architecture. Here, we used a CNN for emotion prediction by following the above methodology. Figures 6 and 7 show the performance metrics, that is, the loss and accuracy curves of the training data and the validation data. The training accuracy was found to be 98.03% with a training loss of 0.0568, and the validation accuracy was 97.96% with a validation loss of 0.0516. The results obtained were very encouraging and better than those obtained in the previous studies mentioned in the related works section. Figure 8 shows the graph for the variation in accuracy.
5 Conclusion Listening to music is considered to be a very effective healing tool for people. It eases one's body, mind and soul and makes a person feel alive and fresh. Knowing one's mood helps in deciding which song one wants to listen to. Through this paper we aim to detect emotions in an easier way to help enhance the user's mood by recommending songs according to the detected emotion, as well as to aid people working in different fields related to human emotions and psychology. An accuracy of 98.03% on the training data and a validation accuracy of 97.96% on the validation data for seven different facial emotions were achieved, which are higher than the accuracies of previous works in this field. A limitation concerns the resolution of the images used, as low-quality images could not be properly
Fig. 7 Graph showing the change in the loss for the training data and the validation data at the time of training the model; the red curve shows the variation in loss for the training data and the blue curve shows the changes in loss for the validation data
Fig. 8 Graph showing the variation in accuracy; the red curve shows the variation in accuracy for the training data and the blue curve shows the changes in accuracy for the validation data
detected. However, higher-resolution images, a larger number of epochs and a more intricate convolutional network may help in achieving even better accuracies.
References 1. L. Firestone, How emotions guide our lives. Psychology Today (2018). https://www.psycholog ytoday.com/us/blog/compassion-matters/201801/how-emotions-guide-our-lives Accessed 22 April 2020 2. G. Braden, How our emotions affect our health. UPLIFT (2016). https://upliftconnect.com/ emotions-affect-our-health/ Accessed 22 April 2020 3. Tang C, Zhu Q, Huang W, Hong C, Niu X (2020) PLANET: improved convolutional neural networks with image enhancement for image classification. Mathematical Problems in Engineering, (2020) 4. Techopedia, Convolutional neural network (CNN). techopedia (2018). https://www.techopedia. com/definition/32731/convolutional-neural-network-cnn Accessed 23 April 2020 5. Y. Lecun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition. Proceed IEEE 86(11), 2278–2324 (1998) 6. V. Nigam, Understanding neural networks. From neuron to RNN, CNN, and deep learning. Towards data science (2018). https://towardsdatascience.com/understanding-neural-networksfrom-neuron-to-rnn-cnn-and-deep-learning-cd88e90e0a90 Accessed 23 April 2020 7. M. Priya, M. Haritha, S. Jayashree, M. Sathyakala, Smart music player integrating facial emotion recognition. Int Sci Technol J 7, 68–73 (2018) 8. F.F. Kuo, M.F. Chiang, M.K. Shan, S.Y. Lee, Emotion-based music recommendation by association discovery from film music. in Proceedings of the 13th ACM International Conference on Multimedia (2005), pp. 507–510 9. A.S. Dhavalikar, R.K. Kulkarni, Face detection and facial expression recognition system. in 2014 International Conference on Electronics and Communication System, IEEE, (2014), pp. 1–7 10. S. Deebika, K.A. Indira, Jesline, A machine learning based music player by detecting emotions. in: 2019 Fifth International Conference on Science Technology Engineering and Mathematics (ICONSTEM), vol. 1 (IEEE, 2019), pp. 196–200 11. S. Gilda, H. Zafar, C. Soni, K. Waghurdekar, Smart music player integrating facial emotion recognition and music mood recommendation. In: 2017 International Conference on Wireless Communications, Signal Processing and Networking (WiSPNET) (IEEE, 2017) pp. 154–158 12. A.V. Iyer, V. Pasad, S.R. Sankhe, K. Prajapati, Emotion based mood enhancing music recommendation. in 2017 2nd IEEE International Conference on Recent Trends in Electronics, Information and Communication Technology (RTEICT) (IEEE, 2017), pp. 1573–1577 13. Z. Hu, E. Lee, Human motion recognition based on improved 3 dimensional convolutional neural network. In: 2019 IEEE International Conference on Computation, Communication and Engineering (ICCCE) (IEEE, 2019), pp. 154–156
Drowsiness Detection System Using KNN and OpenCV Archit Mohanty and Saurabh Bilgaiyan
Abstract As we know, the face is a very important cue for measuring whether someone is drowsy or feeling sleepy, and the most important part is the eyes, from which sleepiness can be judged by the way they blink and by the duration of the blinks. In today's world road accidents are common and deadly, and a main reason for them is driver drowsiness. Many methods exist to detect drowsiness, keep a check on the driver's state and wake the driver with alarms if they are not concentrating enough on driving. In this drowsiness detection system, the system detects and measures the driver's drowsiness status, such as blinking, i.e., the duration of eye closure expressed as the Eye Aspect Ratio (EAR), using images from the video, and the program triggers warning alarms for each level of drowsiness detected while driving. Based on a real-time vision system, driver face and eye detection techniques were added, along with removal of lighting effects responsible for eye-detection false positives, fatigue detection techniques, and supervised learning algorithms to identify the tiredness level. In order to remove the lighting effects, the lightness of the images was separated, inverted and then composed with the LUMA-coded grayscale version of the images. Next, the concept of EAR was used to detect the driver's drowsiness. Finally, the KNN algorithm was used to divide the driver's level of drowsiness into three stages, grouped according to the time the eyes remain closed or the blinking rate, and a different alarm goes off for each stage. Keywords KNN algorithm · Supervised learning algorithms · Eye aspect ratio · LUMA coding
A. Mohanty (B) School of Electronics Engineering, KIIT Deemed to Be University, Bhubaneswar, Odisha, India e-mail: [email protected] S. Bilgaiyan School of Computer Engineering, KIIT Deemed to Be University, Bhubaneswar, Odisha, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 D. Swain et al. (eds.), Machine Learning and Information Processing, Advances in Intelligent Systems and Computing 1311, https://doi.org/10.1007/978-981-33-4859-2_38
1 Introduction As per the WHO, vehicle accidents injure millions of people and claim many precious lives each year, and the main reason for such accidents is sleepy or simply careless driving. The AAA has recently reported that 7% of all accidents occur due to sleepiness and tiredness of drivers [1]. As driving for long durations is quite monotonous, drivers do not drive attentively, and due to some climatic factors the surroundings can also induce sleepiness [2]. Because of sleepy and tired driving, the lives of innocent people are put in danger, and to avoid this many rules and restrictions are maintained in different countries [3]. Some people believe that sleepiness can be judged by tracking the duration of continuous driving, but this has shortcomings: in real time it is very difficult to know whether a driver is sleepy, as we do not know their situation or what is going on in their mind. In today's world one of the major tasks is to alert the driver when falling asleep. To understand and control this, many researchers have proposed theories and models to bring accidents down. Many systems have also been incorporated by big companies in their vehicles, but the cost of such vehicles ends up very high, so ordinary people cannot afford them. Some techniques were based on the surroundings, as climate also plays a role in making someone sleepy, but every system has some major drawbacks that need to be addressed [4, 5]. The purpose of this paper is to recognize and categorize the best possible measures, methods and tools for the identification of driver drowsiness. For many years, different techniques, tools and methods have been used to detect drowsiness effectively. One efficient method requires a mask or head gear to capture facial data and uses it to train a model based on PERCLOS, a measurement defined as the percentage of frames in which the eye is closed over a certain interval of time. Studies and analysis show that the efficiency of this method is high under normal conditions, as it is calculated based on the estimated area of the iris; however, the masks used in this method can also be fabricated, which in turn decreases the performance [6]. The paper has been organized into the following sections: Sect. 2 presents the related works and their comparison. Section 3 presents the methodology of the detection system. Section 4 presents the conclusion and future work on this detection system.
2 Related Works Various algorithms and classification techniques that have been used earlier to advance the drowsiness detection field are described below: Ramzan et al. [7] analyzed the current techniques used in drowsiness detection and reviewed the detection systems by classifying the
whole system into three categories: behavioral, vehicular and physiological-based techniques. They analyzed the various algorithms and processes used, discussed both the advantages and the drawbacks of each system, and explained in detail the frameworks used for fatigue detection. Savaş et al. [8] proposed the multi-task ConNN model, in which both mouth and eye data are classified simultaneously; as the name suggests, the model is a neural network capable of handling multiple tasks. Driver fatigue was measured by the frequency of yawning and by the well-known PERCLOS measure, which in simple terms means the percentage of eye closure. They detected fatigue by categorizing the data into three classes with the neural network model, used the YawDD and NTHU-DDD datasets and obtained 98% consistency with the system. You et al. [9], through comparative experimentation, demonstrated that their algorithm outperformed the fatigue detection techniques used today in precision. They proposed a system that could detect tiredness quickly from a resolution of just 640 × 480 pixels, and the frame rate required for detection was also very low, i.e., 20 fps. With this system they achieved 94.8% precision, and it can be used by anyone with no special requirements at all; the system is quite useful as it could provide intelligent transit and ensure the safety of drivers. Deng et al. [10], by analyzing previous algorithms and methods and studying their drawbacks, built a new face-tracking system to increase tracking accuracy and overcome those drawbacks. They introduced a method that detects facial regions using 68 landmark points at a time, analyzed the driver's state using these points and derived data describing its properties. By combining these properties of the eyes as well as the mouth, they introduced a system named DriCare that can alert drivers with a warning, achieving 92% accuracy with fewer drawbacks. Based on the literature review, the comparison of the related works is given in Table 1.
Table 1 Comparison of the related works
Authors | Feature extraction methods, classifiers, and algorithms used | Accuracy percentage (%)
Ramzan et al. [7] | Digital image processing, sensors, drowsiness detection, supervised learning, support vector machine (SVM) | 80
Savaş and Becerikli [8] | PERCLOS, FOM, convolutional neural network, controlled indexing, non-controlled indexing | 98.81
You et al. [9] | Controlled indexing, non-controlled indexing, CNN, individual differences, SVM | 94.80
Deng and Wu [10] | CNN, fatigue detection, feature location, face tracking | 92
3 Methodology The process and the steps of the system are important for future use, in which it can be modified as required. Figure 1 shows the state diagram of the core process involved in the detection system, which mainly consists of three parts:
• Eye detection, gray scaling, and lightness processing
• EAR calculation based on blinking
• Drowsy driving detection
In the overall analysis, a metric known as EAR is used to calculate the eye aspect ratio and store it for each frame. Successive EAR values are combined and used for training the model based on a dlib facial landmark file. Three stages are defined, for which alarms of different intensities go off as decided by the KNN algorithm model. In this system the eyes are preferred over other facial features to detect the drowsiness of the driver. Further, gray scaling and LUMA coding are also done to increase the accuracy of the model so that the alarms go off in time.
3.1 Eye Detection This step is the first in the system, as it includes the detection of the eyes, which is possible after the detection of the face. A dlib facial landmark file was used, which tracks the facial region and helps in the detection of the eyes. The dlib landmark model detects the whole face and very rarely misfires, even when the face is moving fast or is inconsistent.
Fig. 1 State diagram describing the entire process step by step
3.2 EAR Estimation and Blink Classification As discussed earlier, the dlib landmark file was used to capture the facial regions; taking the data of the whole face could mislead the model and decrease the accuracy, since the head position does not remain still and proper data cannot be taken to train the model. Therefore only the eye region was used, as its surface is very small and most of the processing in this system requires eye data rather than the whole face, so the data captured on the eyes are taken into consideration for training the model. In this system a metric known as the Eye Aspect Ratio (EAR) is used, in which the height and the width of the eye are taken in proportion from the landmarks of the dlib file. Using the properties of OpenCV, the whites of the eyes could also be taken into consideration, which is more complex and time consuming but at the same time more accurate in detection. The outside variables of the eyes were considered so that the alarms can go off in time, as each second is highly important while the driver is driving as well as tired. The calculation of EAR is given in Eq. 1 [11, 12]. Figure 2 represents the eye axes used in Eq. 1 when the eye is open as well as closed. EAR = (||P2 − P6|| + ||P3 − P5||) / (2 · ||P1 − P4||)
(1)
where P2, P3, P5, P6 are the vertical eye landmarks and P1, P4 are the horizontal eye landmarks.
Fig. 2 Eye axes taken into account while calculating EAR for open as well as closed eyes
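A minimal sketch of Eq. 1 using dlib's standard 68-point landmark model and grayscale frames is given below; the predictor file name and the landmark index ranges follow dlib's common convention and are assumptions insofar as the paper does not list them, and the function returns the standard normalized EAR, which differs in scale from the values shown in Fig. 5.

```python
import dlib
from scipy.spatial import distance as dist

detector = dlib.get_frontal_face_detector()
# The 68-point shape predictor file is assumed to be available locally
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")
LEFT_EYE, RIGHT_EYE = range(42, 48), range(36, 42)   # 68-point landmark indices for the two eyes

def eye_aspect_ratio(pts):
    # Eq. 1: sum of the two vertical distances over twice the horizontal distance
    return (dist.euclidean(pts[1], pts[5]) + dist.euclidean(pts[2], pts[4])) / (2.0 * dist.euclidean(pts[0], pts[3]))

def frame_ear(gray):
    """Return the mean EAR of both eyes for one grayscale frame, or None if no face is found."""
    faces = detector(gray, 0)
    if not faces:
        return None
    shape = predictor(gray, faces[0])
    left = [(shape.part(i).x, shape.part(i).y) for i in LEFT_EYE]
    right = [(shape.part(i).x, shape.part(i).y) for i in RIGHT_EYE]
    return (eye_aspect_ratio(left) + eye_aspect_ratio(right)) / 2.0
```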
Fig. 3 Preprocessing of the original image into different formats of image
3.3 Preprocessing The first step of preprocessing is to invert the lightness channel detached from the original image and compose it with the original grayscale image to produce a clear image. The next step is to convert color to grayscale using LUMA coding. Among the many different color space models, the LAB color space model is best suited for lightness separation. Median filtering is applied to the lightness value (L) obtained from the LAB color space because it differs from the actual lighting conditions. The preprocessing of the original image can be seen in Fig. 3.
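A hedged OpenCV sketch of this kind of preprocessing (LUMA-coded grayscale, LAB lightness separation, median filtering and inversion) follows; the exact composition step and the filter kernel size are assumptions about details the text leaves open.

```python
import cv2

def remove_lighting_effects(bgr):
    # Luma-coded grayscale (OpenCV uses Rec. 601 luma weights for BGR2GRAY)
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    # Separate the lightness channel in LAB space and smooth it with a median filter
    l_channel = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)[:, :, 0]
    l_channel = cv2.medianBlur(l_channel, 21)      # kernel size is an assumption
    inverted = cv2.bitwise_not(l_channel)          # invert the detached lightness channel
    # Compose the inverted lightness with the grayscale image to suppress uneven lighting
    return cv2.add(gray, inverted)
```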
3.4 Real-Time Drowsiness Detection A single alarm alerting the driver is rarely sufficient; most of the time it is ignored while driving. In this system there are three stages of alarms that go off for different durations of eye closure. The alarm tone rises from stage 1 to stage 3, with stage 3 having the highest intensity. Levels 0, 1 and 2 are used to determine whether the user is tired or not, as in Fig. 4. Table 2 represents the state of drowsiness in each phase. This makes clear that blinks of longer duration are highly concentrated in the first category of drowsiness. If drowsiness is detected, a different alarm goes off at each stage, as determined by KNN, for a specific duration of time. Figure 5 shows the difference in the EAR value of the eyes while they are open and closed; the EAR value drops significantly when the eyes close, i.e., from 348.9 to 130.9.
Fig. 4 Classification criteria for the alarms to go off while drowsiness is detected for each level
Table 2 Stages of drowsiness
Drowsiness phase | Drowsiness state
Drowsiness phase 0 | Driver is sleeping
Drowsiness phase 1 | Driver is feeling sleepy
Drowsiness phase 2 | Driver is tired
Fig. 5 Output images with respective EAR values
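As an illustration of the KNN stage, a sketch of a three-phase classifier is shown below; the feature choice (mean EAR and closed-frame count), the training samples and k = 3 are all assumptions, since the paper only states that KNN separates the three phases of Table 2.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical training data: [mean EAR over a window, eye-closure duration in frames]
X = np.array([[0.30, 3], [0.28, 5], [0.22, 15], [0.20, 20], [0.12, 40], [0.10, 60]])
y = np.array([2, 2, 1, 1, 0, 0])   # 0: sleeping, 1: feeling sleepy, 2: tired (per Table 2)

knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)

def drowsiness_phase(mean_ear, closed_frames):
    """Return the predicted drowsiness phase for one observation window."""
    return int(knn.predict([[mean_ear, closed_frames]])[0])

print(drowsiness_phase(0.11, 55))   # expected to fall in phase 0 with these toy samples
```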
4 Conclusion and Future Work In this proposed work, we have successfully designed and developed a model for a drowsiness or fatigue detection system using the OpenCV library, the KNN algorithm of machine learning and the shape-predictor face landmarks. The system developed was tested successfully, and drastic changes in the EAR were observed while opening and closing the eyes. A future extension of this system is to slow down the vehicle automatically when the fatigue level crosses a certain level of drowsiness, so that even if the driver is not responding to the alarms the vehicle will pull over slowly. Since the vehicle's speed gets controlled significantly with this process, the frequency of road accidents is eventually reduced, which is needed to avoid the unnecessary loss of life caused by drowsiness-related road accidents.
References 1. B.C. Tefft, Acute sleep deprivation and risk of motorcar crash involvement. Sleep 41(10) 2018. https://doi.org/10.1093/sleep/zsy144. Jun 2016 2. Q. Wang, J. Yang, M. Ren, Y. Zheng, Driver fatigue detection: a survey. in Proceedings of 6th World Congress Intelligence Control Automation, June (2006) 3. X. Sun, H. Zhang, W. Meng, R. Zhang, K. Li, P. Peng, Primary resonance analysis and vibration suppression for the harmonically excited nonlinear system employing a pair of symmetric viscoelastic buffers. Nonlinear Dynamics, vol. 94 (Springer, 2018) 4. Z. Ning, P. Dong, X. Wang, M.S. Obaidat, X. Hu, L. Guo, Y. Guo, J. Huang, B. Hu, Y. Li, When deep reinforcement learning meets 5G vehicular networks: a distributed offloading framework for traffic big data. IEEE Trans. Indus. Info. 16(2), 1352–1361 (2020) 5. F. You, Y.H. Li, L. Huang, K. Chen, R.H. Zhang, Monitoring drivers’ sleepy status at night based on machine vision. Multimedia Tools Appl. Springer 76, 14869–14886 (2017) 6. G. Sikander, S. Anwar, Driver fatigue detection systems: a review. IEEE Trans. Intell. Transport. Syst. 20(6), 2339–2352 (2019) 7. M. Ramzan, H. Khan, S. Awan, A. Ismail, A survey on state-of- the-art drowsiness detection techniques. IEEE Access 7, 61904–61919 (2019) 8. B. Savas, Y. Becerikli, Real time driver fatigue detection system based on multi-task ConNN model. IEEE Access 8, 12491–12498 (2020) 9. F. You, X. Li, Y. Gong, H. Wang, H. Li, A real-time driving drowsiness detection algorithm with individual differences consideration. IEEE Access 7, 179396–179408 (2019) 10. W. Deng, R. Wu, Real-time driver-drowsiness detection system using facial features. IEEE Access 7, 118727–118738 (2019) 11. C.B.S Maior, M.J.C Moura, J.M.M Santana, I.D. Lins, Real-time classification for autonomous drowsiness detection using eye aspect ratio. Expert Syst. Appl. Elsevier, 158 (2020) 12. S. Mehta, S. Dadhich, S. Gumber, A.J. Bhatt, Real-time driver drowsiness detection system using eye aspect ratio and eye closure ratio. in Proceedings of International Conference on Sustainable Computing in Science, Technology and Management (SUSCOM), (2019)
Optimized Dynamic Load Balancing in Cloud Environment Using B+ Tree S. K. Prashanth and D. Raman
Abstract In cloud computing, resource utilization is the most important criterion in resolving task scheduling. Day by day the number of users is increasing, along with their requests for services, owing to the popular pay-per-use feature of the cloud. Cloud servers are in demand as the tasks from user requests grow. The cloud must dynamically schedule the requested tasks to the best resources based on the load. While allocating resources to a task, various parameters such as time, cost, power, reliability and availability should be considered. Many task scheduling algorithms do not concentrate on the allocation, reallocation and removal of servers with respect to the load, which would optimize power utilization, reduce the waiting period of tasks and reduce cost, as needed by IT organizations. The proposed method uses the B+ Tree algorithm to allocate or release servers dynamically as tasks arrive. The algorithm takes as input the execution time, transmission time and round trip time of tasks, and the virtual machine resource capacity. The results show that, using the proposed method, cloud providers can provide resources as per the IT industry's financial constraints, which optimizes resource utilization. Keywords Load balancing · Cloud · B+ tree · Virtual machine · Task scheduling
1 Introduction Cloud computing consists of a large pool of resources shared through the internet. The Internet is the backbone of the cloud, which offers services as "XaaS" that can be requested and released. These services are resources such as networks, servers, memory, applications, etc., shareable via the internet. The services are delivered through the cloud model, which consists of
Software as a Service ("SaaS"), where applications are used; Platform as a Service ("PaaS"), where applications can be developed by requesting a programming language and operating system; and Infrastructure as a Service ("IaaS"), where CPU, memory, etc., are used. Many problems exist in cloud computing; one of the major ones is task scheduling. The task scheduler's main job is to allocate available resources to the virtual machines present in cloud servers. Many algorithms have been proposed taking into account different parameters such as reliability, availability, time, etc. However, none of them concentrated on utilizing servers optimally with respect to load and the associated cost. So load balancing must be taken into consideration by moving or copying tasks onto another server with the same requirements or configuration. The task scheduler must use the available resources effectively and efficiently without losing cloud environment performance. The challenging issue for the task scheduler is to map tasks to resources optimally. An optimal search for resource allocation is based on the service level agreement (SLA), where cost or power utilization is the main parameter in the task scheduler algorithm [1, 2]. Since task scheduling is an NP-complete problem, other parameters are also called for, such as execution time and response time on the user side and resource utilization on the cloud provider side [3]. The proposed method implements a task scheduling algorithm in the cloud based on the B+ Tree algorithm for allocating, reallocating and removing servers to improve task completion time and decrease cost and power with optimal resource utilization. In this paper, we propose a novel dynamic load balancing approach for the cloud environment. We present the B+ Tree algorithm for load balancing the tasks. The B+ tree algorithm quickly searches for a compatible virtual machine (VM) for allocating and reallocating tasks when overloaded, and deletes VMs with no load or under-load on the servers efficiently because of its indexing, where the time complexity is lower compared to other searching algorithms. Various performance parameters are taken into consideration and compared with different research algorithms on load balancing in the cloud [4, 5].
2 Related Work In this cloud computing era, the number of users or clients has increased drastically. As clients increase, requests or tasks obviously also increase. The issue is to reduce the waiting period of a task before it is allocated a virtual machine in the server. The task scheduler has to consider various factors (i.e., execution time, resource utilization, power consumption and cost). Xue and Wu [6] presented Genetic Hybrid Particle Swarm Optimization (GHPSO) as a task scheduler, where mutation and crossover from the genetic algorithm are embedded into PSO. The results show that GHPSO is better at minimizing cost within a given execution time. Ge Junwei proposed a novel static genetic algorithm taking into account the total completion time of tasks and their cost constraint. Ravichandran and Naganathan [7] presented a solution for dynamic tasks arriving at the same time by keeping all arriving tasks in a queue. The first task in the
queue is allotted to an appropriate virtual machine (server). The results show optimal resource utilization with reduced execution time. Varalakshmi et al. [8] proposed a solution based on the requirements of user-preferred Quality of Service (QoS) parameters. The work concentrated on a task scheduling algorithm in the cloud environment, which improves CPU utilization. Selvarani and Sadhasivam [9] proposed a cost-based scheduling algorithm which maps tasks to virtual machines. The algorithm maps tasks to the better or optimized resources, which increases the computation ratio by grouping tasks of similar configuration with respect to cloud resources and sending the grouped tasks as jobs to the resources. Safwat et al. [10] proposed a novel approach, TS-GA (Tournament Selection Genetic Algorithm). They used crossover and mutation to reduce the execution time and cost of tasks and to maximize resource utilization. The results show that TS-GA performs better than Round Robin and the Genetic Algorithm. Awad et al. [11] proposed Load Balancing Mutation Particle Swarm Optimization (LBMPSO) for tasks. This algorithm mainly concentrates on reliability in the cloud environment and the reallocation of failed tasks. The results show that LBMPSO is better than the ET longest random algorithm, MPSO and SPSO.
3 System Model (i) Resource Model: It describes the processing capability of the virtual machine (node). The resource availability of the virtual machine is its computing capacity, storage and network transmission capacity in a unit of time [12]. (a) Node Resource: the resources available in a unit of time. The resource of node i is represented by the vector NRi.
NRi = (NRcpu; NRmem; NRdisk; NRnet)   (1)
(ii) Task Model: It describes the consumption of resources by a task, i.e., the resources a task needs for execution, which depend on their availability in the virtual machine (node). (b) Task Description: the resources needed for the execution of the task. The task resource of task i is represented by the vector TRi.
TRi = (TRcpu,i; TRmem,i; TRdisk,i; TRnet,i)   (2)
(iii) Execution Time: It is the time taken by task i on virtual machine j. Let us assume task i utilizes the complete resources of virtual machine j; then the time taken to execute task i on virtual machine j is represented by Eij.
Eij = TRi / NRj   (3)
(iv) Virtual Machine Load (node): The load of node i at time t is denoted by
Lt = Vi · lengtht   (4)
(v) Cluster VM Load (server): The average load of all nodes at time t is denoted by L̄t.
L̄t = (Σ Lt) / n   (5)
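To make Eqs. 3–5 concrete, here is a small sketch with the resource vectors collapsed to single numbers; it is an illustration only, not the authors' implementation (which the paper later states was done in MATLAB).

```python
def execution_time(task_demand, node_capacity):
    # Eq. 3: E_ij = TR_i / NR_j, with the resource vectors reduced to scalars for brevity
    return task_demand / node_capacity

def node_load(assigned_task_lengths):
    # Eq. 4: load of one node at time t, taken here as the total assigned task length
    return sum(assigned_task_lengths)

def cluster_load(node_loads):
    # Eq. 5: average load over the n nodes of a cluster
    return sum(node_loads) / len(node_loads)

print(execution_time(4, 10), node_load([2, 3]), cluster_load([5, 3, 0, 0]))
```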
4 Mapping Task to Virtual Machine Many algorithms are used to assign appropriate tasks to a virtual machine (Sect. 3). They consider reliability, execution time and round trip time. In our proposed work we concentrate on power utilization, cost, and the waiting time of tasks that are to be allocated to virtual machines. We consider two cases. In case (i), at peak times many tasks arrive dynamically and more resources are needed [13, 14]; we cannot make the incoming tasks wait until a virtual machine is freed from an already running task. The maximum time utilized by a cluster running all tasks is called the makespan (see Eq. 6). Kumar et al. proposed an object detection method for blind people to locate objects in a scene; they used machine learning-based methods along with a single shot multi-box detector algorithm to develop the model [15–17].
MSmax = Max(Ej) × β   (6)
(ii) At non-peak times, fewer tasks arrive and should be allocated to fewer virtual machines (see Eq. 7) [10, 11].
Fig. 1 Flowchart of task scheduler
MSmin = Min(Ej) × α   (7)
Here α and β are positive coefficients and α + β = 1. For both allocation (reallocation) and de-allocation (removal) of servers, a uniform mechanism is needed to save power and cost. Power utilization in servers is proportional to cost: if power utilization is lower, the cost is also reduced. So, dynamically, servers are allocated at peak times and removed at non-peak times (Fig. 1). (i) Allocation of a server: when an incoming task seeks a virtual machine in the server for allocation, the following cases are considered. Case 1: If a virtual machine in a server has no task, the task is allocated to it. Case 2: If all virtual machines are filled with tasks and the load of the cluster (calculated above) is less than MSmin, i.e., (Lt < MSmin), the incoming task waits until a virtual machine is free; otherwise a new server with new virtual machines is allotted. (ii) Removal of servers: when no more incoming tasks are available in the task queue and the tasks on the server are about to complete their execution, the following cases are considered. Case 1: If no task is left in the server for execution, i.e., the load is zero (0), the server is removed.
Case 2: If tasks exist in the current server and the load is greater than MSmax (i.e., Lt > MSmax), assign them to another server's virtual machines and free (remove) the existing server.
5 B + Tree Algorithm The working principle of the B+ tree: an N-ary tree with several children per node (which we refer to as a server or cluster) is called a B+ Tree. To perform search, insertion and deletion operations, a B+ tree consists of leaf, internal and root nodes (see Fig. 2). It is more efficient than a B-Tree: the internal nodes act as an index for fast access to the data, and all data are stored at the leaf nodes, which are interlinked with each other. The B+ Tree is created based on the virtual machine id ("VMid"); the key in the B+ Tree is "VMid". Step 1: Sort the tasks stored in the task buffer in ascending order according to their resource requirements (see Eq. 2). Sort the virtual machines in the cluster (see Eq. 3), i.e., E11 < E22 < … < Eij. A virtual machine is identified by its "id" and its parameters; the details are given in Eq. 1. Step 2: Initially allocate the tasks to the virtual machines by one-to-one mapping to form a cluster. The new cluster acts as the root, as shown in Fig. 3. Each association of a task with a VM is referred to as a node, with the pointer pointing to the sub-tree based on the virtual machine id (VMid). Step 3: If the rate of incoming tasks increases, the availability of resources (VMs) in the cluster is checked. If the load of the cluster is less than α (i.e., Lt < α), the task waits; otherwise, allocation takes place by creating a new cluster for the arrived task, i.e., a split takes place in the B+ Tree. In Fig. 4a, the task
Fig. 2 B + Tree structure (cluster/server); D = Data, P = Pointer, K = Key
Fig. 3 Cluster sorted: task maps virtual machine
Fig. 4 a Task waits: load is (Lt < MSmin), b split: new cluster created and linked
waits, since the load is less, and Fig. 4b shows a new cluster (server) being created. Because of this, tasks do not need to wait long, and all tasks are treated with equal priority in the B+ Tree. Step 4: Tasks completed in a virtual machine should be freed or removed, and if the load of the cluster is less than β (i.e., Lt < MSmax), the running tasks in the current cluster are assigned to its neighbor cluster based on the B+ Tree (merging concept) and the current cluster is freed or removed, so that power and cost can be saved in the cloud environment. Figure 5a shows the current cluster load being less than β (i.e., task Ti-2), and Fig. 5b shows the current cluster's node tasks assigned to its neighbor cluster's nodes.
Fig. 5 a Load of cluster before assignment, b current cluster freed after task assigned to neighbor cluster
Step 5: Repeat steps 2 to 4 until the task buffer is empty.
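A toy Python sketch of the allocation logic in Steps 1–3 is given below (the merge of Step 4 is only noted in a comment); the cluster capacities, the occupancy-based load proxy and all numbers are assumptions, and the real method indexes clusters through a B+ tree rather than a plain list.

```python
class Cluster:
    """A server holding VMs; stands in for one B+ tree node group. Capacities are toy numbers."""
    def __init__(self, vm_capacities):
        self.vm_capacities = list(vm_capacities)    # NR_j per VM (Eq. 1, collapsed to one number)
        self.assigned = [[] for _ in vm_capacities]

    def load(self):
        # Occupancy proxy for Eqs. 4-5: total assigned demand over total capacity
        return sum(sum(tasks) for tasks in self.assigned) / sum(self.vm_capacities)

    def place(self, demand):
        # Put the task on the first VM that still has spare capacity
        for i, cap in enumerate(self.vm_capacities):
            if sum(self.assigned[i]) + demand <= cap:
                self.assigned[i].append(demand)
                return True
        return False

def schedule(task_demands, vm_capacities=(10, 10, 10, 10), ms_min=0.95):
    clusters, waiting = [Cluster(vm_capacities)], []
    for demand in sorted(task_demands):             # Step 1: tasks sorted by resource demand
        if any(c.place(demand) for c in clusters):
            continue
        if max(c.load() for c in clusters) < ms_min:
            waiting.append(demand)                  # allocation rule: wait while load < MSmin
        else:
            new = Cluster(vm_capacities)            # Step 3: split -- bring up a new server
            new.place(demand)
            clusters.append(new)
    # Step 4 (merge) is omitted: a cluster whose load drops below MSmax would hand its
    # remaining tasks to a neighbour and be freed to save power and cost
    return clusters, waiting

clusters, waiting = schedule([3, 7, 2, 9, 4, 8, 6, 5])
print(len(clusters), "cluster(s),", len(waiting), "task(s) waiting")
```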
6 B + Tree Application in Cloud Environment The objective of this paper is to optimally map dynamic tasks to virtual machines, considering the searching time, waiting time, cost and power utilization. Let VM be the set of virtual machines in the cluster (server) and T be the set of tasks to be scheduled. The searching and processing of tasks on an appropriate virtual machine should be optimal and benefit the client or organization. To solve this problem in a reasonable time, the B+ Tree indexing algorithm is used; indexes are used to search the data with faster access. According to the SLA, the α and β values are two thresholds chosen as parameters to control the load of the cluster, applied on the B+ Tree. MATLAB is used to implement the B+ Tree based on Eqs. (1–7) to find the optimal mapping of tasks to virtual machines in a cluster. All the mappings of tasks to virtual machines are present at the external (leaf) nodes, and virtual machine ids are used in the internal nodes of the B+ Tree. The α value is used to control the waiting period of an arriving task with respect to the load in the cluster, and β is used to control the tasks executing in the virtual machines with respect to the load in the cluster. We consider α and β values for three cases for testing task scheduling, according to the funds available to the IT organization. The α and β values are 98% and 2% for case 1, 95% and 5% for case 2, and 90% and 10% for case 3, respectively (see Eq. 5 and Sect. 5). Initially, the allocated virtual machines run in the cluster; as allocation increases, the load also increases, and the remaining tasks wait until L > MSmin. A waiting task is allocated to a new cluster only when L < MSmin. If the incoming tasks decrease, the load in the cluster gradually decreases and the current cluster load becomes less than MSmax (Lt < MSmax). At that moment, all tasks running in the current virtual machines are assigned to its neighbor cluster and the current cluster is freed. Scenario 1: An IT organization conducted an audit and decided to reduce cost by lowering the funds, applying values of α = 0.98, β = 0.02. The proposed method shows the resource utilization, i.e., the number of servers/clusters needed to complete the execution of the tasks. Figure 6a shows that resource utilization is lower, but the waiting time of tasks to be mapped to a virtual machine increases. Scenario 2: Funds allow neither gain nor loss, so α = 0.95, β = 0.05 are selected to balance resource utilization and waiting time, as shown in Fig. 6b. Scenario 3: An IT organization has more funds, so the values α = 0.9, β = 0.1 are selected. The number of servers/clusters needed to complete the execution of the tasks is shown in Fig. 6c; in this case the resource utilization is higher and the waiting time of tasks to be mapped to a virtual machine decreases. The cost associated with all three scenarios is shown in Fig. 7 and Table 1.
Fig. 6 a Number of clusters needed (α = 0.98, β = 0.02), b number of clusters needed (α = 0.95, β = 0.05), c number of clusters needed (α = 0.9, β = 0.1); in each panel the number of servers is plotted against the makespan (ms)
Fig. 7 Plot of cost, clusters, time according to the scenarios 1, 2, and 3
Table 1 Efficiency of cost, time and cluster values according to the scenarios 1, 2, and 3
 | α = 0.98, β = 0.02 | α = 0.95, β = 0.05 | α = 0.90, β = 0.10
Number of servers | 44 | 30 | 24
Time (Minutes) | 12 | 10 | 9
Cost in Thousand | 6 K | 8 K | 14 K
7 Conclusion Organizations are moving towards the cloud environment due to the elasticity of its resource management system. However, task utilization in the cloud is not uniform, and the mapping of tasks to virtual machines is often not appropriate when considering cost, task waiting time, execution time, etc. This paper focused on the B+ Tree algorithm to use resources optimally for the IT industry. The results showed that resource allocation can be tailored to the required cost-effectiveness based on the IT organization's needs. In addition to the cost and waiting time of tasks, the proposed method offers elasticity in the resources needed by the IT organization based on its financial constraints; the parameter values can be changed when the financial considerations are not feasible.
References 1. R. Kaur, S. Kinger, Enhanced genetic algorithm based task scheduling in cloud computing. Int. J. Comput. Appl. 101, (2014) 2. S. Kumar, P. Balasubramanie, Dynamic scheduling for cloud reliability using transportation problem. J. Comput. Sci. 8, 1615–1626 (2012) 3. J.W. Ge, Y.S. Yuan, Research of cloud computing task scheduling algorithm based on improved genetic algorithm. Appl. Mech. Mater. 2426–2429 (2013) 4. R. Kapur, August. A workload balanced approach for resource scheduling in cloud computing. in Eighth IEEE International Conference on Contemporary Computing (IC3) August, (2015), pp. 36–41 5. K. Li, G. Xu, G. Zhao, Y. Dong, D. Wang, Cloud task scheduling based on load alancing ant colony optimization. in Sixth Annual China grid Conference, Liaoning, (2011) pp. 3–9 6. S. Jun Xue, W. Wu, Scheduling workflow in cloud computing based on hybrid particle swarm algorithm. TELKOMNIKA Indones. J. Electri. Eng. (10:7), 1560–1566, (2012) 7. S. Ravichandran, D.E. Naganathan, Dynamic scheduling of data using genetic algorithm in cloud computing. Int. J. Comput. Algo. 2, 127–133 (2013) 8. P. Varalakshmi, A. Ramaswamy, A. Balasubramanian, P. Vijaykumar, An optimal workflow based scheduling and resource allocation in cloud. in Advances in Computing and Communications, First International Conference, ACC (2011), pp. 411–420 9. S. Selvarani, G. Sadhasivam, Improved cost-based algorithm for task scheduling in cloud computing. in IEEE International Conference on Computational Intelligence and Computing Research (ICCIC) (2010) 10. A. Safwat, A. Hamad, F.A. Omara, Genetic-based task scheduling algorithm in cloud computing environment. (IJACSA) Int. J. Adv. Comput. Sci. Appl. 7(4), (2016) 11. A.I. Awad, N.A. El-Hefnawy, H.M. Abdel_kader, Enhanced particle swarm otimization for task scheduling in cloud computing environments. in International Conference on Communication, Management and Information Technology (ICCMIT 2015), Procedia Computer Science, vol. 65 (2015), pp. 920–929 12. R. Panwar, B. Mallick, Load balancing in cloud computing using dynamic load management algorithm. in IEEE International Conference on Green Computing and Internet of Things (ICGCIoT) October (2015), pp. 773–778 13. S.K. Mishra, B. Sahoo, P.P. Parida, Load balancing in cloud computing: a big picture. J. King Saud Univer.–Comput. Info. Sci. Comput. Info. Sci. 32, 149–158 (2020)
14. M. Ajit, G. Vidya, VM level load balancing in cloud environment’. in IEEE Fourth International Conference on Computing, Communications and Networking Technologies (ICCCNT), (2013), pp. 1–5 15. A. Kumar, Design of secure image fusion technique using cloud for privacy-preserving and copyright protection. Int. J. Cloud Appl. Comput. (IJCAC) 9(3), 22–36 (2019) 16. A. Kumar, S. Srivastava, Object detection system based on convolution neural networks using single shot multi-box detector. Proc. Comput. Sci. 171, 2610–2617 (2020) 17. A. Kumar, S.S.S.S. Reddy, V. Kulkarni, An object detection technique for blind people in realtime using deep neural network. in 2019 Fifth International Conference on Image Information Processing (ICIIP), (Shimla, India, 2019), pp. 292–297
Computer Vision-Based Wheat Grading and Breed Classification System: A Design Approach Atharva Karwande, Pranesh Kulkarni, Pradyumna Marathe, Tejas Kolhe, Medha Wyawahare, and Pooja Kulkarni
Abstract In terms of the world's total production of food grains, wheat is the most important and highly productive cereal grain. It is a highly cultivated crop in India as well as in other countries of the world. One of the most important problems that needs to be tackled is the quality assessment of the wheat produced. There are physical as well as chemical processes for quality assessment, and computer vision techniques can be implemented for the physical process of assessment. So we are creating a dataset specifically for the computer vision process. Some guidelines for quality assessment have already been given by the Food Corporation of India. In this paper, we present a method of creating a dataset of wheat grains which will be useful for quality assessment and breed classification. This dataset contains images of grains of different breeds and grade patterns. Keywords Area thresholding · Digital image processing · Foreground detection · Image acquisition · Thresholding
A. Karwande (B) · P. Kulkarni · P. Marathe · T. Kolhe · M. Wyawahare · P. Kulkarni Department of Electronics and Telecommunication, Vishwakarma Institute of Technology, Pune, India e-mail: [email protected] P. Kulkarni e-mail: [email protected] P. Marathe e-mail: [email protected] T. Kolhe e-mail: [email protected] M. Wyawahare e-mail: [email protected] P. Kulkarni e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 D. Swain et al. (eds.), Machine Learning and Information Processing, Advances in Intelligent Systems and Computing 1311, https://doi.org/10.1007/978-981-33-4859-2_40
1 Introduction Wheat is a widely cultivated crop in the world and an important source of protein in human food. As it can be produced efficiently on a large scale, it was the second most produced cereal crop in 2019 [1]. According to the records of the Food and Agriculture Organization of the United Nations, worldwide total production in 2017 was approximately 772 million tonnes [1]. In 2017, China, India and Russia produced almost 41% of global production [1], and production in Europe was 150.2 million tonnes, making it the largest wheat producer [2]. India is the third largest country in terms of global wheat production, having produced approximately 98.5 million tonnes in 2017 [3]. According to the Agricultural and Processed Food Products Export Development Authority, India exported 2,26,225.00 MT in the year 2018–19, worth 60.55 million US dollars [3]. So there is a need for quality testing of the wheat produced. The Food Corporation of India has specified some rules and regulations for quality testing [4]. There are chemical as well as physical processes for qualitative analysis; due to some limitations, physical processes are mainly preferred, but they need a lot of time and labor skill, so there is a need for automation. Precise and efficient deep learning and statistical machine learning techniques can help to solve this problem and may save testing time as well as human effort. For training a highly precise model, the main requirement is a large, good-quality dataset. In this paper, we describe the process of creating a dataset of wheat grains, which is important for breed and grade classification. There are important benefits of grading: the customer gets an effective price according to the grade, and farmers get knowledge of the true price and the correct markets in which to sell their products. The Food Corporation of India has specified rules and regulations for quality evaluation. The different factors affecting quality are impurities, damaged grains, bushel and kernel weight, nature and structure of the kernel, and moisture content [4]. The Ministry of Agriculture, Government of India has defined grade specifications called National Grade Standards [4]. There are defined characteristics to determine the grade, such as foreign matter, which can be defined as non-food items or impurities present in a sample; other food grains, which include food grains other than wheat; wheat grains of different varieties or species; damaged grains, which may include grains that are damaged internally or discolored; slightly damaged grains, which are superficially damaged or discolored; immature, shriveled and broken grains; and weevilled grains, i.e., grains eaten by insects. According to the presence of these grains in the sample, the grade is determined. In this research, we have created a dataset containing grains of different quality and different species. As there are 30,000 different varieties of 14 species of wheat in the world, it is an important task to identify the breed or variety [1]. In India, the majorly cultivated species of wheat are Emmer, Macaroni wheat, Common bread wheat and Indian dwarf wheat [5]. This research was carried out in the state of Maharashtra in India. In this region the commonly available breeds or varieties are Bansi, Lokwan (Triticum Gramineae), Madhya Pradesh Lokwan (MP-Lokwan), Emmer (Triticum Dicoccon) and Sehori (Triticum Aestivum). Lokwan and MP-Lokwan are mainly cultivated in
the state of Madhya Pradesh in India, while Sihor is mostly produced in the Gujarat region. The grains of these varieties differ in physical appearance with respect to color, shape, and size, so a computer vision-based system can be designed to automate the process of breed classification. By applying digital image processing techniques, cutouts of the grains in the sample images were obtained; the cutouts were then labeled according to specified annotation rules. This dataset creation method and the annotation rules are explained in this paper. The main objectives of this research are to remove noise and background content from the raw images of wheat samples, to extract the useful information from the raw images, and to create a dataset of high-quality, low-entropy kernel images for qualitative analysis and breed classification.
2 Literature Review Previously, David, et al. published the dataset titled “Global Wheat Head Detection” which contains RGB images of wheat heads [6]. This research aimed to develop an algorithm for wheat head detection. Total 4700 high-resolution images and around 190,000 heads at various growth stages and of various genotypes were collected from the different regions across the world. The important feature that makes this dataset better from previous research was a wide range of genotypes, geographic areas, and observational conditions. Wheat heads in different stages like post-flowering, ripening, flowering, etc., were included in the dataset. Also, the Dataset includes the heads with variation in row spacing as well as sowing density. Similar research was carried out by Md Mehedi Hasan et al. for the detection and analysis of wheat spikes in South Australia [7]. This dataset was created with the images captured in the actual field rather than laboratory-controlled environments. The main aim of the research was to develop a system to detect wheat spikes from the images of the field. A land-based vehicle equipped with a high-resolution camera with fixed settings is used to capture the images over the entire field. A total of 90 plots at the same geographical location in South Australia were analyzed to create the dataset. Images were captured in three different situations. The dataset includes about 300 images of 10 different varieties at different growth stages. Małgorzata Charytanowicz et al. proposed the research on the gradient clustering algorithm for features analysis of X-ray images for the classification of the variety of wheat grains [8]. Through this research, the author created the dataset for the classification of wheat varieties Kama, Rosa, and Canadian. Around 70 high-quality x-ray images of each of these varieties were used to create this dataset. It includes seven parameters of kernel such as area, perimeter, compactness, length of the kernel, the width of the kernel, asymmetry coefficient, and length of kernel groove. Important features were extracted by the principal component analysis algorithm. The research on the detection and enumeration of wheat grains was carried out by Wu Wei et al. [9]. In this research, the author proposed the deep learning-based solution to count the number of grains in the sample. The dataset created in this research includes 1748 images with three different varieties that were captured on six different backgrounds and with two
image acquisition devices. The authors maintained variation in background and cameras as well as in depth and camera angle. A total of 29,175 grains were obtained from the 1748 sample images, which were annotated using a tool called LabelImg. This dataset is mainly designed for detection and enumeration purposes; the technique is robust and efficient for wheat grain detection and can also be used for other applications. However, these datasets cannot solve the breed and grade classification problems efficiently. This work therefore focuses on a technique to create a dataset of wheat kernels that can be used for breed classification as well as qualitative analysis of wheat.
3 Methodology

We followed the steps below to acquire the data and generate the dataset while maintaining the quality of the wheat kernel images.
3.1 Sample Creation

The wheat samples collected for this research contain grains of mixed quality, such as broken, damaged, and chewed grains, which ensures that all types of kernels are included in the dataset. About 250–300 g of the five aforementioned breeds was collected for this study, which is sufficient to create the dataset. The samples are prepared so that all possible combinations of kernels are included. Table 1 shows the number of grains in the first sample and the total number of samples created for each breed. In the first sample, the number and percentage of broken grains are zero; a few broken grains are then added after recording every sample image, so that the percentage of broken grains increases gradually, and the last sample of each breed contains the highest percentage of broken grains.

Table 1 Wheat samples collected for dataset creation

Breed     | No. of grains in first sample (Min) | No. of grains in last sample (Max) | Total samples acquired
Bansi     | 120 | 150 | 31
Emer      | 100 | 129 | 30
Lokvan    | 140 | 171 | 32
MP-Lokvan | 130 | 165 | 36
Sihor     | 110 | 139 | 30
3.2 Image Acquisition

While capturing images of the samples, different background colors were experimented with to verify the robustness of the image processing algorithm. Yellow and amber backgrounds are strictly avoided because they match the color of the wheat grains. The fabric used as a background should be plain and clean; it should not contain any kind of texture or folds. We used an Android phone camera with 12-megapixel resolution (the dimension of the acquired images is 3024 × 4032 pixels), and the algorithm also works with lower-resolution cameras. The recommended camera resolution is 6 to 16 megapixels, because the time required to process very high-resolution images is too high; it is therefore important to balance processing time against image quality.
3.3 Image Preprocessing

First, we convert the RGB image to the CIE XYZ color space and subsequently convert it to grayscale. An RGB image can be converted to XYZ using Eq. (1) [10]:

\begin{pmatrix} X \\ Y \\ Z \end{pmatrix} = \begin{bmatrix} 0.489989 & 0.310008 & 0.2 \\ 0.176962 & 0.81240 & 0.010 \\ 0 & 0.01 & 0.99 \end{bmatrix} \begin{pmatrix} R \\ G \\ B \end{pmatrix} \qquad (1)
Generally, in image processing and image recognition systems, color images are converted to grayscale format to reduce the computational complexity of the image processing algorithm [11]. The conversion from XYZ to a grayscale image can be performed using Eq. (2) [12]:

Gray(i, j) = 0.299 × X(i, j) + 0.587 × Y(i, j) + 0.114 × Z(i, j)   (2)
where Gray(i, j) denotes the pixel value of the grayscale image in the ith row and jth column. Otsu's thresholding method is then applied to binarize the grayscale image. It is an adaptive thresholding technique that selects the optimal threshold value from all possible values for image segmentation [13]. The optimal threshold can be selected using the following algorithm.

Algorithm 1 Optimal threshold detection
1. Compute the histogram and the probability of each intensity level.
2. Initialize ω_i(0) and μ_i(0).
3. Iterate over all possible thresholds t ∈ [0, max(intensity)]:
   a. Update ω_i and μ_i.
   b. Compute σ_b²(t).
4. The desired threshold corresponds to the maximum of σ_b²(t).

To remove salt-and-pepper noise, a median filter with a kernel size of 5 × 5 pixels is applied. The boundary labels and region properties of the foreground are then obtained, and small light specks among the resulting connected regions are removed by area thresholding with a threshold of 225. A colored image of the foreground containing the wheat grains is obtained by element-wise multiplication of the filtered binary mask with the original color image. Figure 1 represents the conversion of a raw image into different color spaces and shows the background removal and noise reduction applied to the original raw image. In this way, we generate images of all the wheat kernels present in a sample and resize each kernel image to 256 × 256 pixels. The generated images can be readily used for grade analysis or breed classification.
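To make the pipeline concrete, the sketch below re-implements these steps with OpenCV. It is only an illustration under stated assumptions — the function name, the threshold flags, and the crop-and-resize loop are ours, not the authors' code — but it follows the parameters given in the text (5 × 5 median filter, area threshold of 225, 256 × 256 output).

```python
import cv2
import numpy as np

def extract_kernels(path, min_area=225, out_size=(256, 256)):
    """Sketch of the described pipeline: XYZ conversion, grayscale, Otsu
    binarization, median filtering, area thresholding, masking, cropping."""
    bgr = cv2.imread(path)                                  # raw sample image
    xyz = cv2.cvtColor(bgr, cv2.COLOR_BGR2XYZ)              # RGB -> CIE XYZ
    gray = cv2.cvtColor(xyz, cv2.COLOR_RGB2GRAY)            # Eq. (2) weights on (X, Y, Z)
    _, mask = cv2.threshold(gray, 0, 255,
                            cv2.THRESH_BINARY + cv2.THRESH_OTSU)  # Otsu threshold
    mask = cv2.medianBlur(mask, 5)                          # 5 x 5 median filter

    # Remove small light specks via area thresholding on connected components,
    # then cut out each remaining grain from the masked colour image.
    n, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
    cutouts = []
    for i in range(1, n):                                   # label 0 is the background
        x, y, w, h, area = stats[i]
        if area < min_area:
            continue
        grain_mask = (labels == i).astype(np.uint8)
        fg = bgr * grain_mask[..., None]                    # element-wise masking
        cutouts.append(cv2.resize(fg[y:y + h, x:x + w], out_size))
    return cutouts
```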
Fig. 1 Images obtained after every processing step of raw image
3.4 Labeling

For breed classification we do not need to label the dataset manually: the grain images can be annotated according to the breed of the sample. For grade analysis, however, we have specified guidelines for labeling the dataset, and every kernel must be analyzed to obtain its labels. The Food Corporation of India has specified rules for wheat grading, as discussed earlier; the grade of a kernel can be decided from the physical appearance of the grain, for example its color or shape. The characteristics considered for labeling the dataset are full (best-quality) grains, slightly damaged grains, damaged grains, weevilled grains, and broken grains. The definitions of these characteristics are given in Sect. 1. A best-quality grain is defined as a grain not exhibiting any of the other four characteristics. Figure 2 shows typical labeled cutouts from each breed corresponding to the five characteristics mentioned above. Each grain may exhibit one or more of these characteristics, so each grain should be annotated with all the respective characteristics.
Fig. 2 Sample cutouts in generated dataset corresponding to each breed and characteristics of grains
4 Results

After processing all 159 raw sample images, a total of 20,667 images of wheat kernels were generated. These kernel images are generated from the high-quality colored sample images created for the dataset. The background of the generated images is black irrespective of the background of the original images, and they contain no noise, as the noise present in the original raw images was removed in the preprocessing step. The images in the generated dataset have high quality and low entropy, so they can be readily used to develop deep learning models. We observed variation in the number of grains across the different breeds because of the variation in the number of samples collected to create the dataset. Figure 3 shows the distribution of wheat grains across each breed; the variation across breeds is due to the number of samples collected and the number of grains in the first sample. Figure 4 describes the distribution of grains exhibiting the characteristics that define the grade of a kernel. We can observe that the dataset is skewed: the numbers of slightly damaged grains and best-quality grains are greater than those of the other types. This is a common problem in dataset creation, but techniques such as oversampling and undersampling are useful for tackling it [14]. Oversampling randomly creates duplicate data points in the minority class, whereas undersampling deletes some data points from the majority class to balance the dataset. Data augmentation is the best technique for creating duplicate items without introducing any bias [15] (a minimal sketch of random oversampling is given below).
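As a hedged illustration of the random oversampling idea mentioned above (not part of the dataset-creation pipeline, and shown for the single-label case for simplicity — the multi-label grade annotations would need a per-characteristic variant), one could proceed as follows:

```python
import numpy as np

def random_oversample(images, labels, seed=0):
    """Duplicate minority-class cutouts at random until every class
    matches the majority-class count."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(labels, return_counts=True)
    target = counts.max()
    keep = []
    for c, n in zip(classes, counts):
        idx = np.where(labels == c)[0]
        keep.append(idx)
        if n < target:                       # oversample the minority class
            keep.append(rng.choice(idx, size=target - n, replace=True))
    keep = np.concatenate(keep)
    rng.shuffle(keep)
    return images[keep], labels[keep]
```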
Fig. 3 Distribution of wheat grains across different varieties of wheat
Fig. 4 Distribution of wheat grains across according to the characteristics
The image preprocessing algorithm is optimized and robust: it can be applied to any raw image of a wheat sample to generate the kernel images required for deep learning algorithms, and it filters out the background, noise, and light specks present in the raw images. Several datasets have already been proposed for quality testing and breed classification purposes, but they were neither able to tackle this problem nor useful for achieving the accuracy needed to deploy a deep learning or machine learning model. The classification problem can be solved using statistical machine learning methods as well as deep learning. We propose to use convolutional neural networks for better accuracy and higher precision, along with different activation functions and training strategies for breed classification to achieve better results. A minimum of 3–4 sets of convolution and pooling layers should be used to design the classification model. Due to the high imbalance in the dataset, the same strategy cannot be implemented for grade identification. To achieve a better result, we propose to use a sigmoid activation function, compute an individual loss for each characteristic, and backpropagate the weighted sum of all the losses to train the model. The F1 score will be a good metric for the model, as it is the harmonic mean of precision and recall. In this research, most of the disadvantages of existing datasets were removed so that the dataset can be readily used to deploy a solution to the wheat qualitative analysis problem. Table 2 describes some advantages and highlights of the dataset proposed in this research.
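A minimal PyTorch sketch of this training strategy — three to four convolution/pooling blocks feeding a sigmoid multi-label head, with a weighted per-characteristic loss — is shown below. The layer widths and loss weights are illustrative placeholders rather than values fixed by this research.

```python
import torch
import torch.nn as nn

class GradeNet(nn.Module):
    """Four conv/pool blocks and a five-way multi-label head for the
    grade characteristics (256 x 256 RGB kernel cutouts as input)."""
    def __init__(self, n_labels=5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Linear(64 * 16 * 16, n_labels)    # 256 / 2^4 = 16

    def forward(self, x):
        return self.head(self.features(x).flatten(1))    # logits; sigmoid lives in the loss

# One loss term per characteristic; pos_weight counteracts the class imbalance
# (placeholder weights, to be estimated from the actual label distribution).
weights = torch.tensor([1.0, 1.0, 3.0, 4.0, 2.0])
criterion = nn.BCEWithLogitsLoss(pos_weight=weights)
```

During training, the per-characteristic F1 score can then be monitored as the model-selection metric, as suggested above.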
Table 2 Comparison between the proposed dataset and existing datasets

Property                                                                                  | Proposed dataset | Existing dataset
High-quality colored images                                                               | ✔                | ✔
Low entropy images                                                                        | ✔                | ✘
Breed invariant (for qualitative analysis)                                                | ✔                | ✘
Includes grain characteristics to determine grade according to Food Corporation of India  | ✔                | ✘
Includes images having different background colors                                        | ✔                | ✔
Includes foreign impurities                                                               | ✘                | ✔
5 Conclusion

This research concludes that the foreground detection and image segmentation methods used to generate images of wheat kernels are useful for breed classification and qualitative analysis of wheat. The proposed dataset includes most of the grain characteristics required to determine the grade of a sample, as predefined by the Food Corporation of India. The dataset, consisting of high-quality and low-entropy images, is extensive and can be readily used to deploy deep learning models for breed classification and grade analysis. However, for grade analysis the dataset is imbalanced, so more work is needed to tackle this problem; additional grains exhibiting the respective characteristics should be added to the dataset. In this research, we have described guidelines and methods for sample creation, image acquisition, and labeling of the dataset. These guidelines will help the community contribute to extending this dataset. This research can be extended by training neural network models to solve the classification and quality analysis problems. One can also extend the dataset by collecting more samples and by including samples of other wheat breeds.
References 1. Wheat, Wikipedia (2020). https://en.wikipedia.org/wiki/Wheat. Accessed 26 Jun 2020 2. Crops, FAOSTAT (2020). http://www.fao.org/faostat/en/. Accessed 26 Jun 2020 3. Wheat, APEDA (2020). http://apeda.gov.in/apedawebsite/SubHead_Products/Wheat.htm. Accessed 28 Jun 2020 4. Post Harvest Profile of Wheat, AgMarknet (2020). https://agmarknet.gov.in/Others/profile_w heat.pdf. Accessed 24 Jun 2020 5. Classification of Indian Wheats,. Agropedia (2020). http://agropedia.iitk.ac.in/content/classi fication-indian-wheats. Accessed 29 Jun 2020 6. E. David et al., Global wheat head detection (GWHD) dataset: a large and diverse dataset of high resolution. RGB labelled images to develop and benchmark wheat head detection methods (2020). https://arxiv.org/abs/2005.02162 7. M.M. Hasan et al., Detection and analysis of wheat spikes using convolutional neural networks. Plant Methods. 14 (2018). https://doi.org/10.1186/s13007-018-0366-8
8. M. Charytanowicz et al., Complete gradient clustering algorithm for features analysis of X-ray images. in Advances in Intelligent and Soft Computing Information Technologies in Biomedicine, (2010), pp. 15–24. https://doi.org/10.1007/978-3-642-13105-9_2. ISBN 978-3-642-13105-9 9. W. Wei et al., Detection and enumeration of wheat grains based on a deep learning method under various scenarios and scales. J. Integr. Agricul. 19, 1998–2008 (2020). https://doi.org/ 10.1016/s2095-3119(19)62803-0. ISSN 2095-3119 10. S.N. Gowda, C. Yuan, in ColorNet: Investigating the Importance of Color Spaces for Image Classification. Computer Vision–ACCV 2018 Lecture notes in computer science. (2019), pp. 581–596 11. C. Kanan, G.W. Cottrell, Color-to-grayscale: does the method matter in image recognition?. PLoS ONE (2012). https://doi.org/10.1371/journal.pone.0029740 12. Q. Lei et al., in GreyReID: A Two-stream Deep Framework with RGB-grey Information for Person Re-identificatio n (2019). ArXiv abs/1908.05142 13. N. Otsu, A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybernet. 9, 62–66 (1979). https://doi.org/10.1109/tsmc.1979.4310076 14. J. Hernandez et al., in An Empirical Study of Oversampling and Undersampling for Instance Selection Methods on Imbalance Datasets. Progress in pattern recognition, image analysis, computer vision, and applications lecture notes in computer science, (2013), pp. 262–269. https://doi.org/10.1007/978-3-642-41822-8_33. ISBN 978-3-642-41822-8 15. J. Hemmerich et al., Conformational Oversampling as Data Augmentation for Molecules. in Artificial Neural Networks and Machine Learning–ICANN 2019: Workshop and Special Sessions Lecture Notes in Computer Science, (2019), pp. 788–792. https://doi.org/10.1007/ 978-3-030-30493-5_74. ISBN 978-3-030-30493-5
An Approach to Securely Store Electronic Health Record(EHR) Using Blockchain with Proxy Re-Encryption and Behavioral Analysis Kajal Kiran Dash, Biswojit Nayak, and Bhabendu Kumar Mohanta
Abstract With the development of the Internet of Things (IoT) and information and communication technology (ICT), the traditional healthcare system has shifted to real-time monitoring of patients using smart devices. These smart devices are capable of collecting vital information and communicating with other devices. They are, however, more vulnerable to attackers because they are resource-constrained and connected through wired or wireless networks. Security and privacy issues such as data integrity, availability, and storage are therefore very important for making the smart healthcare system secure. In this paper, we propose an architecture that uses blockchain technology to secure data sharing in the smart healthcare system; the proposed proxy re-encryption and behavioral analysis techniques make data communication and storage secure. The designed architecture makes vital information tamper-proof and preserves the privacy of patient data. Behavioral analysis provides a detailed security analysis that protects privacy and prevents manipulation of the data.

Keywords Smart hospital · Proxy re-encryption · EHR · Behavioral analysis · Blockchain
K. Kiran Dash (B) · B. Nayak Department of CSA, Utkal University, Bhubaneswar, Odisha 751004, India e-mail: [email protected] B. Nayak e-mail: [email protected] B. Kumar Mohanta Department of CSE, IIIT, Bhubaneswar, Odisha 751003, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 D. Swain et al. (eds.), Machine Learning and Information Processing, Advances in Intelligent Systems and Computing 1311, https://doi.org/10.1007/978-981-33-4859-2_41

1 Introduction

Cyber-attacks reported by smart healthcare providers increased by 60% from 2013 to 2014, whereas the increase in other application domains was only 30% [1]. Similarly, in [2], it is predicted that ransomware attacks on healthcare organizations
will quadruple between 2017 and 2020 and will grow fivefold by 2021. The security and privacy of the smart healthcare system are therefore very important and need to be addressed. An IoT-enabled smart healthcare architecture that automates the system is proposed by the authors in [3]. Cloud-based architectures are also proposed in [4, 5] to provide a secure platform for processing and computation. Fog computing provides processing and computation at the edge of the network, and the authors in [6, 7] used fog computing to monitor the healthcare system in real time. In the current EHR model that most hospitals hold, there is a high chance of a security breach, as the healthcare systems have not been upgraded in time. Fog-based approaches were designed and proposed in [8, 9] for the encryption of smart healthcare records using cloud concepts. Different countries worldwide are now working on standards compliance to make patient data available to all hospitals, since in an emergency there should be no delay in retrieving a patient's historical record. However, this has drawbacks: while the data is being transmitted, hackers may try to intercept it and obtain the private details of the patient, so high-end security is needed to protect the patient's records. In our model, we discuss how to secure the personal data of the hospital and the patient from illegal interception by hackers using proxy re-encryption, blockchain, and behavioral analysis technology.
1.1 Motivation and Contribution of the Paper In EHR of smart hospital need to process and computing in real-time, so that appropriate action could be taken. Fog computing provides the computation at the edge of the network, and blockchain network being secure and transparent, here we have tried to develop a secure architecture. Using the proposed approach, data integrity of EHR is maintained. The rest of the paper is organized as follows: In Sect. 2, a literature survey of the previous work has been discussed. Sect. 3 explains about behavioral analysis in the healthcare system. Blockchain work in the healthcare system is discussed in Sect. 4. The proposed work of this paper is mentioned in Sect. 5. In Sect. 6, the conclusion and future work are given.
2 Literature Survey Several studies have shown integration of IoT and health care led to efficient processes and services. Kavita et-al [10] proposed a model in which the patient’s symptoms are monitored using sensors and transmitted to the cloud server via a gateway, which enables the physician to keep track of health condition irrespective of the patient location. Utkalika et al. [11] proposed a lightweight authentication scheme based on elliptic curve cryptography (ECC) for mobile phone in smart home environment. By
using this protocol, they could able to provide security against a MITM attack, replay attack, impersonate attack, and DoS attack. However, most of the authentication protocols have been proposed for a centralized IoT system. Bhabendu et al. [12] introduced an authentication system based on blockchain technology for a distributed system, in which the authentication of the IoT devices can be done in a decentralized manner. In addition to authentication, secure information sharing among devices is equally necessary; otherwise, it can lead to data leakage, data modification while transmission. Utkalika et al. [13] proposed a blockchain-based secure architecture for communication in IoT applications using permissioned hyperledger blockchain. By adopting blockchain technology in our application, we are eliminating the central architecture of the system that means we are eliminating single point failure and single node hacking. In the distributed system, all nodes are peers to each other; however, to maintain trust among the devices is a major task. Bhabendu et al. [14] proposed a solution to address the above-mentioned trust issue in a decentralized IoT by using blockchain. The IoT enables applications to have a lot of security issues in layer wise. The authors in [15] survey the details security and privacy issue of the IoT system, where data privacy is one of the important issues mentioned in that survey. So it is very much essential to address the security issue like data encryption to maintain data integrity.
3 Behavioral Analysis to Ensure Data Security in Healthcare System Human behavior plays a vital role to know what the person is going to on the next step. And, if we can know how the conduct of a person is ahead of time, we will be on the driver seat. According to the model, we will have analytics software where it will analyze ethical work that a hospital employee takes care of daily. Mostly, the employees will be given the authentication and authorization of transmitting the hospital data that includes vulnerable patient records, doctor records, etc., to other authorized organization or body. But suppose that employee wants to send these vulnerable records to unauthorized body or organization which brings the hospital and its patients privacy at stake, we can determine these things using the behavioral approach as that end-user is not authorized to receive the hospital records and so this will stop the user to send data and a notification will be sent to the administrator of the hospital. There are other solutions to these kinds of a security breaches. One of them is blockchain which involves several employees or stakeholders to permit to send data to the authorized receiver. If any one of them does not grant that permission, then data cannot be transmitted. Now comes one more scenario, suppose all the authorized member indulges in this crime, our analytics model should be smart enough to capture this. It should check the end-user profile and then check the authorized data sender’s profile and the behavior on the network server, and if it analyzes and gives a probability that the receiver profile is not the correct one, it
stops the transmission of data, a notification alert is sent to the administrator team, and the required action is taken by them. Attackers try to penetrate the system to gain confidential information. A solution here is to ask the user some questions whose answers are confidential to hospital employees. If the individual is unable to answer them, which is highly probable for an attacker, then the network should block the user by its MAC ID.
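As a purely illustrative sketch of such a rule-based check (the receiver whitelist, the challenge threshold, and the alerting hook are hypothetical placeholders, not components of the authors' tool):

```python
AUTHORIZED_RECEIVERS = {"partner-hospital-a", "district-health-office"}
BLOCKED_MACS = set()

def notify_admin(sender, receiver):
    # Hypothetical alerting hook for the hospital administrator team.
    print(f"ALERT: {sender} attempted a transfer to unauthorized '{receiver}'")

def screen_transfer(sender, receiver, mac_id, failed_challenges):
    """Decide whether an outgoing record transfer is allowed, held, or blocked."""
    if failed_challenges >= 3:          # repeated failures on the confidential
        BLOCKED_MACS.add(mac_id)        # challenge questions -> block by MAC ID
        return "blocked"
    if receiver not in AUTHORIZED_RECEIVERS:
        notify_admin(sender, receiver)  # hold the request and alert the admin
        return "held"
    return "allowed"
```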
4 Blockchain in Health Care The healthcare domain is now all set to move toward blockchain technology. Legacy blockchain technology was first introduced for the financial domain, or more precisely to solve Bitcoin (cryptocurrency) issues. Blockchain has the potential to address the security and privacy issue as it is immutable and has the consensus mechanism to verify and validate the transactions. Now, health care has got huge potential to inherit blockchain technology in its system due to its inherent characteristics, especially in the management of electronic health records. Healthcare data, it can be created, copied, and modified faster than ever before. And, if these data are the fuel behind more efficient care, blockchain can be considered as a vehicle to reach us there. These days the healthcare sector is losing a vast amount due to poor integrity of data. In its purest form, blockchain is offering health care a safe and secure system to share data without worrying about the security breach. Blockchain is an intricate system that is used to pack the data into packages that we can trust upon. The blockchain-based approach is shown in Fig. 1.
5 Proposed Model In the proposed model, shown in Fig. 2, we would like to keep at the most priority on the security of healthcare data. Keeping this in mind, our proposed model will act as follows: when a patient enters hospital premises, he/she will be registered to the EMR and from the EMR data will be sent to the permitted users commonly known as miners. The miners are the group of people who has permission to verify, create, and update the data into blocks and form a blockchain. When the data is sent to these miners, if anyone of those permitted group (miners) disagrees with a valid reason, then block of data can neither be created nor be updated. If all of them agree for the data flow, a block is created for the patient, and when they get more inputs of the patient, then again further blocks are created against the patient, hence forming the blockchain. The blockchain here is formed in fog server; then, the data will be sent to a permanent storage cloud server. If a patient or a doctor or a hospital group wants some historical data of the patient to treat the patient with more accuracy, they would request the hospital authorities through a request handler. Before the request reaches the request handler, it has to be passed through the behavioral analysis (BA)
Fig. 1 Blockchain-based EHR
Fig. 2 Proposed secured data sharing using blockchain
tool. BA tool on certain algorithm decides to send the request by the end-user to request handler or not. For example, if an end-user wants some irrelevant data and the BA tool catches it, it gives rejects and holds the plea of the end-user and notifies to the admin. Now, the admin will decide if the request is relevant or not, based on that the request it will act upon. If the BA tool accepts the request, then it goes to request handler, and request handler, in turn, will send to miners. Miners will verify the data and then approve fetching the data. Once they permit that, the data will be sent to the end-user through encrypted data with a public secret encrypted key. This phenomenon is known as proxy re-encryption. The encrypted data with a key will be accessible to the end-user with a certain timestamp. If the end-user does not access the data within that period, he/she has to send the request again through the request handler for accessing the record. Here, we proposed primary storage as a fog server because of the time latency issue in the cloud server. For example, the traditional model of block chaining happens as follows: let t1 is the time taken by the end-user to raise a request, then the BA tool takes its own processing time, let us say t2 . After it processes, t3 is the time taken by the request handler to send the request to the cloud. The cloud will do its processing in t4 time and will reach to miners for verifying the requested record and then will process in t5 timestamp. Then, the proxy re-encryption process will generate the encrypted data with a public encrypted key and its process takes t6 time to reach the end-user. So, the total time taken for the requested data to reach end-user = t1 + t2 + t3 + t4 + t5 + t6 . But in our proposed model since we are in fog server we will be able to save the above-mentioned t4 time which time is the most among all timestamps. Every second counts, so the time that we saved will be helpful for the doctor to analyze the history of the patient and act upon quickly. And, many lives can be saved.
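To illustrate the block structure implied above (field names are illustrative, and the consensus step among the miners is omitted), each approved EMR entry can be committed to a block that references the hash of the previous block, so any later tampering with a stored record invalidates the chain:

```python
import hashlib
import json
import time

def make_block(record, prev_hash):
    """Create one hash-chained block for an approved EMR entry."""
    block = {"timestamp": time.time(), "record": record, "prev_hash": prev_hash}
    block["hash"] = hashlib.sha256(
        json.dumps(block, sort_keys=True).encode()).hexdigest()
    return block

genesis = make_block({"patient": "P-001", "note": "registration"}, "0" * 64)
follow_up = make_block({"patient": "P-001", "note": "lab results"}, genesis["hash"])
```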
5.1 Proxy Re-Encryption

The role of the proxy re-encryption (PRE) server is to re-encrypt the file uploaded by the owner into a new ciphertext and forward it to the user. PRE is a public-key encryption scheme with the additional functionality that the holder of a key can derive a re-encryption key toward any other key. Figure 3 shows the structure of proxy re-encryption in detail. Consider a scenario where P1 and P2 have key pairs (pk(P1), sk(P1)) and (pk(P2), sk(P2)), respectively, and a re-encryption key rk(P1, P2) is generated that allows its holder, say P3, to act as a proxy, i.e., P3 can transform ciphertexts under pk(P1) into ciphertexts under pk(P2) without learning the underlying message. A trivial way to achieve this would be for P1 to hand her secret key sk(P1) to P3, who could then decrypt ciphertexts under pk(P1), encrypt them under pk(P2), and send them to P2; here P1's secret key acts as the re-encryption key, and the ordinary decryption and encryption algorithms are used for re-encryption. However, this approach requires P1 to reveal her secret key to P3 and therefore places complete trust in the proxy. The interesting cases are when the parties are mutually suspicious. Figure 4 shows the proposed data uploading procedure.
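The following toy sketch shows this key flow with an ElGamal-style, BBS'98-like construction over a small prime-order subgroup. It is only meant to make the re-encryption step tangible: the parameters are deliberately tiny, this is not the scheme used in the proposed system, and a production deployment should rely on an audited PRE library.

```python
import secrets

# Toy parameters: p = 2q + 1 is a small safe prime and g generates the
# order-q subgroup. Do not use these sizes for real data.
p, q, g = 467, 233, 4

def keygen():
    sk = secrets.randbelow(q - 1) + 1
    return sk, pow(g, sk, p)                       # (secret, public) pair

def encrypt(pk, m):                                # encrypt m for the key holder
    k = secrets.randbelow(q - 1) + 1
    return (m * pow(g, k, p)) % p, pow(pk, k, p)   # (m*g^k, g^(a*k))

def re_key(sk_from, sk_to):                        # rk = b / a mod q
    return (sk_to * pow(sk_from, -1, q)) % q

def re_encrypt(rk, ct):                            # proxy: g^(a*k) -> g^(b*k)
    c1, c2 = ct
    return c1, pow(c2, rk, p)

def decrypt(sk, ct):
    c1, c2 = ct
    s = pow(c2, pow(sk, -1, q), p)                 # recover g^k
    return (c1 * pow(s, -1, p)) % p

sk1, pk1 = keygen()                                # data owner P1
sk2, pk2 = keygen()                                # receiver P2
ct = encrypt(pk1, 42)                              # a stand-in EHR payload
ct2 = re_encrypt(re_key(sk1, sk2), ct)             # proxy P3 transforms it
assert decrypt(sk2, ct2) == 42                     # P2 decrypts without P1's key
```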
Fig. 3 Proxy re-encryption
Fig. 4 Proposed data uploading procedure
5.2 Security Analysis

On the security side, since we are dealing with confidential data, security is our main priority. We must ensure that unethical individuals or organizations are given no opportunity for a security breach. To this end, we use multiple authentication tools. The BA tool analyzes individual behavior to decide whether a user is acting ethically, based on its algorithm. The miners verify and grant permission for fetching, creating, and updating the data, and the proxy re-encryption technology, together with a time limit, ensures that only the intended end-user has a key to decrypt the encrypted data.
6 Conclusion and Future Work

Our fundamental objective in this paper is to secure e-health records in distributed storage. To safeguard the privacy of patient information, the proposed framework demonstrates the idea of "behavioral analysis and enhancing the security service in the health system". Moreover, the data is primarily stored in a fog server and then pushed to the cloud server, which decreases latency and bandwidth usage while keeping the patient's life as the utmost priority. The basic security properties any system must satisfy are confidentiality, integrity, and availability. The integration of IoT makes the healthcare system smart in terms of monitoring patient information in real time. Similarly, computation can be done at the edge of the network using fog computing, which reduces the latency issue. For computation and health-record sharing, encryption and decryption are essential. In this paper, we proposed a re-encryption method and behavioral analysis together with blockchain to make the system secure. In the future, we would like to implement this for hospital health records, and the computational analysis of the system needs to be studied to make it more scalable.
References 1. P. Harries, The prognosis for healthcare payers and provider- s: rising cybersecurity risks and costs (2014). Available Online at: http://usblogs.pwc.com/cybersecurity/the-prognosisfor-healthcare-payers-and-providers-rising-cybersecurity-risks-and-costs/ 2. S. Morgan, Patient Insecurity: Explosion Of The Internet Of Medical Things (2019). Available Online at: https://cybersecurityventures.com/patient-insecurity-explosion-of-the-internet-ofmedical-things/ 3. L. Catarinucci, D. De Donno, L. Mainetti, L. Palano, L. Patrono, M.L. Stefanizzi, L. Tarricone, An iot-aware architecture for smart healthcare systems. IEEE IoT J. 2(6), 515–526 (2015) 4. C. Thota, R. Sundarasekar, G. Manogaran, R. Varatharajan, M. Priyan, Centralized fog computing security platform for IoT and cloud in healthcare system, in Fog Computing: Breakthroughs in Research and Practice (IGI Global, 2018), pp. 365–378 5. T. Muhammed, R. Mehmood, A. Albeshri, I. Katib, Ubehealth: a personalized ubiquitous cloud and edge-enabled networked healthcare system for smart cities. IEEE Access 6, 32258–32285 (2018) 6. P. Verma, S.K. Sood, Fog assisted-iot enabled patient health monitoring in smart homes. IEEE IoT J. 5(3), 1789–1796 (2018) 7. A.M. Rahmani, T.N. Gia, B. Negash, A. Anzanpour, I. Azimi, M. Jiang, P. Liljeberg, Exploiting smart e-health gateways at the edge of healthcare internet-of-things: a fog computing approach. Future Gener. Comput. Syst. 78, 641–658 (2018) 8. Y. Yang, M. Ma, Conjunctive keyword search with designated tester and timing enabled proxy re-encryption function for e-health clouds. IEEE Trans. Inf. Forensics Secur. 11(4), 746–759 (2015) 9. V. Vijayakumar, M. Priyan, G. Ushadevi, R. Varatharajan, G. Manogaran, P.V. Tarare, E-health cloud security using timing enabled proxy re-encryption. Mobile Network. Appl. 24(3), 1034– 1045 (2019) 10. K. Jaiswal, S. Sobhanayak, B.K. Mohanta, D. Jena, Iot-cloud based framework for patient’s data collection in smart healthcare system using raspberry-pi, in 2017 International Conference on Electrical and Computing Technologies and Applications (ICECTA) (IEEE, 2017), pp. 1–4
11. U. Satapathy, B.K. Mohanta, D. Jena, S. Sobhanayak, An ECC based lightweight authentication protocol for mobile phone in smart home, in 2018 IEEE 13th International Conference on Industrial and Information Systems (ICIIS) (IEEE, 2018), pp. 303–308 12. B.K. Mohanta, A. Sahoo, S. Patel, S.S. Panda, D. Jena, D. Gountia, Decauth: decentralized authentication scheme for iot device using ethereum Blockchain, in TENCON 2019–2019 IEEE Region 10 Conference (TENCON) (IEEE, 2019), pp. 558–563 13. U. Satapathy, B.K. Mohanta, S.S. Panda, S. Sobhanayak, D. Jena, A secure framework for communication in internet of things application using hyperledger based Blockchain, in 2019 10th International Conference on Computing, Communication and Networking Technologies (ICCCNT) (IEEE, 2019), pp. 1–7 14. B.K. Mohanta, S.S. Panda, U. Satapathy, D. Jena, D. Gountia, Trustworthy management in decentralized iot application using Blockchain, in 2019 10th International Conference on Computing, Communication and Networking Technologies (ICCCNT) (IEEE, 2019), pp. 1–5 15. B.K. Mohanta, D. Jena, U. Satapathy, S. Patnaik, Survey on iot security: challenges and solution using machine learning, artificial intelligence and Blockchain technology. Internet of Things 100227 (2020)
Automated Glaucoma Detection Using Cup to Disk Ratio and Grey Level Co-occurrence Matrix V. Priyanka and V. Uma Maheswari
Abstract Glaucoma is the second most common cause of blindness in the world. It is a chronic disease, also known as the "silent thief of sight," in which the optic nerve is gradually damaged without showing any symptoms. It may cause permanent loss of vision if not detected in its initial stages. The progression leads to structural changes in the eyeball, which help the ophthalmologist to notice glaucoma in its early stages and stop its further progression. In this paper, we use an image processing technique to detect glaucoma from retinal images. CDR values computed on different retinal images are used to detect glaucoma and its risk factor. Further, GLCM statistical features are used to extract texture features, and lastly, an SVM classifier is used to classify whether the retinal images also show diabetic retinopathy and diabetic maculopathy or not.

Keywords Glaucoma · CDR · RDR · GLCM · SVM · Diabetic retinopathy · Diabetic maculopathy
V. Priyanka (B) · V. Uma Maheswari Vardhaman College of Engineering, Hyderabad, Telangana, India e-mail: [email protected] V. Uma Maheswari e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 D. Swain et al. (eds.), Machine Learning and Information Processing, Advances in Intelligent Systems and Computing 1311, https://doi.org/10.1007/978-981-33-4859-2_42

1 Introduction

Nowadays, eyesight problems have become common for everyone. Almost every human being experiences an eye problem at some time in life; some are minor cases that go away on their own or are easy to treat with home remedies, while others need a specialist's care. Glaucoma is a long-lasting optical disorder that leads to loss of vision if not treated in its early stages. The World Health Organization reports it as the second largest cause of blindness in the world; it accounts for 15% of blindness cases [1], approximately 5.2 million people, and by 2020 it may
increase up to 80 million. The problem emerges from an increase in intraocular pressure, which attacks the optic nerves. A fluid inside the eye, called aqueous humor cause increase in IOP. Normally the balanced in the eye is sustained, as the quantity of fluid generated should be equivalent to the quantity of fluid emitted from the retina. Whereas, in this disorder there will be no fluid flow in the retina and increases the stress in the eye which results in destruction of cranial nerve which connects the eye and the brain. With time increase in pressure results in damage of optic nerve and can leads to irreversible blindness. It is also known as “Silent thief of Sight” because there will no premature symptoms and pain, when there a rise in Intraocular Pressure (IOP) during the early stages of glaucoma. One of the important sources to discover glaucoma is changes occur in structure of the internal eye. To detect the inner details of the eye to the abnormality, two modern biomedical imagining techniques are enabled for ophthalmologists, i.e., Optical Coherence Tomography (OCT) and Fundoscopy. Fundoscopy, also recognized as Ophthalmoscopy, internal retina is enlightened using a light beam reflected from the mirror mount in the device. A piercing in the mirror center helps the observer to view the enlightened region. Light rays reflected from the subject’s eye are converged at the observer’s retina forming a vision. Fundoscopy helps ophthalmologists to observe the Optic Disc. Optic disc looks like yellowish circular body, centered with an optic cup which is somewhat brighter area than optic disc. Figure 1 shows a normal eye retinal image. The circular rim region between optic cup and optic disc is known neuro-retinal rim (NRR). The ratio of the cup area to disc area called cup to disc ratio (CDR) is one of the visible structural change that occurs if glaucoma grows. CDR value ≤0.45 indicates normal eye. In glaucomatous eyes increase in cup size, results to increase in CDR [2] ratio and decrease in NRR [3] value. Thus, CDR and NRR play a key role in structural changes to notice glaucoma using Fundoscopy. Figure 1a represents the healthy retina with normal CDR value, i.e., ≤0.45, whereas Fig. 1b, c represents the glaucoma retinal image. Along with the change in CDR ratio, Fig. 1b, c depicts the alter in color intensity values of glaucoma eye versus non-glaucoma eye images. In glaucomated eye, due to the size of the cup increases the brighter area in optic disc also increases, thus overall image entropy, textural information, variance, mean, and color spatial also increases. Along with
Fig. 1 a Healthy eye b eye with moderate glaucoma c eye with high risk of glaucoma [4]
Fig. 2 The typical stages of vision loss from glaucoma [6]
CDR and NRR, textural information [5] and image intensity can also help to detect the glaucoma (Fig. 2).
2 Related Work In 2018, the author implemented a novel method to separate the glaucomatous eye from normal eye on fundal images. The pre-processing method will eliminate the noise and contrast enhancement to provide image quality for further processes [7]. The Statistical feature uses Gray-Level Run Length Matrix (GLRLM) and GrayLevel Co-occurrence Matrix (GLCM) to obtain texture features. Later SVM classifier is used to distinguish whether the retinal is affected or not. In 2018 [8], the authors used bit-plane slicing and local binary patterns (LBP) methods to perform diagnosis of glaucoma. In order to increase the performance, they have used decision level-based fusion method. The article specifies that, the methodology provides high sensitivity and specificity values which can help to reduce the burden on ophthalmologist during mask screening. Harangi and Hajdu [4] has discussed to combine the probability models and group of individual algorithms to detect optic disc from retinal images. In the proposed algorithm, for every member algorithm more than one person was presented. Based on maximum weighted clique and spatial weighted graph methods, the optic disc nodes and position of a person are found with accuracy 98.46%. The work proposed in [9], developed a computer aided diagnosis (CAD) tool for perfect detection with the help of eighteen layers convolution neural network (CNN) which extract robust features from eye images. But this work needs a massive
database to obtain the best possible performance. So, a massive database of 1426 eye images is used to get an accuracy of 98.13%, specificity of 98.3%, and a sensitivity of 98%. In 2011, the authors implemented an active contour method to determine the pathological process of glaucoma from fundus images to find CDR [10]. Although the article is discussed to find pathologies, but the method used in it failed for some fundus images due to the presence of other additional pathologies. The approach in this article can be improved by using some other pre-processing steps. Singh et al. [11] by segmenting optic disc from eye images wavelet features are been extracted. Later, the classification is done based on five classifiers, i.e., SVM, KNN, random forest, NB, and ANN classifiers. In the end, the performance of ANN classifier proves better feature selection and evolutionary attribute selection with an accuracy of 94%. Nayak et al. [12] has proposed to diagnosis the performance of glaucoma based on eye images. Here different features are being extracted like CDR, optic disc center distance, and area of blood vessels. ANN classifier is used to classify whether the image is a normal or glaucoma image with 86% accuracy.
3 Proposed Method The complete progress of the proposed system is estimated as, if the patient has affected with glaucoma or not, which are observed in retinal fundus images of both eyes of the patient. The procedure is done by calculating the CDR and RDR values from the retinal images and according to the values obtained the doctors determine if the patient eye has glaucoma or not. It will take 1–3 min of time to diagnosis the task for a very well trained specialist. There is also the reality that it could be extremely challenging when this is finished in screening campaigns where at the end, each specialist has hundreds of images to read. In this section, the techniques used to detect glaucoma are presented. The retinal image sample is shown in Fig. 4. The block diagram of the proposed methodology is represented in Fig. 3. The methodology begins with the pre-processing technique, in which the optic cup and optic disc images and their boundaries are identified. Later CDR and RDR values are calculated. Expert input values are provided which identifies whether the retinal image is normal eye or glaucomated eye along with its risk and checkup details with the ophthalmologist. Further, Grey Level Co-occurrence Matrix (GLCM) [13] statistical feature is applied to extract the texture features. Lastly, support vector machine (SVM) technique is used to distinguish whether the retinal eye was affected by diabetic retinopathy and diabetic maculopathy class or not.
Fig. 3 Block diagram for proposed methodology
3.1 CDR and RDR

The optic disc and optic cup regions [14] of a normal retinal image are shown in Fig. 1a. We first locate the disc and cup boundaries and then, using a Cartesian coordinate system, obtain the diameter along with the height, width, horizontal, and vertical centers. From the calculated diameters, we compute the cup-to-disc ratio (CDR), which is the standard clinical measurement used to detect glaucoma.
Fig. 4 Input images (Dataset 1 and the ORIGA dataset)
The CDR and RDR are computed as

CDR = Cup diameter / Disc diameter
Rim = (1 − Cup diameter) − (1 − Disc diameter)
RDR = Rim diameter / Cup diameter

If the CDR is greater than 0.45, the patient is considered to be suffering from glaucoma; if it is less than 0.45, the patient is normal and does not have glaucoma. If the CDR is greater than 0.6, the patient is at high risk of glaucoma, which may lead to permanent blindness. As the cup size increases, the CDR increases and the RDR automatically decreases.
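A small sketch of this decision rule follows; the function name and the example values are ours, and the rim is computed as the disc–cup diameter difference implied by the definition above.

```python
def grade_cdr(cup_diameter, disc_diameter):
    """Apply the CDR/RDR thresholds described above to segmented diameters (pixels)."""
    cdr = cup_diameter / disc_diameter
    rim = disc_diameter - cup_diameter           # rim width implied by the definition
    rdr = rim / cup_diameter
    if cdr <= 0.45:
        risk = "normal"
    elif cdr <= 0.6:
        risk = "possible glaucoma (medium risk)"
    else:
        risk = "high risk of glaucoma"
    return cdr, rdr, risk

print(grade_cdr(cup_diameter=120, disc_diameter=300))   # toy values -> CDR 0.4, normal
```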
3.2 Gray-Level Co-occurrence Matrix

The GLCM is another statistical technique for representing texture features. It captures the intensities of image pixels together with their relative positions and therefore encodes spatial structure information. The texture features extracted from the retinal images provide the relevant information about the images. The GLCM of an image I of size n × n is defined as

P(i, j) = \sum_{x=1}^{n}\sum_{y=1}^{n} \begin{cases} 1, & \text{if } I(x, y) = i \text{ and } I(x + \Delta x,\, y + \Delta y) = j \\ 0, & \text{otherwise} \end{cases}
where Δx and Δy are the displacements between a pixel and its neighboring pixel along the x and y axes, respectively. The intensity relationships of neighboring pixels along four directions (0°, 45°, 90°, and 135°) at distance Δ are measured to build the GLCM, where {[0, Δ], [−Δ, Δ], [−Δ, 0], [−Δ, −Δ]} are the displacement vectors of the four directions.
The GLCMs for all four directions are calculated individually, and the final matrix is computed by combining the four matrices.
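A minimal NumPy sketch of this computation is given below, assuming the image has already been quantized to a small number of gray levels (the level count, distance, and the contrast/energy features are illustrative); libraries such as scikit-image provide equivalent, faster routines.

```python
import numpy as np

def glcm(image, dx, dy, levels=8):
    """Co-occurrence counts for one displacement (dx, dy); `image` holds
    integer gray levels in the range [0, levels)."""
    p = np.zeros((levels, levels))
    h, w = image.shape
    for x in range(h):
        for y in range(w):
            nx, ny = x + dx, y + dy
            if 0 <= nx < h and 0 <= ny < w:
                p[image[x, y], image[nx, ny]] += 1
    return p

def glcm_features(image, d=1, levels=8):
    """Combine the four directions (0, 45, 90, 135 degrees) and derive two
    common texture features from the normalized matrix."""
    offsets = [(0, d), (-d, d), (-d, 0), (-d, -d)]
    m = sum(glcm(image, dx, dy, levels) for dx, dy in offsets)
    m /= m.sum()
    i, j = np.indices((levels, levels))
    contrast = np.sum(m * (i - j) ** 2)
    energy = np.sum(m ** 2)
    return m, contrast, energy
```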
4 Proposed Work This section, we reviewed about the proposed method on given retinal images to detect the Glaucoma among them. We have used two datasets; dataset1 has 10 retinal fundus images and dataset2, i.e., ORIGA dataset has 650 retinal fundus images to estimate the glaucoma within them. Dataset1 has 5 healthy patients and 5 glaucoma patients. Where as in dataset2 482 healthy patients and 168 glaucoma patients [15]. Algorithm: Support vector machine. Input: Retinal Fundus Images. Resolution: 256*256. Output: Glaucoma eyes or normal eyes. Step 1: Provide input image. Step 2: Images and boundaries of optic disk and optic cup are obtained. Step 3: CDR and RDR values are obtained. Step 4: Expert inputs are provided. Step 5: Classifies whether the input image has glaucoma or not. Step 6: Specifies checkup the details with the doctor and its risk. Step 7: GLCM and color features are extracted and lastly, SVM classifier is used. Step 8: Recognizes whether the input image has diabetic maculopathy and diabetic retinopathy or not.
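A brief scikit-learn sketch of the classification stage (Steps 7–8) is given below; the synthetic feature matrix stands in for the extracted GLCM and color features, and the split ratio and kernel are illustrative choices rather than the exact settings used in this work.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Stand-in features: rows of GLCM/colour descriptors per retinal image,
# labels 1 = diabetic retinopathy/maculopathy present, 0 = absent.
X, y = make_classification(n_samples=200, n_features=12, random_state=0)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=42, stratify=y)
clf = SVC(kernel="rbf", C=1.0, gamma="scale")    # RBF-kernel SVM classifier
clf.fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```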
5 Experimental Results In the results section, the step by step procedure to detect the glaucoma is shown with appropriate screenshots and with the dataset provided. MATLAB toolbox is used to implement and train the proposed system. SVM classifier is used to accelerate the training and prediction processes. The retinal image [16] in the databases are used to help us to identify the glaucoma. Initially, the CDR and RDR images and their boundaries are extracted from the retinal images. Later, CDR and RDR values are being detected from the retinal images. If CDR ratio is less than 0.45 then the person does not have glaucoma. But if the CDR values are greater than 0.45 and less than 0.6 then there is a possibility of glaucoma like medium risk and may be a risk of glaucoma. If value exceeds 0.6 then there is risk of glaucoma like high and very high risk of glaucoma. To detect the glaucoma, the expert should provide some expert values to ensure other medical problems to the patient. If the glaucoma is detected, then the glaucoma effected patient should go to checkup for every 2 months (Figs. 5 and 6).
Fig. 5 Cup and disc areas and boundaries
Fig. 6 Glaucoma detection
Next, the texture features are extracted using the GLCM; these features are used to evaluate the performance of the proposed method. A support vector machine (SVM) classifier is then applied to discriminate whether the eye is glaucomatous or not. Figure 7 presents the performance of the proposed methodology: the existing system achieved 75% accuracy, whereas the proposed methodology achieved an accuracy of 95%, with a sensitivity of 91% and a specificity of 97%.
6 Conclusion In this research paper, we have developed an automated novel system to classify glaucomated retinal eyes with the help of some Structural and Nonstructural feature like SVM classifier. In order to get better results, Grey Level Co-occurrence Matrix (GLCM) and support vector machine methods are performed [13]. The proposed
Fig. 7 Comparison between existing and proposed
technique achieved a maximum accuracy of 95% with high performance, along with a high specificity of 97% and a sensitivity of 91%, which helps reduce the burden on ophthalmologists during mass screening. In the future, we plan to develop a methodology to diagnose glaucoma in its early stages, as well as age-related macular degeneration, with the help of other machine learning classifiers or deep learning techniques.
References 1. 2. 3. 4. 5. 6. 7. 8. 9.
10. 11.
https://www.geteyesmart.org/eyesmart/disease/glaucoma/ https://en.wikipedia.org/wiki/Cup-to-disc_ratio J.B. Jonas, M.C. Fernández, J. Stürmer, Pattern of glaucomatous neuroretinal rim loss (1993) B. Harangi, A. Hajdu, Detection of the optic disc in fundus images by combining probability models (2015) V. Uma Maheswari, G.V. Prasad, S.V. Raju, A survey on local textural patterns for facial feature extraction (2018) Types of glaucoma, https://www.visionaware.org/info/your-eye-condition/glaucoma/the-differ ent-types-of-glaucoma/125 A. Dey, K.N. Dey, Automated glaucoma detection from fundus images of eye using statistical feature extraction methods and support vector machine classification (2018) S. Maheshwari, V. Kanhangad, R.B. Pachori, S.V. Bhandary, U.R. Acharya, Automated Glaucoma diagnosis using Bit-plane slicing and Local binary pattern techniques (2018) U. Raghavendra, H. Fujita, S.V. Bhandary, A. Gudigar, T.J. Hong, R. Acharya, Deep convolutional neural network for accurate diagnosis of glaucoma using digital fundus images (2018) M. Mishra, M.K. Nath, S. Dandapat, Glaucoma detection from color fundus images (2011) A. Singh, M.K. Dutta, M.P. Sarathi, V. Uher, R. Burget, Image processing based automatic diagnosis of glaucoma using wavelet features of segmented optic disc from fundus image (2015)
12. J. Nayak, U.R. Acharya, P. Subbanna Bhat, N. Shetty, T.-C. Lim, Automated diagnosis of glaucoma using digital fundus images (2008) 13. P. Mohanaiah, P. Sathyanarayana, L. GuruKumar, Image texture feature extraction using GLCM approach (2013) 14. G.L.Spaeth, Appearances of the optic disc in glaucoma: a pathogenetic classification, in Symposium on Glaucoma. Transactions of the New Orleans Academy of Ophthalmology (1981) 15. Z. Zhang, F. Yin, J. Liu, W.K. Wong, N.M. Tan, B.H. Lee, J. Cheng, T.Y. Wong, ORIGA-light : an online retinal fundus image database for glaucoma analysis and research (2010) 16. J. Nayak, U.R. Acharya, P.S. Bhat, N. Shetty, T.C. Lim, Automated diagnosis of glaucoma using digital fundus images (2009)
Efficient Machine Learning Model for Intrusion Detection—A Comparative Study D. Raman, G. Vijendar Reddy, Ashwani Kumar, and Sathish Vuyyala
Abstract In a world that is increasingly dependent on computing and the internet, new methods to exploit security, in both hardware and software, emerge every day. The spurt in the usage of smart devices, viz., smartphones, smart TVs, and smart appliances, which constitute the Internet of Things (IoT), further calls for more vigilant networks and devices to stop intrusion by unauthorized parties. Many intrusion detection techniques are already in place. At the same time, advances are being made on the machine learning front, with newer algorithms making machines more intelligent. If this kind of artificial machine intelligence is utilized for detecting intrusions, the goal of devising an effective intrusion detection technique can be significantly advanced. This study intends to identify the best machine learning/deep learning model to incorporate into intrusion detection by comparing various available models. The NSL-KDD data set is used to detect 21 different kinds of attacks. Three categories of models, namely linear models, ensembles, and deep neural networks, are compared for their efficiency, and observations are presented. This study finds that ensemble models are more efficient for intrusion detection.

Keywords Artificial intelligence · Machine learning · IDS · Cyber security · NSL KDD dataset

D. Raman (B) · A. Kumar Department of Computer Science and Engineering, Vardhaman College of Engineering, Hyderabad, India e-mail: [email protected] A. Kumar e-mail: [email protected] G. V. Reddy Department of Information Technology, Gokaraju Rangaraju Institute of Engineering and Technology, Hyderabad, India e-mail: [email protected] S. Vuyyala Department of Computer Science and Engineering, MVSR, Hyderabad, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 D. Swain et al. (eds.), Machine Learning and Information Processing, Advances in Intelligent Systems and Computing 1311, https://doi.org/10.1007/978-981-33-4859-2_43
1 Introduction As a result of improved digital literacy, large volumes of data are generated and stored in the cloud. The cloud is a server which can be accessed online, and this online access has paved the way for unauthorized intrusion. This calls for a robust mechanism to prevent attacks on the network, which has created a sub-field of cybersecurity called intrusion detection. An Intrusion Detection System (IDS) helps in mitigating various kinds of attacks on the network. IDS can be broadly classified into two categories: location-based IDS and detection-based IDS. Location-based IDS can be further categorized as host based and network based. While a host-based IDS protects a specific host device, a network-based IDS protects the network from threats by constant monitoring and analysis. Most importantly, on the technical front, detection-based IDS can be classified into signature-based IDS and anomaly-based IDS. If the IDS detects an intrusion by identifying a pattern or similarity between the current code and previously known threats, it falls under signature-based IDS. On the contrary, if the IDS identifies a threat based on an anomaly or change in the network behavior, it falls under anomaly-based IDS. In any IDS, the data packets transmitted over the network are to be classified as either benign or malicious. This requires a classification algorithm, which is where machine learning comes into the picture. Various classification algorithms are available in machine learning, and employing the best ML classifier algorithm will provide a robust intrusion detection system. This study intends to identify the most efficient ML classifier algorithm to be used in designing an efficient IDS.
2 Literature Review Mukkamala et al. [1] have designed models for data reduction and intrusion detection with the help of neural networks [2–6] and Support Vector Machines. They described a neural network that utilizes important data aspects, but the model provides low accuracy in detecting binary attacks. Chen et al. [7] have made use of RST (Rough Set Theory) along with SVM for intrusion detection; RST is used to reduce the number of features before classification with SVM. A fuzzy logic-based IDS [6, 8–11] was developed by Shanmugavadivu et al. [12]; the KDD Cup 99 dataset is used in this model, leading to higher accuracy. Dhaliwal et al. [13] devised an effective IDS using the XG-BOOST technique. Dhanabal et al. [14] made a study on the NSL-KDD data set for intrusion detection based on classification algorithms. Gaikwad et al. [15] discussed a bagging ensemble method; they made use of the NSL-KDD dataset and REPtree as their base classifier, and this method is 99.67% accurate. Parati et al. [16] devised a hybrid IDS technique using SVM and genetic algorithms as their base classifiers, which is more accurate. Yin et al. [17] have made
use of RNNs to develop an intrusion detection model. This yielded a model which is more accurate than traditional ML classifiers for both multiclass and binary classification. Almseidin et al. [18–22] have tested various classifiers like J48, Random Forest, Naive Bayes Network, Random Tree, and Decision Tree; out of these, the random forest classifier resulted in an accuracy of 93.77%.
3 Dataset In this study, the NSL-KDD data set is used to compare the various intrusion detection models. The NSL-KDD data set addresses some of the inherent issues faced by the KDD'99 data set, and there is a reasonable number of records in the train and test sets. In addition, duplicate instances are excluded in the NSL-KDD data set to prevent biased classification results. For any classifier, the features of the data set are provided as input, and the output is obtained as a label of the given input data. The NSL-KDD data set offers 41 features and 23 classes of attacks. Out of the 23 attacks, Spy and Warezclient are excluded in this study. The features in the NSL-KDD dataset are furnished in Table 1. The NSL-KDD training data set consists of 21 different attacks. The attacks are broadly categorized into four groups, as shown in Table 2.
1. DOS: DOS stands for Denial of Service attack, which intends to shut down the network, thus preventing users from accessing it. This is done by flooding the network with huge traffic or by transmitting a trigger that crashes the network.
2. Probe: In this class of attacks, the attacker probes or searches for information on the network about the target device prior to attacking.
3. R2L: R2L stands for Remote to Local. As the name suggests, a remote user gains access to a victim device by exploiting a vulnerability to become a local user.
4. U2R: U2R stands for User to Root. Unlike R2L, a local user of the device gains root access by exploiting the system vulnerabilities.
4 Approach As discussed in the previous sections, various models are tested to ascertain the efficiency of the IDS. This section reviews the various available models and gives an overview of the metrics adopted in our approach.
i. Linear Models: Stochastic Gradient Descent (SGD) and Logistic Regression (LR) are the two linear models used. Linear regression establishes a linear relationship between a dependent variable (target) and an independent variable (predictor). In logistic regression, the output is mapped to a sigmoid function, so it takes only two values, 1 or 0, or specifically in this case "benign" or "malicious." SGD, in contrast, is an iterative optimization technique.
Table 1 Features in the NSL-KDD data set (the paper groups them into basic, content, and traffic features)
Feature No   Feature
F1           Duration
F2           protocol_type
F3           Flag
F4           dst_bytes
F5           wrong_fragment
F6           Hot
F7           logged_in
F8           root_shell
F9           num_root
F10          num_shell
F11          num_outbound_cmds
F12          Count
F13          serror_rate
F14          rerror_rate
F15          same_srv_rate
F16          srv_diff_host_rate
F17          dst_host_srv_count
F18          dst_host_diff_srv_rate
F19          dst_host_count
F20          diff_srv_rate
F21          srv_rerror_rate
F22          is_guest_logins
F23          service
F24          src_bytes
F25          Land
F26          Urgent
F27          num_failed_logins
F28          num_compromised
F29          su_attempted
F30          num_file_creations
F31          num_access_files
F32          is_host_login
F33          srv_count
F34          srv_serror_rate
F35          dst_host_srv_rerror_rate
F36          dst_host_srv_serror_rate
F37          dst_host_srv_diff_host_rate
F38          dst_host_same_srv_rate
F39          dst_host_same_srv_port_rate
F40          dst_host_serror_rate
F41          dst_host_rerror_rate
Table 2 Different classes of attacks with associated attack types
Class of attack   Attack type
DOS               Back, Land, Neptune, Pod, Smurf, Teardrop, Mailbomb, Processtable, Udpstorm, Apache2, Worm
Probe             Satan, IPsweep, Nmap, Portsweep, Mscan, Saint
R2L               Guess_password, Ftp_write, Imap, Phf, Multihop, Warezmaster, Xlock, Xsnoop, Snmpguess, Snmpgetattack, Httptunnel, Sendmail, Named
U2R               Buffer_overflow, Loadmodule, Rootkit, Perl, Sqlattack, Xterm, Ps
Here, a few samples of the data set are considered in each iteration instead of the entire data set.
ii. Ensemble Models: This is a combination of different learning models that gives an improved outcome. These models can be categorized as homogeneous ensembles, which use the same type of base learners, and heterogeneous ensembles, which use different kinds of base learners. A few examples of ensemble models are discussed below.
a. Boosting: Here, classifiers are arranged serially such that the error produced by an earlier stage of classification is explicitly reduced in the next stage. There exists a setup to adjust the weights in each stage based on the knowledge acquired in the previous stage to reduce the error. This is done by identifying the split points using an algorithm. The LightGBM algorithm makes use of GOSS (Gradient-Based One-Sided Sampling) to separate the samples and thereby identify the split points. High speed and efficient handling of large data are a few features which make LightGBM preferable. On the other hand, XG-BOOST is another algorithm which uses an ensemble of decision trees. This ensemble contains regression and classification trees. The individual tree results are aggregated into the final result, and a Taylor expansion is used to compute the various base learners' loss functions.
b. Stacking: In this ensemble technique, the predictions of all base learning algorithms are combined through a combiner algorithm and the final prediction is obtained.
iii. Deep Neural Networks: This is another model studied in this research. In this model, animal intelligence is mimicked using a neural network which consists of
Table 3 Confusion matrix
                        Predicted as malicious   Predicted as benign
Actual malicious data   True positive            False negative
Actual benign data      False positive           True negative
three types of neurons: input, hidden, and output. The input neurons receive the input data and transmit it to the first hidden layer of neurons. The hidden layer neurons perform mathematical computations on the input data, and the number of hidden layers depends on the complexity of the problem. All predictions of the outcome are made in the output layer.
iv. Confusion Matrix: This is the metric used to analyse the performance of any classifier algorithm. It is a table in which the counts of correct and incorrect predictions are summarized. This gives insight into the errors made by the algorithm in predicting the desired outcome (Table 3). Thus, the accuracy a of the classifier is measured as

Accuracy a = Instances classified correctly / Total instances  (1)

a = (TP + TN) / (TP + TN + FP + FN)  (2)
5 Execution The models discussed above are implemented on Kaggle, which acts as a cloud platform for conducting data science projects. It allows users to create kernels which can later be run on NVIDIA K80 GPUs. A training model is created using 126,973 samples. Later, testing is done on all models using 22,544 test samples. A confusion matrix is obtained for every classifier model. The following is the flowchart to implement this study (Fig. 1). The steps in the flowchart are:
1. The NSL-KDD data set containing test and training data is fed to the program.
2. Required libraries and packages of the different models are fetched.
3. The program is run over the algorithms by adjusting the parameters.
4. A confusion matrix is generated.
5. Accuracy is calculated.
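Steps 4 and 5 can be illustrated with the small sketch below (our own illustration, not the authors' Kaggle kernel): a confusion matrix is computed with scikit-learn and the accuracy of Eqs. (1)-(2) is derived from its entries. The labels are toy values, and the 1 = malicious, 0 = benign encoding is an assumption.

# Minimal sketch: confusion matrix laid out as in Table 3, accuracy as in Eq. (2).
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # toy labels: 1 = malicious, 0 = benign (assumed encoding)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # toy classifier output

# With labels=[1, 0] the rows/columns follow Table 3: [[TP, FN], [FP, TN]]
(tp, fn), (fp, tn) = confusion_matrix(y_true, y_pred, labels=[1, 0])

accuracy = (tp + tn) / (tp + tn + fp + fn)   # Eq. (2)
print(tp, fn, fp, tn, accuracy)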
If the input data of size p*q is used to train all five classifier algorithms, the predictions from these algorithms are united to form a matrix of size p*Q, where Q corresponds to the total number of models used. By feeding these predictions to the
Fig. 1 Flow chart (NSL-KDD data set → train/test → classifiers via Python libraries → confusion matrix → benign/malicious decision → plot result)
second level model, the final prediction is obtained. The stacked classifier architecture is represented below (Fig. 2).
Fig. 2 Architecture of the ensemble model: training data (p*q) is fed to the five first-level models (LR, SGD, LGBM, XG-BOOST, DNN), whose predictions (p*Q) form the input of the second-level model that gives the final prediction
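The following is a hedged sketch of this two-level scheme, written with scikit-learn stand-ins for the five base learners (the exact LightGBM, XG-BOOST, and DNN implementations of the paper are not reproduced here); the synthetic data set is only a placeholder for the preprocessed NSL-KDD features.

# Sketch of the stacking idea in Fig. 2: base-model predictions of shape (p, Q)
# become the training input of a second-level model.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=2000, n_features=41, random_state=0)  # placeholder data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.15, random_state=0)

base_models = [
    LogisticRegression(max_iter=1000),
    SGDClassifier(),
    GradientBoostingClassifier(),                                # stand-in for LGBM / XG-BOOST
    MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=300),    # stand-in for the DNN
]

# First level: fit every base model, then stack their predictions column-wise (p x Q).
for model in base_models:
    model.fit(X_tr, y_tr)
meta_train = np.column_stack([model.predict(X_tr) for model in base_models])
meta_test = np.column_stack([model.predict(X_te) for model in base_models])

# Second level: a simple combiner trained on the stacked predictions gives the final output.
combiner = LogisticRegression().fit(meta_train, y_tr)
print("stacked accuracy:", combiner.score(meta_test, y_te))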
Table 4 Hyperparameter values for LGBM
Hyperparameter          Optimal value
learning_rate           0.2
num_leaves              30
bagging_fraction        0.8
feature_fraction        0.6
Nthread                 4
number_boost_rounds     300
early_stopping_rounds   300

Table 5 Hyperparameter values for XG-BOOST
Hyperparameter          Optimal value
eta (learning rate)     0.2
max_depth               6
Nthread                 4
min_child_weight        4
num_boost_rounds        300
early_stopping_rounds   100
Subsample               0.7
The Sklearn library is used to implement LR and SGD. The default (LIBLINEAR) solver is used and the regularization strength is tuned to 0.2; 1000 passes are made over the training data with an optimal learning rate. The respective Python packages are used for implementing the XG-BOOST and LGBM algorithms. The following optimal hyperparameter values are used to get effective results (Tables 4 and 5).
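For illustration, a sketch of how the LightGBM model could be set up with the Table 4 values is given below; it assumes the lightgbm Python package, and the train/validation arrays are random placeholders standing in for the NSL-KDD splits.

# Hedged sketch: a LightGBM booster configured with the Table 4 hyperparameters.
import lightgbm as lgb
import numpy as np

X_train = np.random.rand(1000, 41); y_train = np.random.randint(0, 2, 1000)   # placeholder data
X_valid = np.random.rand(200, 41);  y_valid = np.random.randint(0, 2, 200)

params = {
    "objective": "binary",
    "learning_rate": 0.2,       # Table 4
    "num_leaves": 30,           # Table 4
    "bagging_fraction": 0.8,    # Table 4
    "feature_fraction": 0.6,    # Table 4
    "nthread": 4,               # Table 4
}
train_set = lgb.Dataset(X_train, label=y_train)
valid_set = lgb.Dataset(X_valid, label=y_valid, reference=train_set)

# number_boost_rounds / early_stopping_rounds as in Table 4
booster = lgb.train(params, train_set, num_boost_round=300,
                    valid_sets=[valid_set],
                    callbacks=[lgb.early_stopping(stopping_rounds=300)])
pred_labels = (booster.predict(X_valid) > 0.5).astype(int)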
6 Results The efficiencies of the six models were calculated and furnished hereunder (Table 6).

Table 6 Accuracies of various models
Classifier              Accuracy
Stacked classifier      0.98349
XG-Boost                0.98347
Light GBM               0.98262
DNN                     0.97559
Logistic regression     0.97308
SGD                     0.94077
Fig. 3 Accuracy by classifier chart
It can be deduced that, of all six models, the stacked (ensemble) model provides the highest accuracy of 98.35%. XG-BOOST and Light GBM stay close to the ensemble model's accuracy, while SGD gives the least accuracy of about 94%. The graphical representation of the results is furnished in Fig. 3. The better performance of GBDT models over deep neural network and linear models may be attributed to their distributed, parallel processing and their ability to handle large data. The stacked model performs better than the other models as it utilizes memory and hardware efficiently.
7 Conclusion In this study, various available classifier models are compared for efficiency on the NSL-KDD data set for intrusion detection. An ensemble model is created on top of five classifier models, and it is shown that it results in better accuracy and efficiency than the individual models. Further, more complex models of various algorithms can be used to create ensembles and tested in future work for creating an efficient intrusion detection system. In addition, the current model can also be tested on other datasets for comparison of accuracies, giving scope for future work.
References 1. S. Mukkamala, G. Janoski, A. Sung, Intrusion detection using neural networks and support vector machines. 2, 1702–1707 (2002) 2. A. Kumar, Design of secure image fusion technique using cloud for privacy-preserving and copyright protection. Int. J. Cloud Appl. Comput. (IJCAC) 9(3), 22–36 (2019)
3. A. Kumar, S. Srivastava, Object detection system based on convolution neural networks using single shot multi-box detector. Procedia Comput. Sci. 171, 2610–2617 (2020) 4. A. Kumar, S.S.S.S. Reddy, V. Kulkarni, An object detection technique for blind people in realtime using deep neural network, in 2019 5th International Conference on Image Information Processing (ICIIP) (Shimla, India, 2019), pp. 292–297. doi: https://doi.org/10.1109/ICIIP4 7207.2019.8985965 5. A. Kumar, A review on implementation of digital image watermarking techniques using LSB and DWT, in The 3rd International Conference on Information and Communication Technology for Sustainable Development (ICT4SD 2018). 30–31 Aug 2018 at Hotel Vivanta by Taj, Goa, India 6. D. Raman, M.S.S. Reddy, Y.S. Reddy, Risk assessment for identifying intrusion detection using ds-evidence theory in manet. Int. J. Comput. Electr. Adv. Commun. Eng. 1(3) (2012). @ ISSN: 2250–3129 7. R. Chen, K. Cheng, Y. Chen, C. Hsieh, Using rough set and support vector machine for network intrusion detection system, pp. 465–470 (2009) 8. D. Raman, A. Rahul, The security implications for web application design, in 2nd International Conference of ICACM-2013 (Elsevier, 2013). ISBN No: 9789351071495 9. A. Monika, D. Raman, Justified cross-site scripting attacks prevention from client-side. Int. J. Adv. Technol. Eng. Res. 6(7) (2014). @ ISSN: 0975–3397 10. D. Raman, T.V. Rajinikanth, B. Bruhadeshwar, A. Monika, Script less attacks: shaded defensive architecture. Int. J. Adv. Technol. Eng. Res. 4(5) (2014). @ ISSN No: 2250–3536 11. R. Dugyala, B. Bezawada, R. Agrawal, S. Sathyanarayan, R. Tatiparthi, Application of information flow tracking for signature generation and detection of malware families. Int. J. Appl. Eng. Res. (IJAER) 9(24), 29371–90. @ P- ISSN 0973–4562 e- ISSN 1087–1090 12. R. Shanmugavadivu, N. Nagarajan, Network intrusion detection system using fuzzy logic. Indian J. Comput. Sci. Eng. 2, 02 (2011) 13. S.S. Dongre, K.K. Wankhade, Intrusion detection system using new ensemble boosting approach. Int. J. Model. Optim. 2(4) (2012) 14. L. Dhanabal, S.P. Shantharajah, A study on nslkdd dataset for intrusion detection system based on classification algorithms (2015) 15. N. Parati, S. Potteti, Intelligent intrusion detection system using svm and genetic algorithm (svm-ga). Int. J. Sci. Appl. Inf. Technol. (IJSAIT) 4, 01–05 (2015) 16. M.N. Chowdhury, K. Ferens, M.V. Ferens, Network intrusion detection using machine learning (2016) 17. C. Yin, Y. Zhu, J. Fei, X. He, A deep learning approach for intrusion detection using recurrent neural networks. IEEE Access, 5, 21 954–21 961 (2017) 18. M. Almseidin, M. Alzubi, S. Kovacs, M. Alkasassbeh, Evaluation of machine learning algorithms for intrusion detection system, pp. 000 277–000 282 (2017) 19. D.P. Gaikwad, R.C. Thool, Intrusion detection system using bagging ensemble method of machine learning, pp. 291–295 (2015) 20. G. Ke, Q. Meng, T. Finley, T. Wang, W. Chen, W. Ma, Q. Ye, T.-Y. Liu, Lightgbm: a highly efficient gradient boosting decision tree, pp. 3149–3157 (2017) 21. A.H. Mirza, Computer network intrusion detection using various classifiers and ensemble learning. pp. 1–4 (2018) 22. S.S. Dhaliwal, A.-A. Nahid, R. Abbas, Effective intrusion detection system using xgboost. Information 9(7) 2018. (Online). Available: https: //www.mdpi.com/2078-2489/9/7/149
Anomaly Detection in HTTP Requests Using Machine Learning Ayush Gupta and Avani Modak
Abstract Ever since the internet was introduced to the world, its popularity has been increasing continuously. Needless to say, as the volume of confidential data in the web traffic is significant, cyber-attacks, too, are on the rise. The internet witnesses a wide range of intruders; from script kiddies to professional hackers; using sophisticated exploits to gain access to qualified data. In such a scenario, intrusion detection, mitigation, and prevention become crucial to prevent data from being compromised. This paper deals with the analysis of HTTP packets intending to classify them as normal and anomalous. For this, various machine learning algorithms are trained and suited by feeding relevant data attributes. CSIC 2010 HTTP dataset is used and the performance of various algorithms including Logistic Regression, Kernel SVC, Multilayer Perceptron, Bagging classifier, etc., are compared and results are tabulated. Keywords Web intrusion · Intrusion detection system · HTTP packets · HTTP CSIC 2010 dataset · Anomaly detection
A. Gupta · A. Modak (B) Vishwakarma Institute of Technology, Pune, India e-mail: [email protected] A. Gupta e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 D. Swain et al. (eds.), Machine Learning and Information Processing, Advances in Intelligent Systems and Computing 1311, https://doi.org/10.1007/978-981-33-4859-2_44

1 Introduction Communications over the internet adhere to the HyperText Transfer Protocol (HTTP). A website is hosted by a web server and when a web-client wants to access any resource, the client makes a HTTP request to the server and gets a HTTP response in return. A standard HTTP request packet has fields like Request Line (which contains the HTTP method and URI), Headers, Cookies, and an optional Request Body. On receiving this request, the server validates the identity of the
client and then sends back appropriate information through a HTTP response. A lot of information about the client can be disclosed if these packets were to be intercepted. Attackers generally tweak HTTP packets to exploit the web server; executing a successful web intrusion attack. Some of the techniques involved in the most rampant attacks are:
1.1 Injection Any web functionality that accepts input from the client is vulnerable to this attack. Here, the attacker enters invalid data in the input field, which, if executed, makes the server do something which it was not expected or designed to do [1]. Some common injection attacks are discussed below.
1.1.1
SQL Injection
In this type of injection, the attacker inputs SQL queries with the motive of gaining access to user accounts or tampering with the database [2]. For example, a login page generally has two input fields, one for the username and one for the password. If the attacker were to input a valid username and ' OR '1'='1 in the password field, the backend would execute the command SELECT * FROM USERS WHERE USER = 'username' AND PASSWORD = '' OR '1'='1'; completing a successful SQL query and revealing the information of that particular user.
1.1.2
Command Injection
In this scenario, the attacker inputs OS commands with the motive of getting them executed on the web server [3]. For example: entering rm -rf in the input field can result in deleting the directory from where the server page is being rendered.
1.2 Cross-Site Scripting This attack occurs when a malicious client-side script is injected into the website, which ends up changing how the website is rendered to other end users [1].
1.3 Invalid Requests Sometimes, attackers use brute force attacks to gain access to restricted directories by exploiting security configurations [1]. For example, every server has a particular folder location from where its website is rendered and poor security configurations may allow an attacker to execute directory traversal attacks where the attacker is able to access data outside this directory. Also, attackers may request nonexistent resources on a website with the intent of information disclosure [1] through error messages which can help the attacker plan a more informed attack. The purpose of mentioning these particular attacks is that the CSIC 2010 HTTP dataset, which has been used in this paper, has hundreds of packets that exhibit these attacks. Hence, if we identify which HTTP attributes are more prone to having malicious values, we can specialize our solution to handle them. An Intrusion Detection System (IDS) [4] is generally deployed at different points of a network or on a host’s system so that it processes the incoming and outgoing traffic to ensure the security of the concerned entity. Two popular types of IDS are signature-based IDS [5] and anomaly-based IDS [6]. The signature-based systems search for signatures or particular patterns in traffic data; while the anomaly-based systems function by comparing trustworthy data with the incoming data to find deviations. With new attacks and techniques being devised every day, it seems logical to design an anomaly-based intrusion detection system, which is the intent of this paper. This paper is organized as follows: Sect. 2 briefs about the literature review regarding various methods used for Anomaly Detection in HTTP requests, Sect. 3 explains the list of proposed features and the machine learning algorithms used, Sect. 4 presents the Implementation, Sect. 5 presents the results of those algorithms, and Sect. 6 states the conclusion and important findings achieved.
2 Literature Review Many researchers have worked on detecting anomalies in HTTP packets using different machine learning techniques and tools. Nguyen et al. [7] focused on feature selection to increase the accuracy of detection by using Generic Feature Selection (GeFS). Four classifiers (C4.5, CART, Random Tree, Random Forest) with tenfold cross-validation were used. The main motive of this paper was to select the optimum number of features without affecting accuracy before and after feature selection. Pham et al. [8] conducted the comparative analysis of various machine learning algorithms on CSIC 2010 dataset; namely, Random Forest, LR, Decision tree, AdaBoost, SGD Classifier. LR gave both, the highest precision, and the highest recall. Sara Althubiti et al. [9] performed experiments on CSIC 2010 dataset for anomaly detection using WEKA. Five most relevant HTTP attributes were selected and fed to
machine learning algorithms like Random Forest, LR, AdaBoost, J48, and Naïve Bayes. Experiments showed that all techniques except Naïve Bayes had high precision, recall, and F 1 -measures and low FPR. Chao Liu et al. [11] analyzed the characteristics of common web attacks and used them for feature extraction followed by the application of the nonlinear SVM algorithm. Rajagopal Smitha et al. [12] compared the performance of different ML algorithms like Decision forest, Neural networks, SVM, and LR on HTTP CSIC 2010 dataset. Microsoft Azure Machine Learning Studio (MAMLS) was used. Results showed that the highest precision was reported by SVM, whereas the highest recall was given by LR. Rafal Kozik et al. [10] proposed a method (involving HashMap generation using the URL and HTTP method) for modeling normal behavior of web applications using whitelists based on HTTP request headers of CSIC 2010 dataset. Here classifiers like J48, PART, AdaBoost, and Naïve Bayes were evaluated; combined with tenfold cross-validation showing J48 to give the best results. Two of the papers mentioned [9, 12] have used a machine learning tool to process that dataset and get accurate results; almost all (except [8, 10]) have used feature selection combined with the use of a variety of machine learning algorithms. This paper focusses on the use and modification of the machine learning algorithms offered by python libraries without taking the help of external machine learning tools. Based on the literature survey conducted, the algorithms LR, SGD, MLP, SVM, LDA, Bagging are selected to encompass a variety of machine learning techniques such as regression, neural networks, ensemble learning, etc. The goal of attaining maximum accuracy is achieved by selecting the most relevant features combined with keyword extraction.
3 Proposed Features and Machine Learning Algorithms Machine Learning algorithms generalize a problem by looking at the data and creating a mathematical equation. Generally, the dataset is divided into training and testing sets in the ratio of 75:25. The model is trained on the Training Dataset and it then predicts the results for the Testing Dataset. The accuracy of a model is determined by the results of the prediction on the Testing Dataset.
3.1 HTTP CSIC 2010 Dataset For implementation, the HTTP Dataset CSIC 2010 [13] has been used which was developed by the Information Security Institute of CSIC (Spanish Research National Council). It is a publicly available dataset that is generally used for testing intrusion detection implementations. It contains the HTTP traffic which was aimed for a web application and comprises about 36,000 normal packets and about 25,000 anomalous packets. A range of attacks such as SQL injection, OS injection, XSS attacks, etc., are
evident in this dataset. For easier implementation, the text dataset has been converted to CSV format.
3.2 Data Pre-processing In this step, the feature selection, feature encoding, and normalization process are discussed. For feature selection, the data attributes which are relevant to the problem are cherry-picked. Features like Protocol, User-Agent, Pragma, Cache-Control, Accept, Host, and Connection did not contribute much to the classification [13]. Recursive Feature Elimination (RFE) is used to rank these features. RFE recursively eliminates features, builds a model using the remaining attributes, and calculates the model accuracy. The list of features ranked by the RFE algorithm is given below:
(a) HTTP Method
(b) Length of URL
(c) Length of Argument
(d) Number of digits present in the Argument
(e) Number of characters present in the Argument
(f) Number of special characters present in the Argument
(g) Number of digits present in the URL
(h) Number of characters present in the URL
(i) Number of special characters present in the URL
(j) Number of digits present in Cookies
(k) Number of characters present in Cookies
(l) Length of Content-Type
(m) Length of Content-Length
(n) Number of keywords present in the Argument
HTTP Method is a categorical feature and is encoded using One Hot Encoder. All the above features are then normalized on the scale between 0 and 1 and this processed data is fed to a machine learning model.
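As an illustration of this pre-processing step, the sketch below (an assumed implementation, not the authors' code) derives a few of the listed counts from raw request fields and then applies one-hot encoding and 0-1 scaling; the CSV file name and the column names ("method", "url", "argument") are hypothetical.

# Hedged sketch: numeric features from HTTP request fields, then encoding and scaling.
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

def char_counts(text):
    text = "" if text is None else str(text)
    digits = sum(c.isdigit() for c in text)
    letters = sum(c.isalpha() for c in text)
    return digits, letters, len(text) - digits - letters   # digits, characters, specials

def extract_features(row):
    d_arg, c_arg, s_arg = char_counts(row.get("argument", ""))
    d_url, c_url, s_url = char_counts(row.get("url", ""))
    return {"url_len": len(str(row.get("url", ""))), "arg_len": len(str(row.get("argument", ""))),
            "arg_digits": d_arg, "arg_chars": c_arg, "arg_specials": s_arg,
            "url_digits": d_url, "url_chars": c_url, "url_specials": s_url}

df = pd.read_csv("csic2010.csv")                      # assumed CSV export of the dataset
features = pd.DataFrame([extract_features(r) for r in df.to_dict("records")])
features = features.join(pd.get_dummies(df["method"], prefix="method"))   # one-hot HTTP method
X = MinMaxScaler().fit_transform(features)            # normalize every feature to the 0-1 range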
3.3 Machine Learning Algorithms The Machine Learning Algorithms used in the paper are discussed below:
3.3.1
Logistic Regression
Since the target variable can take only 2 values (Normal or Anomaly), a binary Logistic Regression model is used here. Logistic Regression is a useful algorithm in the
case of categorical output fields. Logistic Regression uses a mathematical equation just like Linear Regression (Eq. 1), which is [14]:

y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + · · ·  (1)

This result is then fed to the sigmoid function (Eq. 2), which gives an output between 0 and 1. Based on the predicted probability and the actual data, the error value is calculated and the coefficients are adjusted accordingly.

P(x) = 1 / (1 + e^{-x})  (2)

3.3.2 Stochastic Gradient Descent (SGD)
In the Gradient Descent optimization algorithm, the weights are updated incrementally after each iteration over the training set. SGD is an iterative optimization technique that iterates over each training data point and updates the weights during the process [15]. SGD reduces redundancy as compared to Batch Gradient Descent, which computes the weight update for the whole training data at a time.

w_{j+1} = w_j + η Σ_i (target^{(i)} − output^{(i)}) x_j^{(i)}  (3)

where w_j is the jth weight coefficient, x_j is the jth training sample, η is the learning rate, target^{(i)} is the ith output sample and output^{(i)} is the ith predicted output.
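A small numpy rendering of this update rule is sketched below; it is a toy illustration (the arrays and learning rate are made up), applying the correction of Eq. (3) one data point at a time, as the SGD description above prescribes.

# Toy sketch of the weight update in Eq. (3), applied per training sample (SGD style).
import numpy as np

X = np.array([[0.2, 1.0], [0.8, 0.3], [0.5, 0.9]])   # toy training samples
target = np.array([0.0, 1.0, 1.0])                   # toy targets
w = np.zeros(2)                                      # weight coefficients w_j
eta = 0.1                                            # learning rate

for _ in range(100):                                 # epochs
    for x_i, t_i in zip(X, target):                  # iterate over each data point
        output_i = w @ x_i                           # current prediction
        w += eta * (t_i - output_i) * x_i            # Eq. (3) for a single sample
print(w)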
3.3.3
Multilayer Perceptron
A Multilayer Perceptron (MLP) is a deep artificial neural network. It consists of hidden layers which act as the computation engine. The model trains on a set of input–output pairs and learns to model the correlation between the two by adjusting the parameters, or the weights and biases, of the model to minimize the error. Backpropagation is used to make these adjustments relative to the error. As the number of hidden layers increases, the computation power and time also increase. So, the goal is to find a middle ground between algorithm accuracy and computation resources and time [16]. In this paper, 2 hidden layers and the Rectified Linear Unit (ReLU) activation function are proposed to compute the output category.
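A minimal scikit-learn sketch of such a network is shown below; the paper fixes only the number of hidden layers and the ReLU activation, so the layer widths here are assumptions.

# Sketch of the described MLP: two hidden layers with ReLU activation.
from sklearn.neural_network import MLPClassifier

mlp = MLPClassifier(hidden_layer_sizes=(64, 32),   # two hidden layers (widths assumed)
                    activation="relu",
                    max_iter=500,
                    random_state=0)
# mlp.fit(X_train, y_train); mlp.predict(X_test)    # X_train/y_train from Sect. 3.2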
3.3.4 Support Vector Machine
A Support Vector Machine (SVM) maps m-dimensional input vectors to a higher k-dimensional feature space based on the kernel used. In this k-dimensional space, a hyperplane is then selected which separates the different output classes [17]. The data points are considered in the form (x_i, y_i), where y_i is the ith class label and x_i is the ith input vector (m-dimensional). The equation for the optimal hyperplane is given in Eq. (4):

w · x + b = 0  (4)

where b is a constant and w is an m-dimensional weight vector. In this paper, a linear kernel has been used to compute the outcome based on the selected feature set. As the name suggests, it maps the input feature space linearly. It is given in Eq. (5) [18]:

K(x_i, x_j) = x_i^T x_j  (5)

3.3.5 Linear Discriminant Analysis (LDA)
In LDA, the original feature space is mapped to a lower-dimensional space using the following 3 steps [19]:
a. Calculate the Within Class Variance using the difference between the mean and each sample of the dataset. The Within Class Variance is given in Eq. (8).

μ_3 = p_1 μ_1 + p_2 μ_2  (6)

cov_j = (x_j − μ_j)(x_j − μ_j)^T  (7)

S_w = Σ_j p_j cov_j  (8)

where μ is the mean, p is the ratio in which the dataset is divided, x_j is the jth dataset sample, cov_j is the covariance for the jth class and S_w is the Within Class Variance measure.
b. Calculate the Between Class Variance using the difference between the means of different classes. S_b is the Between Class Variance and it is given in Eq. (9).

S_b = Σ_j (μ_j − μ_3)(μ_j − μ_3)^T  (9)
Table 1 Training set formation using data parts
Training set     Dataset combinations
Training set 1   1,2,3,6,7,3,2,8
Training set 2   2,4,2,4,8,5,6,7
Training set 3   4,5,6,8,7,3,4,3
c. Construct a lower-dimensional space that maximizes the Between Class Variance and minimizes the Within Class Variance.
3.3.6 Bagging Classifier
Bagging is a "bootstrap" ensemble method that divides the training set into parts and trains the model on combinations of those parts. As shown in Table 1, the training sets are formed using different parts of the dataset. For example, an imaginary dataset is divided into 8 different parts; in Training Set 1, data parts 2 and 3 are repeated, and parts 4 and 5 are not used. As a result, a classifier trained on such training sets might perform better as compared to one trained on the original dataset. This approach works better with unstable learning algorithms like neural networks and decision trees [20]. In this paper, the Bagging Classifier has been used with a random decision tree classifier.
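A hedged sketch of this setup with scikit-learn is given below: bootstrap samples of the training set each fit a decision tree, and the trees' votes are combined; the number of estimators is an assumed value, not one reported by the paper.

# Sketch of bagging with a decision tree base learner (bootstrap resampling of the training set).
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

bagging = BaggingClassifier(DecisionTreeClassifier(),  # unstable base learner, as noted in [20]
                            n_estimators=50,           # number of bootstrapped trees (assumed)
                            bootstrap=True,
                            random_state=0)
# bagging.fit(X_train, y_train); bagging.predict(X_test)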
4 Implementation In this paper, the implementation of the above-mentioned algorithms is done in Python. Python has open-source libraries like Pandas and NumPy, which are used to pre-process the input dataset, and Scikit-learn is used for the machine learning APIs. The input dataset is divided into training and test sets in the ratio of 75:25. All the proposed features are then extracted and, to avoid the over-fitting problem, a tenfold cross-validation technique is used to train the model. The listed algorithms are then compared on the basis of the metrics mentioned below:
4.1 Accuracy Accuracy (A) is defined as the ratio of correctly predicted output and total samples in the test set. A = Correctly predicted output/Total samples
4.2 Precision
Precision (P) is the ratio of true positives (t_p) to the sum of true positives (t_p) and false positives (f_p).

P = t_p / (t_p + f_p)

4.3 Recall
Recall (R) is defined as the ratio of true positives (t_p) over the sum of true positives and false negatives (f_n).

R = t_p / (t_p + f_n)

4.4 F1-Score
It is defined as the Harmonic Mean of Precision (P) and Recall (R). It is calculated as:

F1 = 2(P * R) / (P + R)
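The comparison loop could look like the sketch below; this is an assumed structure rather than the authors' exact script, the synthetic data stands in for the preprocessed CSIC 2010 features of Sect. 3.2, and the model hyperparameters are illustrative only.

# Hedged sketch: tenfold cross-validated accuracy/precision/recall/F1 for the listed models.
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.model_selection import cross_validate
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=3000, n_features=17, random_state=0)  # placeholder features

models = {
    "LR": LogisticRegression(max_iter=1000),
    "SGD": SGDClassifier(),
    "MLP": MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500),
    "SVM": SVC(kernel="linear"),
    "LDA": LinearDiscriminantAnalysis(),
    "Bagging": BaggingClassifier(),
}
scoring = ["accuracy", "precision", "recall", "f1"]

for name, model in models.items():
    scores = cross_validate(model, X, y, cv=10, scoring=scoring)
    print(name, {m: round(scores[f"test_{m}"].mean(), 3) for m in scoring})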
5 Result As mentioned, the dataset used (HTTP dataset CSIC 2010) exhibits popular and most prevalent attacks like SQL injection, cross-site scripting, etc. Training and testing the selected algorithms for this dataset showed that Bagging classifier performed the best with 99% accuracy, 99% precision, 98% recall; the reason being that Bagging classifier works by generating n classification trees using bootstrap sampling of the training data and then their predictions are fused to produce a final meta-prediction. Hence, bagging is a better suited classifier for this purpose as compared to the remaining algorithms. The comparison of listed machine learning algorithms to solve the stated problem is given in Table 2.
Table 2 Comparison of various machine learning algorithms
Algorithm   A      P      R      F1-score
LR          0.95   0.89   0.97   0.93
SGD         0.95   0.90   0.96   0.93
MLP         0.98   0.96   0.99   0.97
SVM         0.95   0.89   0.97   0.93
LDA         0.92   0.84   0.95   0.86
Bagging     0.99   0.99   0.98   0.98
6 Conclusion In this paper, various machine learning algorithms were applied on CSIC 2010 dataset to make the anomaly detection system more robust by using feature selection to circumvent redundant data. To increase the accuracy of the system, additional features training the system to recognize characteristic keywords of various common attacks were added. As a result of this, accuracy in the range of 90–99% was successfully attained as evident from Table 2, with the Bagging Classifier showing maximum accuracy of 99.31%.
References 1. OWASP : OWASP Top 10 : https://owasp.org/www-project-top-ten/ 2. P.A. Carter, SQL injection, in Securing SQL Server (2018), pp. 221–245. doi:https://doi.org/ 10.1007/978-1-4842-4161-5_10 OWASP. (Command Injection) 3. https://owasp.org/www-community/attacks/Command_Injection 4. What is an Intrusion Detection System? Latest Types and Tools. https://www.dnsstuff.com/int rusion-detection-system#what-is-a-network-intrusion-detection-system 5. H. Wu, S. Schwab, R.L. Peckham, US Patent 7,424,744, Signature based network intrusion detection system and method, 2008 6. P. García-Teodoro, J. Díaz-Verdejo, G. Maciá-Fernández, E. Vázquez, Anomaly-based network intrusion detection: techniques, systems and challenges. Comput. Secur. 28(1–2), 18–28 (2009). (Intrusion Detection System) 7. H.T. Nguyen et al., Application of the Generic Feature Selection Measure in Detection of Web Attacks, in Computational Intelligence in Security for Information Systems (Springer, Berlin, 2011), pp. 25–32 8. T.S. Pham, T.H. Hoang, V.C. Vu, Machine learning techniques for web intrusion detection—a comparison, in 8th International Conference on Knowledge and Systems Engineering (KSE) (IEEE, 2016) 9. S. Althubiti, X. Yuan, A. Esterline, Analyzing HTTP requests for web intrusion detection, in KSU Proceedings on Cybersecurity Education, Research and Practice 2 (2017) 10. R. Kozik, M. Chora, R. Renk, W. Holubowicz, A proposal of algorithm for web applications cyber attack detection, in Computer Information Systems and Industrial Management, Lecture Notes in Computer Science (2014), pp. 680–687 11. C. Liu, J. Yang, J. Wu, Web intrusion detection system combined with feature analysis and SVM optimization. EURASIP J. Wirel. Commun. Netw. 2020, 33 (2020)
12. R. Smitha, K.S. Hareesha, P.P. Kundapur, A machine learning approach for web intrusion detection: MAMLS perspective. Immunol. Tolerance, 119–133 (2019) 13. C.T. Gimenez, A.P. Villegas, G.A. Maranon, HTTP data set CSIC 2010 (2010) 14. H. Park, An introduction to logistic regression: from basic concepts to interpretation with particular attention to nursing domain. J. Korean Acad. Nurs. 43(2), 154–164 15. L. Bottou, Large-scale machine learning with stochastic gradient descent, in Proceedings of COMPSTAT’2010, pp. 177–186. doi:https://doi.org/10.1007/978-3-7908-2604-3_16 16. E. Wilson, D.W. Tufts, Multilayer perceptron design algorithm, in Proceedings of IEEE Workshop on Neural Networks for Signal Processing. doi:https://doi.org/10.1109/nnsp.1994. 366063 17. V.N. Vapnik, The nature of statistical learning theory. doi:https://doi.org/10.1007/978-1-47572440-0 18. M.A. Hearst, S.T. Dumais, E. Osuna, J. Platt, B. Scholkopf, ,Support vector machines. IEEE Intell. Syst. Appl. 13(4), 18–28. doi:https://doi.org/10.1109/5254.708428 19. A. Tharwat, T. Gaber, A. Ibrahim, A.E. Hassanien, Linear discriminant analysis: a detailed tutorial. AI Commun. 30(2), 169–190. doi:https://doi.org/10.3233/aic-170729 20. L. Breiman, Bagging predictors. Mach. Learn. 24(2), 123–140 (1996)
Best Fit Radial Kernel Support Vector Machine for Intelligent Crop Yield Prediction Method Vijay Hanuman, Krishna Vamsi Pinnamaneni, and Tripty Singh
Abstract Farming is the foundation of a country's economy, and it is essential to meet the growing demand of the market. Therefore, it is important to develop technologies that enhance the agricultural yield. The yield of a crop depends on various factors like climate, soil attributes, and so on, which form a huge volume of data. Machine learning is one of the efficient technologies that helps in the identification of patterns and rules in large datasets. This paper proposes a method named Intelligent Crop Yield Prediction Method (ICYPM) to solve the crop selection problem and maximize the net yield rate of a crop. The authors have implemented and analyzed ICYPM with regression algorithms like SVM and RF. RMSE along with 4 other parameters was used to compare the algorithms. In this study SVM fared the best as it efficiently predicted the climate and yield. This research has been carried out in collaboration with the Indian Meteorological Department of India. Keywords Climate prediction · Machine learning · Random forest regression · Support vector regression · Yield prediction
V. Hanuman · K. V. Pinnamaneni (B) · T. Singh Department of Computer Science and Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Bengaluru, India e-mail: [email protected] V. Hanuman e-mail: [email protected] T. Singh e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 D. Swain et al. (eds.), Machine Learning and Information Processing, Advances in Intelligent Systems and Computing 1311, https://doi.org/10.1007/978-981-33-4859-2_45

1 Introduction Enhancing quality and crop yield production while reducing operating costs is a target in agriculture. The yield depends on many different attributes such as the soil properties, weather, and irrigation. Data arriving at high velocity form large volumes of datasets which are underused. These datasets, when combined with emerging machine
learning technologies (which aid in finding patterns and rules in large datasets), support better decision making. Numerous research efforts are being conducted in the field of agriculture. Many of them incorporate machine learning techniques in different ways to support and predict yield; this support includes climate prediction.
1.1 Related Work The climate can be predicted in various ways, as in Sreehari et al. [1], where simple linear or multiple linear regression is used to predict rainfall and the most accurate model is selected, or as in Prashant et al. [2], where a 7-day-to-1-day prediction is made using K-Medoids and Naive Bayes. An initial-condition Support Vector Machine (SVM), as in Wang Deji et al. [3], where the initial climate condition is given to predict upcoming conditions, can also be used; the advantage of SVM is that it can handle noise and fluctuations in the dataset. The study could be more complex, as in Sekar et al. [4], with an ensemble classifier and a service-oriented architecture. There exist other research works, such as Petre et al. [5], which use algorithms like decision tree classification for climate classification on a limited dataset. Machine learning is combined with advanced sensing techniques by Pantazi et al. [6] for wheat yield prediction, whereas advanced subsets of machine learning like artificial neural networks and deep learning were combined with sensing data by Kentaro et al. [7, 8] for the yield estimation of crops. In the present scenario, crop selection based on crop yield is still in the research stage and its practical application is very limited [9, 10]. In the majority of countries the selection is done by farmers based on their experience and market requirements [11]. The yield also depends on soil factors, which are considered in this study to estimate yield. This study deals with climate prediction and yield estimation, where the prediction is done in a 1-day-to-1-day manner, incorporating machine learning algorithms for both climate and yield.
2 Data Acquisition and Algorithms The datasets used in this study were acquired from authentic sources for the city of Tiruchirappalli, Tamil Nadu, India. The authors of this paper have taken prior permission to use the data for research in this entitled project.
2.1 Climate Data Climate data were obtained from the National Data Centre, Climate Research and Services, Indian Meteorological Department (IMD), Pune, India [12]. The obtained dataset consisted of thirty-two factors such as maximum temperature, minimum temperature, wind speed, rainfall, etc. This dataset was feature extracted and brought
down to nine attributes namely [13]; date, month, year, maximum temperature, minimum temperature, wind speed, rainfall, rainfall hours, and rainfall minutes [14]. The dataset was then preprocessed to fill the voids in the dataset by predicting the values using Random forest Regression and Support Vector Regression, results of which were averaged and then replaced. The dataset was huge with upwards of 13,000 entries, from the year 1980–2017.
2.2 Soil and Yield Data Soil and yield data were obtained from the Joint Director of Agriculture office, Tiruchirappalli, India. The dataset of soil contained features like zinc, ph, iron, potassium, sodium, etc. The values of these were tabulated into two cycles. The first cycle contained data from the year 2015–2017 and the second cycle contained data from the year 2017–2019 (note that these values were averaged for every two year into one cycle). Just like the climate dataset, features were extracted from the soil dataset where the common values were removed or the differing values were only considered [15, 16]. The yield data were in terms of kilograms per hectare and these data were available only for three years (2016–2018).
2.3 Prediction and Estimation Algorithms
2.3.1 Support Vector Machine (SVM)
SVM is a supervised machine learning algorithm which is unique in terms of implementation. This algorithm constructs hyperplanes in an iterative way for error minimization. The goal of SVM is to find the maximum marginal hyperplane by splitting datasets into classes in a multidimensional space. The advantage of this algorithm is its capability to deal with multiple categorical and continuous variables. Variants are achieved by changing the Type (EPS/NU), for control over the number of support vectors selected in regression, and the Kernel (Linear/Polynomial/Radial), for dimensionality change.

Metric                     Best     Moderate     Poor
Time of SVM learning       Linear   Polynomial   Radial
Ability to fit any data    Radial   Polynomial   Linear
Risk of over fitting       Linear   Polynomial   Radial
Risk of under fitting      Radial   Polynomial   Linear

SVM radial kernel function:

K(x, y) = exp(−γ ‖x − y‖^2)  (1)

where γ is a tuning parameter and x, y are feature vectors.
SVM polynomial kernel function:

K(x, y) = (a + x^T y)^b  (2)

where b is the degree of the kernel and a is a constant term.
2.3.2 Random Forest (RF)
RF, a technique capable of carrying out both regression and classification, is used in this study. This algorithm is an ensemble technique and is an advancement of the decision tree. The basic working is that a number of decision trees are formed, the outputs of all the trees are averaged, and that average is given as the final output [17]. This algorithm can be tuned by deciding the number of decision trees the algorithm has to form before it gives out the result.
3 Intelligent Crop Yield Prediction Method (ICYPM) 3.1 Climate Prediction The algorithm was trained using more than 30 years of data. The working logic for climate prediction is that the previous day's 9 climate variable values are entered and run in an iterative way to predict each of the 6 attributes (excluding the date, month, and year) for the next day. Training and testing were done in the same way for both algorithms, with variations. Random forest regression was used with 3 variations with respect to the number of trees: 20, 50, and 100. The Support Vector Machine was used with multiple variations with respect to Type (EPS and NU) and kernel category (radial, polynomial, linear, and sigmoid). KNN was combined with RF and SVM in a few cases by filtering records similar to the test data; training was then given with the filtered data and prediction was done. Refer to Fig. 1 for a diagrammatic view of the prediction workflow. First, preprocessing was done on the acquired dataset to remove holes; these holes were themselves filled by the prediction and averaging methods, giving a preprocessed dataset with no missing values. Then, this dataset was input to the prediction algorithms to predict the future climate in an iterative way for each value, and the predicted values were entered into the dataset (adding date, month, and year) as a new record. This was done in a loop for the required number of days for which climate values had to be predicted.
Fig. 1 Climate prediction flow diagram (raw data → preprocessing and missing-value prediction → feature extraction and selection → regression models (Random Forest, Support Vector Machine) → iterative prediction of the next day's climate values)
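A hedged sketch of this iterative one-day-ahead scheme is given below, using SVR with a radial (RBF) kernel; the climate DataFrame, its column names, and the restriction to six target attributes are assumptions standing in for the preprocessed IMD data.

# Sketch: one SVR per climate attribute, predicting day t+1 from day t, applied iteratively.
import numpy as np
import pandas as pd
from sklearn.svm import SVR

TARGETS = ["max_temp", "min_temp", "wind_speed", "rainfall", "rain_hours", "rain_minutes"]

def fit_models(climate):
    X = climate[TARGETS].values[:-1]                 # day t values
    models = {}
    for t in TARGETS:
        y = climate[t].values[1:]                    # day t+1 value of this attribute
        models[t] = SVR(kernel="rbf").fit(X, y)
    return models

def predict_days(climate, models, n_days):
    previous = climate[TARGETS].iloc[-1].values.copy()
    rows = []
    for _ in range(n_days):
        nxt = np.array([models[t].predict(previous.reshape(1, -1))[0] for t in TARGETS])
        rows.append(nxt)
        previous = nxt                               # predicted day becomes the new previous day
    return pd.DataFrame(rows, columns=TARGETS)

# climate = pd.read_csv("imd_trichy_preprocessed.csv")   # hypothetical file name
# forecast = predict_days(climate, fit_models(climate), n_days=7)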
3.2 Yield Estimation As the yield data were only available for 3 years, the goal was to identify the most suitable algorithm that could understand and find patterns with minimum training. The average climate, soil, and the yield for a specific crop in a particular year were tabulated for yield training and estimation. The training was done by giving input of soil and average climate for the crop duration. Here, yield was the estimating dependent variable. In Fig. 2 the working for yield estimation is seen through a flow diagram. The soil data was feature extracted and the average of climate data was combined with the soil and yield data for the corresponding year and a new dataset was formed for estimating yield. Support Vector Machine in NU type and Linear Kernel were
used for regression. The test data was also generated in the same way by combining average climate and soil data for test input.

Fig. 2 Yield estimation flow diagram (soil micronutrient data and averaged climate data are merged with yield data, evaluated with MSE/RMSE, and used to train KNN, Random Forest, and NU-type Support Vector regression models)
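A minimal sketch of the yield model described here is shown below: a NU-type SVM with a linear kernel (scikit-learn's NuSVR) trained on averaged climate plus soil attributes; the toy table, its column names, and the nu value are illustrative assumptions.

# Hedged sketch: NU-type SVR with a linear kernel for yield estimation.
import pandas as pd
from sklearn.svm import NuSVR

# Toy stand-in for the merged soil + averaged-climate + yield table (one row per crop and year).
data = pd.DataFrame({
    "avg_max_temp": [34.1, 35.0, 33.8],
    "avg_rainfall": [2.1, 1.4, 2.6],
    "soil_ph":      [7.2, 7.0, 7.4],
    "soil_zinc":    [0.9, 1.1, 0.8],
    "yield_kg_ha":  [2400, 2100, 2550],
})
X, y = data.drop(columns=["yield_kg_ha"]), data["yield_kg_ha"]

model = NuSVR(kernel="linear", nu=0.5).fit(X, y)   # Type NU, linear kernel
print(model.predict(X))                            # estimated yield (kg per hectare)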
3.3 Evaluation
3.3.1 MSE
Mean Squared Error (MSE) is an evaluation method for regression problems. It can be defined as the averaged sum of squared differences between the actual output and the predicted output over all test samples. The lower the MSE, the better the regression line fits the dependent variable of the dataset.

MSE = (1/n) Σ_{i=1}^{n} (P_i − O_i)^2  (3)

where P_i is the predicted output of the ith test sample, O_i is the actual output of the ith test sample, and n is the total number of test samples.
3.3.2 RMSE
Root Mean Squared Error (RMSE) is one of the methods used to evaluate the performance of a regression model. The RMSE can be defined as the square root of the averaged sum of squared differences between the actual output and the predicted output over all test samples. The lower the RMSE, the higher the accuracy of the regression algorithm, as RMSE measures the error in the model by comparing the deviation of the regression line from the actual output. The RMSE is the average distance of a data point from the fitted line, measured along a vertical line.

RMSE = sqrt((1/n) Σ_{i=1}^{n} (P_i − O_i)^2)  (4)

where P_i is the predicted output of the ith test sample, O_i is the actual output of the ith test sample, and n is the total number of test samples.
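Written out directly, Eqs. (3) and (4) amount to the short numpy computation below; the arrays are toy values used only to show the arithmetic.

# Eqs. (3) and (4) on toy predicted/observed arrays.
import numpy as np

P = np.array([31.2, 30.5, 29.8])   # predicted values
O = np.array([30.9, 31.1, 29.5])   # actual (observed) values

mse = np.mean((P - O) ** 2)        # Eq. (3)
rmse = np.sqrt(mse)                # Eq. (4)
print(mse, rmse)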
4 Result and Discussions 4.1 Climate Prediction The variations of each algorithm were implemented to predict all the climate variables. The RMSE values of RFR with 20, 50, and 100 trees were 5.0611, 5.01587, and 4.6911, and the corresponding MSE values were 25.6153, 25.159, and 22.0065. The MSE and RMSE of SVM with a radial kernel were 12.7125 and 3.5654; the other SVM variants gave negative values and hence higher errors, and the KNN-filtered dataset tested with the same SVM radial kernel also did not produce as much efficiency as the plain SVM radial kernel. SVM with a radial kernel fared the best as it had the least error, and hence it was the best algorithm to predict the climate. Table 1 shows the comparison of the different algorithms in the prediction of climate variables. MSE and RMSE were used to compare the different algorithms in terms of working efficiency. It was observed that the MSE and RMSE of the Support Vector Machine were lower compared to the variants of Random Forest Regression for all individual variables and for the total models. The lower the MSE and RMSE values, the higher the correctness, or accuracy, of the model. Figure 3 gives the graphical representation of the above.
Table 1 Climate prediction algorithm comparison table
Models   P1      P2     P3     P4     P5     P6     P7      P8
A1       15.61   3.95   4.04   2.01   5.97   2.44   25.62   5.06
A2       14.72   3.84   3.93   1.98   6.51   2.55   25.16   5.02
A3       12.82   3.58   3.49   1.87   5.70   2.38   22.10   4.69
A4       21.05   4.59   5.97   2.44   5.61   2.37   32.63   5.71
A5       21.16   4.60   6.08   2.47   5.58   2.36   32.82   5.73
A6       4.11    2.03   3.03   1.74   5.58   2.36   12.71   3.57
A7a      –       –      –      –      –      –      –       –
A8a      –       –      –      –      –      –      –       –
A9b      null    null   null   null   null   null   null    null
A10b     null    null   null   null   null   null   null    null
A11b     null    null   null   null   null   null   null    null
Abbreviations of Models: A1—Random forest 20 Trees; A2—Random forest 50 Trees; A3—Random forest 100 Trees; A4—KNN + Random forest 150 Trees; A5—KNN + Support Vector Machine EPS-Radial; A6—Support Vector Machine EPS-Radial; A7—Random forest 200 Trees; A8—Random forest 500 Trees; A9—Support Vector Machine Nu-Linear; A10—Support Vector Machine Nu-Radial; A11—Support Vector Machine Nu-Polynomial
Parameters: P1—Maximum Temp MSE; P2—Maximum Temp RMSE; P3—Minimum Temp MSE; P4—Minimum Temp RMSE; P5—Average wind speed MSE; P6—Average wind speed RMSE; P7—Total MSE for Model; P8—Total RMSE of Model
a Model has exceeded running time
b Model produced negative values and 100% error
4.2 Yield Estimation The deciding factor for choosing the suitable algorithm was which algorithm could be trained well with minimum training data. Due to the way the algorithms work, Random Forest Regression failed and the Support Vector Machine (type NU, kernel linear) succeeded. RFR is known for working well on medium and large datasets. In this case, the dataset and the attribute counts were quite low, because of which, even when the number of trees was increased, the limited data remained a problem.
Fig. 3 Climate prediction algorithm comparison graph (MSE and RMSE of the maximum temperature, minimum temperature, average wind speed, and totals for each model)
Therefore, the result was either a constant or the average of the available yields. This shortcoming was nullified in SVM, as SVM forms a hyperplane in n-dimensional space for the maximum marginal hyperplane with the available data. It therefore understands the pattern and knows where to increase and decrease the estimated value, even beyond the training yield values. Even though the yield values were not accurate enough (due to the lack of training data), the estimated values changed correctly, increasing and decreasing in the right direction.
5 Conclusion This study focused on the estimation of yield by predicting the upcoming climate with the help of machine learning algorithms like RFR and SVM. RFR and SVM variants were tested in predicting the climate and estimating the yield. RMSE was used for validating the regression models, as it is the standard metric for comparing how different algorithms perform, and the following observations were made:
• SVM of EPS type with a radial kernel was the best algorithm for climate prediction, with an RMSE of 3.5654.
• In yield estimation, the target was to identify the algorithm that trained best with minimum training data.
• SVM of NU type with a linear kernel was identified as best suited for yield estimation, as the other variations of SVM and RFR gave constant values.
The linear kernel is not capable of handling a variety of data with multiple parameters; hence those results show null values, as the data is not linearly separable. The main feature of the radial kernel is to transform data that is not linearly separable into a higher
dimensional space. The polynomial kernel model requires the training data to be high dimensional, which makes our data underfit this model and makes it a poor performer. The radial kernel is the best choice. The study can be enhanced with the addition of more yield and soil data to the dataset for more rigorous training of the algorithm, resulting in more accurate estimation. The usage of neural networks may result in an increase in estimation accuracy. Acknowledgements For the acquisition of the data, Amrita Vishwa Vidyapeetham, Bengaluru had to sign a Memorandum of Understanding with the Indian Meteorological Department, Pune, India and the Joint Director Agriculture Office, Trichy regarding the usage of data for research purposes.
References 1. E. Sreehari, S. Srivastava, Prediction of climate variable using multiple linear regression, in 2018 4th International Conference on Computing Communication and Automation (ICCCA), (Noida, India, 2018). doi:https://doi.org/10.1109/CCAA.2018.8777452 2. P. Biradar, S. Ansari, Y. Paradkar, S. Lohiya, Weather prediction using data mining, in 2017 Int. J. Eng. Dev. Res. (IJEDR) 5(2) (2017). ISSN: 2321–9939 3. W. Deji, X. Bo, Z. Faquan, J. Li, G. Li, S. Bingyu, Climate prediction by SVM based on initial conditions, in 2009 6th International Conference on Fuzzy Systems and Knowledge Discovery (Tianjin, China, 2009), pp. 578–581. doi:https://doi.org/10.1109/FSKD.2009.566 4. K.R. Sekar, J. Sethuraman, M. Srinivasan, K.S. Ravichandran, R. Manikandan, Concurrent classifier based analysis for climate prediction using service oriented architectures, in 2017 International Conference on Networks and Advances in Computational Technologies (NetACT) (Thiruvananthapuram, India, 2017), pp. 370–375. doi:https://doi.org/10.1109/NETACT.2017. 8076798 5. E.G. Petre, A decision tree for weather prediction. BMIF J. Math. Inf. Phys. Ser. Bull. PG Univ. Ploiesti Rom. 61(1), 77–82 (2009) 6. X.E. Pantazi, D. Moshou, T. Alexandridis, R.L. Whetton, A.M. Mouazen, Wheat yield prediction using machine learning and advanced sensing techniques. Comput. Electron. Agricul. 121, 57–65 (2016). https://doi.org/10.1016/j.compag.2015.11.2018 7. K. Kuwata, R. Shibasaki, Estimating crop yields with deep learning and remotely sensed data, in 2015 IEEE International Geoscience and Remote Sensing Symposium (IGARSS) (Milan, Italy, 2015). pp. 858–861. doi:https://doi.org/10.1109/IGARSS.2015.7325900 8. T. Singh, T. Babu, Fractal image processing and analysis for compression of hyperspectral images, in The 10th International Conference on Computing, Communication and Networking Technologies (ICCCNT) (Kanpur, India, 2019). doi: https://doi.org/10.1109/ICCCNT45670. 2019.8944503 9. R.J. McQueen, S.R. Garner, C.G. Nevill-Manning, I.H. Witten, Applying machine learning to agricultural data. Comput. Electron. Agricul. 12(2), 275–293 (1995). https://doi.org/10.1016/ 0168-1699(95)98601-9 10. A. Chlingaryan, S. Sukkarieh, B. Whelan, Machine learning approaches for crop yield prediction and nitrogen status estimation in precision agriculture: a review. Comput. Electron. Agricul. 151, 61–69 (2018). https://doi.org/10.1016/j.compag.2018.05.012 11. R.R. Nair, T. Singh, Multi-sensor, multi- modal medical image fusion for color images: a multiresolution approach, in The Tenth International Conference on Advanced Computing (ICoAC) (Chennai, India, 2018). doi: https://doi.org/10.1109/ICoAC44903.2018.8939112 12. Indian Meteorological Department Information. www.imdpune.gov.in/
13. B. Safa, A. Khalili, M. Teshnehlab, A. Liaghat, Artificial neural networks application to predict wheat yield using climatic data, in 20th International Conference on Interactive Information and Processing Systems (IIPS) for Meteorology, Oceanography and Hydrology, at 84th American Meteorological Society (AMS) Annual Meeting (Seattle, USA, 2004) 14. R. Pugazendi, P. Usha, A hybrid model of K-means clustering and multilayer perceptron for rainfall. J. Comput. Intell. Syst. 36–40 (2017) 15. J. Liu, C.E. Goering, L. Tian, A neural network for setting target corn yields. Trans. Am. Soc. Agricult. Soc. Agricult. Biol. Eng. (ASAE) 44, 705–713 (2001). doi:https://doi.org/10.13031/ 2013.6097 16. M. Kaul, R.L. Hill, C. Walthall, Artificial neural networks for corn and soybean yield prediction. Agricult. Syst. 85(1), 1–18 (2005). https://doi.org/10.1016/j.agsy.2004.07.009 17. T. Singh, R. Jha, R. Nayar, Mammogram classification using multinomial logistic regression, in The International Conference on Communication and Signal Processing (ICCSP) (Melmaruvathur, Tamilnadu, India, 2017)
Design of Cryptographic Algorithm Based on a Pair of Reversible Cellular Automata Rules Surendra Kumar Nanda, Suneeta Mohanty, and Prasant Kumar Pattnaik
Abstract This paper focuses on designing a cryptographic algorithm which is based on a pair of reversible complement cellular automata rules that include a pair of reversible cellular automata rules, a random number, and a shared key. During decryption, the identical shared key, ciphertext, encrypted ciphertext and complement reversible cellular automata rule will be used to get the plain text. The algorithm may efficiently work on parallel processing systems thanks to its simpler hardware implementation and parallel computation capability and shows satisfactory avalanche property and also the mode of operation is that the same as CBC mode of the block cipher. Keywords Cryptography · Cellular automata (CA) · Cipher block chaining (CBC) mode · Reversible cellular automata (RCA)
1 Introduction Security of computer systems and security of the information flowing in the network is a high priority requirement [1]. Cryptography is the strongest tool for controlling various security threats [2, 3]. Asymmetric key (public key) and symmetric key (private key) are two broad classifications of cryptographic techniques. Symmetrickey encryption [4] process is classified into two types i.e. block ciphers and stream ciphers [2]. Each plain text divides into several fixed-size blocks and encrypts them block by block then it is known as a block cipher. However, bits by bits or byte
S. K. Nanda (B) · S. Mohanty · P. K. Pattnaik School of Computer Engineering, KIIT Deemed to Be University, Bhubaneswar, India e-mail: [email protected] S. Mohanty e-mail: [email protected] P. K. Pattnaik e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 D. Swain et al. (eds.), Machine Learning and Information Processing, Advances in Intelligent Systems and Computing 1311, https://doi.org/10.1007/978-981-33-4859-2_46
by byte are considered as a stream cipher. Shannon introduces the terms confusion and diffusion which are two elementary of any cryptographic system. Guan [3] proposed cellular automation based public cipher, where the security of the algorithm depended on solving a system of the nonlinear polynomial equation which was difficult to solve. Wolfram [4, 5] was first to propose a cellular automata-based stream cipher encryption algorithm. He uses cellular automata as a pseudo-random number generator. Subsequently, Seredynski et al. [6, 7] have developed the algorithm. The works by Roy [17] and Mukhopadhyay [8] explain the Application of Cellular Automata in Symmetric Key Cryptography & generation of expander graph. Das and Ray [9] present a reversible programmable cellular automata-based block cipher parallel encryption algorithm. Other notable research works represent the application of cellular automata in different areas [8, 10, 11]. Das [12] present application of Rule 32 and 2 for design of cryptosystem. Naskar [13] and Umira [14] represents application of cellular automata in image encryption and quantum dot, respectively. In this work, we have proposed the design of a block cipher cryptographic algorithm based on reversible complement cellular automata rules. The work is organized as follows; Sect. 2 provides theoretical foundations of the Cellular Automation system. In Sect. 3 we have presented our proposed work which is a detailed description of the design of a cryptographic algorithm based on reversible complement cellular automata rules. Section 4 is the discussion on the results obtained. Lastly, Sect. 5 concludes the work with its future scope.
2 Theoretical Foundations of Cellular Automata Wolfram [4, 4] for the first time focus on cellular automata. Ulam and Von Neumann for the first time proposed the fundamental concept of cellular automata. The simplicity, ease of implementation and the simple structure of cellular automata attracted a large number of researchers from various disciplines to work on cellular automata. Cellular automata went through extensive and rigorous mathematical and physical analysis for a very long period. The application of cellular automata has been explained by various researchers in different branches of science [15, 16]. Cellular automata are extremely popular because of its simplicity of implementation and the potential for modeling highly complex systems. The power of cellular automata is that it is best suited for a parallel processing environment. In cellular automata, an individual element known as a cell, update its value depended on its surrounding cells. In cellular automata, cells are communicated with other cells which are local to each other. An individual cell changes its states over a period of time based on the state of other cells present as its neighbor. However, when we iterated these simple structures several times, produce complex patterns. This property of cellular automata gives us the facility to simulate different complex systems. In our work, we use cellular automata to design an algorithm for both encryption and decryption of plain text to cipher text and vice-versa. We choose a class of cellular automata rules
Fig. 1 One-dimensional automation with neighborhood to show implementation of rule 90
which are cryptographically very strong but simple to implement. We choose a pair of reversible rules.
2.1 One-Dimensional Cellular Automation A one-dimensional cellular automaton is a collection of cells arranged in a horizontal line of finite size. Each cell interacted with other cells which are known as neighborhood cells in a discrete-time t. For each central cell i, we compute its value based on a rule and a neighborhood size i.e. radius (r). If we consider a neighborhood of radius r then it consists of total 2r + 1 cells, including the central cell i. In one dimensional cellular automaton, the radius is considered as one. If the radius is one then we consider one cell left to central cell, the central cell and one cell right to the central cell. Figure 1 shows the transition of the state of cellular automata of finite size with cyclic boundary. While calculating the state of a central cell i at time t + 1 it depends only on states of its neighborhoods at time t and a transition function which changes the state, known as a rule. Wolfram [4] took the lead in generating pseudo-random number system by applying cellular automata rules. To generate pseudo-random number Wolfram uses one-dimensional cellular automata rule 30 with radius (r) = 1. One-dimensional cellular automata (circular boundary) with radius 1 and rule 90 is represented in Fig. 1.
2.2 Reversible Cellular Automata Wolfram [4] proposed elementary cellular automata, which are one-dimensional finite cellular automata with two states, i.e. either state 0 or state 1. The neighborhood consists of the central cell itself and r cells toward both the left and the right, where r is termed the radius. With radius r we can form 2^n rules, where
Fig. 2 Forward/backward iteration
n = 2^(2r+1). So, there are 256 cellular automata rules for radius 1. Out of all possible rules, some rules have the reversible property. The reversible rules allow us to go back to the initial state of a cellular automaton at any point in time. Reversible rules come in pairs, where one rule is used for forward movement and the other for backward movement. A reversible pair of rules is very useful for cryptography because we can use one rule for encryption and the other for decryption. The basic idea of forward and backward iteration in cryptography is shown in Fig. 2. The plaintext may be encoded as the initial state of a cellular automaton. We can perform forward iteration of the cellular automaton to achieve encryption and backward iteration to achieve decryption. In this work, we use the class described by Wolfram [4] for encryption. The class of rule that we use depends on a two-step backward iteration. A cell now has to look not only at its right and left neighbors in step t (the sequence of current cell values) but also at its own value in step t − 1 (the sequence of cell values in the previous stage). This means that two cases must be considered while creating a new rule: the first defines the transition when the step t − 1 cell was in state 1, and the second when it was in state 0. An example of the definition of such a rule is given in Fig. 3. Here, if we consider the first three bits in step t (111) and the central cell value of step t − 1 is 1, then the outcome of rule 75 is 0. This is because the binary equivalent of 111 is 7 and the 7th bit position of the binary equivalent of 75 is 0 (if bit positions start at 0). Each rule that belongs to this class can be characterized by a pair of elementary cellular automata rules. The first cellular automata rule is used for transition in the
Fig. 3 A pair of complement Rule 75/180
state if step t − 1 cell was in state 1 (Case 1), and the second cellular automata rule is used when the cell was in state 0 (Case 2). These two rules depend on each other. If one rule is known then the second rule can be calculated using:
R2 = 2^n − R1 − 1, where n = 2^(2r+1)    (1)
For radius r = 1 this gives n = 8, so R2 = 255 − R1; for example, rule 75 pairs with rule 180 (Fig. 3).
To create a new reversible rule, we can take any elementary rule and apply it in the case when cells value in step t − 1 was 1 (Case 1). In order to seek a rule for the case in which cells value in step t − 1 was 0 (Case 2) presented formula can be used.
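As a concrete illustration (our own sketch, not code from the paper), the following Python fragment derives the complement rule from Eq. (1) and applies one forward step of such a second-order rule pair with a cyclic boundary:

# Sketch: elementary CA rules and the complement pair from Eq. (1); with r = 1, n = 8.
def complement_rule(r1, r=1):
    n = 2 ** (2 * r + 1)
    return 2 ** n - r1 - 1            # Eq. (1): R2 = 2^n - R1 - 1

def apply_rule(rule, cells):
    """One synchronous update of an elementary CA with cyclic boundary."""
    out = []
    for i in range(len(cells)):
        left, centre, right = cells[i - 1], cells[i], cells[(i + 1) % len(cells)]
        idx = (left << 2) | (centre << 1) | right   # 3-bit neighbourhood index
        out.append((rule >> idx) & 1)               # look the bit up in the rule table
    return out

def second_order_step(rule_case1, prev, curr):
    """Second-order step: which rule a cell uses depends on its own state at t - 1."""
    rule_case0 = complement_rule(rule_case1)
    nxt_1 = apply_rule(rule_case1, curr)            # Case 1 (cell was 1 at t - 1)
    nxt_0 = apply_rule(rule_case0, curr)            # Case 2 (cell was 0 at t - 1)
    return [nxt_1[i] if prev[i] == 1 else nxt_0[i] for i in range(len(curr))]

print(complement_rule(75))   # 180, the pair shown in Fig. 3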
3 Proposed Method The proposed algorithm is built on the principle of private key cryptography. The private keys and the RCA rules used in this algorithm are mutually shared by the sender and receiver before data transfer. This algorithm needs a random number at the beginning of encryption and that can be generated by using an efficient random number generator. The sender at the end of the encryption stage generates Encrypted Cipher Text (ECT) and Encrypted Final Data (EFD) by using shared keys and selected RCA rules. The generated ECT and EFD then send to the receiver. The receiver after receiving EFD and ECT applies shared keys and compliment rules of earlier used RCA rules to get the plaintext. During the encryption and decryption process, we use three keys i.e. private key1, private key2, and encrypted key. The sender and receiver already shared private key1 and private key2 among themselves. Figure 4 represents the process of encrypted key calculation. The Encrypted Key (EKey) is used to encode the encrypted data generated after each block of encryption. This will helps us to protect the propagation of cryptoanalysis of one block from other blocks. Initially, the plaintext is divided into multiple small blocks (128 bits). Before applying multiple rounds on encryption, we apply a private key (Private Key1) on the plaintext block to generate an early encrypted plaintext. This will prevent the plaintext to directly involving in multiple rounds on encryption, which makes cryptanalysis more difficult. Each encrypted plaintext block is subjected to N − 1 iterations to produce the corresponding ciphertext. Each ciphertext is then encrypted with another private key (Private Key2) to produce encrypted ciphertext. Finally, we combine all encrypted ciphertext to produce the desired ciphertext. The decryption is just the reverse steps of encryption with the reverse of the rules used in the encryption process. Fig. 4 Generation of encryption key
Private Key1 RCA Rule Private Key2
Encryption Key
3.1 Encryption Algorithm The steps of the encryption algorithm used in this work are described below (a code sketch follows the decryption algorithm).
Input: Plain Text and Random Number.
Output: Encrypted Cipher Text and Encrypted Final Data.
Step-1 Generate a random number of 128 bits (State 0).
Step-2 Choose a plain text and divide it into blocks of 128 bits (State 1).
Step-3 Choose two private keys, i.e. private key1 and private key2.
Step-4 XOR each block of plain text with private key1 to generate the Encrypted Plain Text (EPT).
Step-5 Generate the Encrypted Key (EKey) by applying a pair of RCA rules on private key1 and private key2.
Step-6 Choose a pair of RCA rules to apply to each EPT.
Step-7 Find State 2 by applying the RCA rule on the EPT.
Step-8 Repeat Step-7 for N times.
Step-9 Select the output of the N−1th iteration as FD and of the N−2th as CT.
Step-10 XOR CT with private key2 to generate the Encrypted Cipher Text (ECT).
Step-11 XOR FD with the Encrypted Key (EKey) to generate the Encrypted Final Data (EFD). This acts as State 0 for the next plain text block.
Step-12 End.
3.2 Decryption Algorithm The steps of the decryption algorithm used in this work are described below.
Input: Encrypted Cipher Text and Encrypted Final Data.
Output: Plain Text.
Step-1 Generate the Encrypted Key (EKey) by applying a pair of RCA rules on private key1 and private key2.
Step-2 XOR the EFD of the previous block with the Encrypted Key to get the FD. It acts as State 0.
Step-3 XOR the ECT with private key2 to get the CT. It acts as State 1.
Step-4 Find State 2 by applying the (complement) RCA rule on each CT.
Step-5 Repeat Step-4 for N times.
Step-6 Select the output of the N−1th iteration as the EFD for the next block and the N−1th as the Encrypted Plain Text (EPT).
Step-7 XOR the EPT with private key1 to get the desired plaintext.
Step-8 End.
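The following condensed sketch shows one block of this round trip. It is our illustration only: it reuses complement_rule, apply_rule and second_order_step from the sketch in Sect. 2.2, shrinks blocks to 8 bits for readability, and takes one possible reading of the EKey step (rule pair 236/19, one round).

def xor_bits(a, b):
    return [x ^ y for x, y in zip(a, b)]

def backward_step(rule_case1, curr, nxt):
    """Inverse of second_order_step: recover the state at t - 1 from states t and t + 1."""
    ref = apply_rule(rule_case1, curr)
    return [1 if nxt[i] == ref[i] else 0 for i in range(len(curr))]

def encrypt_block(plain, state0, key1, key2, ekey, rule=75, rounds=7):
    states = [state0, xor_bits(plain, key1)]              # State 0 and the EPT (State 1)
    for _ in range(rounds):                                # forward iterations
        states.append(second_order_step(rule, states[-2], states[-1]))
    ct, fd = states[-2], states[-1]                        # ciphertext and final data
    return xor_bits(ct, key2), xor_bits(fd, ekey)          # (ECT, EFD)

def decrypt_block(ect, efd, key1, key2, ekey, rule=75, rounds=7):
    curr, nxt = xor_bits(ect, key2), xor_bits(efd, ekey)   # (CT, FD)
    for _ in range(rounds):                                # run the automaton backwards
        curr, nxt = backward_step(rule, curr, nxt), curr
    return xor_bits(nxt, key1)                             # nxt is the EPT again

key1 = [1, 0, 1, 1, 0, 0, 1, 0]                 # toy private keys (illustrative values)
key2 = [0, 1, 1, 0, 1, 0, 0, 1]
ekey = second_order_step(236, key1, key2)       # assumed EKey derivation, rule 236/19
state0 = [0, 0, 0, 1, 1, 0, 1, 1]               # random number, as in Fig. 5 (00011011)
plain = [0, 1, 1, 1, 0, 0, 1, 1]
ect, efd = encrypt_block(plain, state0, key1, key2, ekey)
assert decrypt_block(ect, efd, key1, key2, ekey) == plain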
4 Result Discussion In this section, we have implemented the proposed algorithm to test its accuracy and efficiency. Initially, we generate a random number of 128 bits lengths. We accept a plaintext from the user and divide it into the number of blocks, having a block size of 128 bits. Each block of plain text is XOR with private key1 to generate encrypted plain text. Then we apply a cellular automata rule which is approved by both sender and receiver on the individual blocks of plain text. During the implementation, we choose a pair of reversible cellular automata rules that is 75 and 180 for encryption of each block of plain text to ciphertext. Repeat these above steps for a specific number of iterations which is mutually known by both sender and receiver. The N − 2 iterations output is taken as the ciphertext (CT) and N − 1 iterations output is taken as the final data (FD). Then perform the XOR operation between FD and encrypted key of 128 bits to generate the encrypted final data (EFD) which act as State 0 for the next block. After this perform XOR operation of the private key2 with the CT to generate encrypted ciphertext (ECT). Now, transmit the ECT and EFD to the receiver. The encrypted key can be generated by applying RCA rule 236/19 on private key1 and private key2. The number round to apply CA rule is chosen as 1 during the encrypted key generation whereas during encryption of data we apply CA rule for N number of rounds. The receiver performs the XOR operation between the EFD and the encrypted key to generate the FD. Perform the XOR operation of private key2 with ECT to generate CT. Now, apply reversible CA rules 75/180 for N times on the CT. The N − 1 iteration gives us an encrypted plain text (PT). Perform the XOR operation of encrypted plain text with private key1 to get the plain text. Figure 5 demonstrates the above-said process with 8 bits plain text, random number, and key. Initially, we generate a random number (00,011,011) as state t − 1, our Encrypted Plaintext (01,110,011) as state t, and the RCA rule 75. After applying rule 75 on state t we will get the state t + 1 as A2 (10,110,011). This process will continue for 7 rounds and finally we will get cipher text (CT (A7)) and final data (FD). The FD will use as initial configuration for the next block in place random number. Figure 5 is just the reverse process of Encryption.
Fig. 5 Encryption with 8 bits data and Decryption with 8 bits of data
Our algorithm shows good avalanche property. The avalanche property says that a very small modification in the plain text bits or key bits should show a great significant modification in the outcome bits i.e. ciphertext bits. The more ciphertext bits it affects, the more will avalanche effect. Our reversible cellular automata-based algorithm having a working policy similar to Cipher Block Chaining (CBC) mode of block encryption in terms of production of results.
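The avalanche behaviour described here can be checked empirically. A small sketch of such a test, again our own illustration reusing the toy block functions and values from the previous sketch:

# Flip one plaintext bit and count how many output bits change (avalanche check).
def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

base_ect, base_efd = encrypt_block(plain, state0, key1, key2, ekey)
flipped = plain[:]
flipped[0] ^= 1                                   # single-bit change in the plaintext
new_ect, new_efd = encrypt_block(flipped, state0, key1, key2, ekey)
changed = hamming(base_ect + base_efd, new_ect + new_efd)
print("bits changed:", changed, "of", len(base_ect) + len(base_efd))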
5 Conclusion and Future Work Reversible property of cellular automata and encryption policy is widely used in block encryption. The proposed algorithm shows good avalanche property and the mode of operation is similar to CBC mode of the block cipher. The cellular automata are so simple that it can be implemented using very simple circuits. As our algorithm is based on reversible cellular automata it is very simple to implement. In our algorithm, we use a key size of 128 bits. The 128 bits key size increases security and makes brute force attack extremely difficult. We implement this algorithm and verify the results with the expected results. This algorithm utilizes reversible cellular automata because reversible cellular automata are efficient to implement in a parallel processing system, simple to implement and very economical in cost. In future, we want to design a hybrid block cipher algorithm by implementing hybrid cellular automata rules. We also wish to choose the cellular automata rule pair dynamically. All RCA rules cannot be used in cryptography so we need to find out all strong pairs of rules that can be used effectively in cryptography.
References 1. S. Nandi, B. K. Kar, P. Pal Chaudhuri, Theory and applications of cellular automata in cryptography. IEEE Trans. Comput. 43(12), 1346–1357 (1994) 2. C.E. Shannon, Communication theory of secrecy systems. Bell Syst. Tech. J. 28(4), 656–715 (1949) 3. P.R. Rio, Cellular automaton public-key cryptosystem. Complex Syst. 1, 51–57 (1987) 4. F.-X. Standaert, G. Piret, G. Rouvroy, J.-J. Quisquater, J.-D. Legat, ICEBERG : An Involutional Cipher Efficient for Block Encryption in Reconfigurable Hardware. (Springer, Berlin, Heidelberg, 2004), pp. 279–298 5. M. Tomassini, M. Perrenoud, Cryptography with cellular automata. Appl. SoftComput. 1(2), 151–160 (2001) 6. M. Szaban, F. Seredynski, Improving quality of DES S-boxes by cellular automata-based S-boxes. J. Supercomput. 57(2), 216–226 (2011) 7. M. Szaban, F. Seredynski, Cellular Automata-Based S-Boxes vs. DES S-Boxes. (Springer, Berlin, Heidelberg, 2009), pp. 269–283 8. D. Mukhopadhyay, Generating Expander Graphs Using Cellular Automata (Springer, Berlin, Heidelberg, 2012), pp. 52–62 9. D. Das, A. Ray, A parallel encryption algorithm for block ciphers based on reversible programmable cellular automata. J. Comput. Sci. Eng. 1(1), 82–90 (2010)
10. A. Jaberi, R. Ayanzadeh, A.S.Z. Mousavi, Two-layer cellular automata based cryptography. Trends Appl. Sci. Res. 7, 68–77 (2012) 11. P. Anghelescu, S. Ionita, E. Sofron, Block encryption using hybrid additive cellular automata, in 7th International Conference on Hybrid Intelligent Systems (HIS 2007), (2007), pp. 132–137 12. M. Das, K. R. Das, M. Sahu, R. Dash, Application of cellular automata for an efficient symmetric key cryptosystem, in 2019 International Conference on Applied Machine Learning (ICAML) (Bhubaneswar, India, 2019), pp. 21–26 13. P.K. Naskar, S. Bhattacharyya, D. Nandy et al., A robust image encryption scheme using chaotic tent map and cellular automata. Nonlinear Dyn 100, 2877–2898 (2020) 14. S. Umira, R. Qadri, Z.A. Bangi, M. Tariq Banday, G. Mohiuddin Bhat, M. Rafiq Beigh, A novel comparator—a cryptographic design in quantum dot cellular automata. Int. J. Digit. Sign. Smart Syst. (IJDSSS) 4(1–3) (2020) 15. H. Umeo, How to synchronize cellular automata—recent developments, 393–419 (2020) 16. P. Ratha, D. Swain, B. Paikaray, S. Sahoo, An optimized encryption technique using an arbitrary matrix with probabilistic encryption. Procedia Comput. Sci. 57, 1235–1241. ISSN 1877-0509 (2015) 17. S. Roy, S. Nandi, J. Dansana, P.K. Pattnaik, Application of cellular automata in symmetric key cryptography, in International Conference on Communication and Signal Processing (2014), pp. 572–576
Computational Model Simulation of a Self-Driving Car by the MADRaS Simulator Using Keras Aseem Patil
Abstract The identification of pathways is a complicated subject. For several generations, it has been drawing the concern of the visual world. Lane detection is essentially a multifunction detection problem, which has become a significant challenge for computer vision and machine learning. Although many machine learning methods are used to detect lane, they are mainly used to classify rather than design features. Nevertheless, modern machine learning approaches can be employed to find the familiar features that have been effective in feature detection tests. Problems remain unresolved when there is an obstruction on the road while driving in the self-driving cars sector. Lane detection is the most important part for the reduction of injuries and damage when driving self-driving vehicles. In this paper, an implementation of a simulation of a self-driving car is made by using MATLAB functions for training/testing and building a model using Keras along with the open-source Intel’s OpenAI API MADRaS simulator (Multi-Agent Autonomous Driving Simulator) has been shown. Keywords Canny edge detector · Lane detection · HoG descriptors · Sobel-feldsman filter · Hough transform · Visualization
1 Introduction Traffic safety becomes a top priority because of the increase in the number of vehicles and the huge traffic in urban areas. In India, road accidents lead to 5 deaths every 15 min according to a recent survey. Road accidents in the country have been found every minute and there are 20 accidents per hour, and almost 4 lakh plus road accidents have been reported by the states. It has been found that the overall road accident deaths with two wheelers account for 38.48%, considering both open road and urban factors that contribute to these accidents. When society develops rapidly, cars have become a significant mode of transport for people. More and more vehicles of all sorts are A. Patil (B) Department of Electronics Engineering, Vishwakarma Institute of Technology, Pune, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 D. Swain et al. (eds.), Machine Learning and Information Processing, Advances in Intelligent Systems and Computing 1311, https://doi.org/10.1007/978-981-33-4859-2_47
on the narrow road. The number of victims of car accidents is rising each year, as more and more motor vehicles are on the roads. It has become a focus of attention on how to drive safely under many vehicles and narrow roads. Lane detection has been used for autonomous vehicle systems and is a hot topic in the fields of machine learning and computer vision. The lane detection system is based on lane markings in a complex environment and is used reliably to estimate the position and trajectory of the vehicle relative to the lane. In the lane warning system, lane detection also plays an important role. The research for lane detection is divided into two main steps: edge detection and line detection. The proposed method could achieve the new edge. A famous scientist suggested the use of the Sobel edge operator for the Adaptive Interest Area (ROI). But, despite edge detection, there are still some wrong edges. Such errors can impact the identification of the lane. The detection of the line is as critical as the detection of the lane edge. As far as line detection is concerned, we typically have two approaches, one based on feathering and other modeling methods. In this article, we propose a lane detection method which is suitable for self-driving simulation with all kinds of difficult traffic conditions. Next, each frame image was preprocessed and then the region of interest of the pictures processed was chosen. Finally, for the ROI area, we needed only edge detection and line detection to preprocess each of the the vehicle’s frame for testing. We introduced a new approach for preprocessing and ROI selection in this report. The identification of a lane begins with such a lane indicator and is used to determine the location of the vehicle. In the lane and self-driving shift warning system, lane detection also plays an important role. The identification requires identifying other key aspects such as road markings. Parked and moving vehicles are the main challenges, poor quality lane lines or lines without any, shadowing because of the trees, etc.
2 Approach in Solving the Problem 2.1 Feature-Based Detection Feature-based detection also known as the functional lane detection is based upon the extraction from the input image of the edges and lanes of the necessary functions. The model for feature detection is further divided into approaches for edge detection and Hough approaches based on transformation [1]. The function also detects the lane in the input picture through the borders. In this paper, I have suggested a model which detects the lines by the edge detection approach and takes the inserted image to a new image that depicts a bird’s eye view of the road and detects the lane marking (Fig. 1).
Fig. 1 Using various aspects in determining the lane feature from a set of features
2.2 Noise Reduction Like all methodologies for edge detection, noise is a key issue, frequently leading to false detection. To reduce detector sensitivity to noise, a 5 × 5 Gaussian filter is often used to convolve the image. You use a kernel of normal distributed numbers that will run through the whole image (in the above scenario a 5 × 5 matrix), which sets every pixel to the weighted average of its next pixels [2] (Fig. 2). The implemented matrix kernels are the same in size, even if they vary in kernel values. Another angle: matrix kernels of the same scale, of which one is more extreme than the other. The weight factor used to measure the kernel values can be calculated by the matrix kernels’ resultant strength [3]. The kernels measured shall be of equal size and differ in strength of Gaussian blur intensity. The code allows the user to update the weight values, expressed as weights 1 and 2, to set kernel intensity. Fig. 2 A 5 × 5 Gaussian Kernel that will help in noise reduction from each frame from the live stream
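As an illustration of this step (our sketch; the paper does not list its code), a 5 × 5 Gaussian kernel can be built and applied with NumPy and OpenCV as follows:

import numpy as np
import cv2

def gaussian_kernel_5x5(sigma=1.0):
    ax = np.arange(-2, 3)
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return k / k.sum()                      # normalise so the weights sum to 1

frame = cv2.imread("frame.jpg", cv2.IMREAD_GRAYSCALE)   # one frame from the live stream
blurred = cv2.filter2D(frame, -1, gaussian_kernel_5x5())
# Equivalent built-in call:
blurred = cv2.GaussianBlur(frame, (5, 5), 1.0)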
2.3 Intensity Gradient Using the Angle and Edge Gradient The distorted frame is then modified to determine whether the boundaries are horizontal, vertical, or lateral with a Sobel, Roberts or Prewitt kernel. Edge Gradient(G) =
√(Gx² + Gy²)    (1)
Angle(θ) = tan⁻¹(Gy / Gx)    (2)
Here, in Eqs. (1) and (2),
Gx = f(x + n, y) − f(x − n, y)    (3)
Gy = f(x, y + n) − f(x, y − n)    (4)
2.4 Vehicle Detection The aforementioned feature extraction techniques are quite great, especially while using HOG descriptors. HOG takes an image, breaks it into blocks, we observe the pixels and extract them from the characteristic vectors in cells [4]. The pixels inside the cell are divided into different orientations and the magnitude of the most powerful vector decides the resulting vector for a specific cell within a row (Fig. 3).
Fig. 3 The block diagram of the aforementioned feature extraction method using HoG descriptors
Computational Model Simulation of a Self-Driving Car …
483
Detection of vehicles is an important research subject that has been found in countless fields for industrial applications. Its main aim is to extract various features from monitoring traffic footage or photos in order to determine the category of vehicles and provide traffic monitoring and control reference information. In this paper, I propose to use HOG descriptors and the CNN technology to advance vehicles detection.
2.5 Lane Detection In developing intelligent vehicles, lane detection is an essential focus. A route detection algorithm for intelligent vehicles in complex road conditions and diverse conditions was proposed for finding solutions such as low prediction accuracy of conventional methods and weak real-time outputs of profound learning methodologies. Firstly, the aerial view of the lane is obtained by region of interest extraction and conversion of the inverse viewpoint using the overlay threshold methodology for edge detection. Secondly, the sample-based random consent algorithm for the curves of the lane lines based on the third order model basis-spline (a spline function which has minimum assistance for a specific degree, responsiveness, and field partition) was adopted. Fitting evaluation and measurement of the curve’s circumference was then performed. Finally, simulation tests for lane detection algorithms were conducted by using a road video under complicated driving conditions and a data set [1]. If by chance, the individual wanted to change lanes he/she would have to give an indicator that would allow the system to change paths. However, if the car wants to change the path when a car is driving at that same time on the path to be changed, the system will make sure that, that specific vehicle passes before it’s our turn to change paths, as seen in (Fig. 4). Using the MADRaS simulator we could determine whether the code was giving the required results. The pixel intensity changes quickly when there is a edge (a path) (that is, 0–255) we want to recognize [5]. To identify the road lines we use the following pipeline: 1. Convert one of the frames from the live stream to a grayscale image. 2. Set Gaussian blur onto the frame, generate a grainy image. 3. Apply the HoG descriptor on the blurred image and get a white-edged black image. 4. To eliminate all outlines far outside ROI, add a ROI mask to the boundaries of the image. 5. Apply the Sobel-Feldman filter to the masked image we got as the output from the Hough transformation in step 4. 6. List the output line markings after the filters are applied. 7. Transform horizontal and vertical line margins: (a) Disregard paths with trajectories of nearly 0 (path of −0.5–0.5). (b) Apply Hough to a masked margin. (c) The remaining outlines in both pairs are separated by the route mark.
484
A. Patil
Fig. 4 Simulation of changing lanes using the MADRaS simulator with the Keras model guidance
(d) To each faction average out the slopes and the intercepts. (e) Turn back the average trajectories for the left and right lane lines and intercepts. 8. Draw the original picture with normal lane lines within the region of interest. 9. Auto-save the result of the frame using save_framefile(). After the simulation is complete the car will keep its trajectory path constant unless the individual changes paths. The detection system works by providing information from previous frames but no past information exists at frame 0. Therefore, when it is implemented for the first time, the tracking system needs some initialization. Basically, it needs to think about where the route points can travel. The system proposed figures out whether there are lanes at the top and the middle of the ROI. Straight lines can and must be detected by rapidly identifying those sections, both curved sections can occur along the road. The proposed method detects straight and curved sections that the Hough transform estimates are lanes. The distance of left and right lanes is reduced to reduce the processor load when lanes are identified from the actual images (Figs. 5 and 6). To draw a line on the left and right routes, I used the function split_draw_lines() and edited the code by using the following points: 1. The Hough transformation lines are split by gradient (negative and positive) into two sections. Removal of inordinately curving lines (abject value below 0.4 and above 1.2). 2. The paths and intercepts for every group are combined. 3. For the median pitch and intercepts, the beginning and end points are determined, so that lines have been drawn only within ROI. 4. Add the following lines to the frame output image.
Computational Model Simulation of a Self-Driving Car …
485
Fig. 5 Car simulation using the trajectory planning in the MADRaS simulator
Fig. 6 Applying Sobel-Feldman Filter to the frame
3 Challenges While Implementing the System Major drawbacks while implementing the system would include the following two instances: 1. Identifying the optimal blur Canny edge Hough transformation parameters needed to be recognized.
486
A. Patil
Fig. 7 Tire marks on the road that act as a challenge during lane detection
2. Determining how to draw a line for the left and right lanes. We frequently ask ourselves while implementing such systems, what might happen if the car was driving outside of the road, would that be a major vulnerability? In this scenario (Fig. 7), both lines are on the same slope and my pipeline typically breaks down and draws only one line. One weakness may be irregular road surface color, when pneumatic markings are present on the ground. Through watching the ordinary live stream, you can see that when the car drives in the bright concrete patch with black pipes the current simulation pipeline example got confused. Several green Hough lines with very small slopes are visible. They “train” the average paths and intercept, so that the red lines on the sides get really far away. The third drawback is that sometimes a fraction of a second lines can vanish in the picture. The pipeline can not therefore distinguish the landlines for these images. It’s due to the blur Canny Hough transition parameters [6]. An error-less innovation would be the implementation of a system that would automatically select the best set of conditions for the pipeline. Another innovation would be to automatically adjust ROI depending on whether the trajectory is elevated or descending.
4 Results By using the MADRaS simulator, the system gave us the required output, as can be seen in Figs. 8 and 9.
Computational Model Simulation of a Self-Driving Car …
487
Fig. 8 Using Canny edge detectors and HoG descriptors we can identify the lane for the car
Fig. 9 The vehicles have been detected and the car is following its same path
The frame output image can be digitized for linear or slashed lines—lane markings. The individual will concentrate on the car between the two lines without changing its route (Figs. 10 and 11).
488
A. Patil
Fig. 10 The following graph represents the variation in threshold before and after the filters are applied. The blue line shows the original and normal state of data that was used in the simulation. The orange line shows the smoothed and cleaned version of the data from the image. Taken from the live stream video. The green line shows the variation during the simulation with filters and Hough transform
Fig. 11 The following graph illustrates the total loss of data during training and testing of the model on both live feed and simulation. Based on the graph we have achieved an accuracy of 94.77% from the simulation and 92.14% accuracy from the live feed data
5 Conclusions A self-driving car simulation has been built using convolutional neural networks using the MADRaS simulator. In order to enhance lane recognition accuracy, edge extraction was also implemented during the preprocessing stages. Following the
Computational Model Simulation of a Self-Driving Car …
489
suggested preprocessing, we also made the ROI collection. It reduces non-lane parameters and increases detection accuracy in comparison to the selection of the ROI in the original image. I have used Sobel-Feldman’s filters in drawing and mapping the lane and using HoG descriptors to evaluate the local regions present in each frame of the live feed. In doing so, an accuracy of 94.77% from the simulation and 92.14% accuracy from the live feed data has been achieved. Using Canny Edge detectors for establishing the lane regions using a 5 × 5 matrix as a convolutional mask, we could identify the lane and made sure that the car never changed paths at curved paths. Further, the system can change lanes only if the indicator has been turned on. I aim to improve our project’s efficiency in our future research with alternative standardization approaches. In extraordinary lighting conditions, we intend to focus on images taken. We plan to develop appropriate metrics to evaluate the model output accurately for the position of objects, especially for our nighttime data.
References 1. A.A. Assidiq, O.O. Khalifa, M.R. Islam, S. Khan, Real time lane detection for autonomous vehicles, in 2008 International Conference on Computer and Communication Engineering (IEEE, 2008), pp. 82–88 2. Z. Kim, Robust tracking in difficult situations and lane detection. Trans of IEEE. Intell. Intell. Sweetening Unit. System. Mar. 9(1), 16–26 (2008) 3. J. Kuhnl, A. Fritsch, A. Geiger, The new performance measures and evaluation benchmarks for algorithms for road detection, in The Proceedings of the 16th International IEEE Conference on Smart Transport Systems (ITSC ‘ 13), (IEEE, The Hague, The Netherlands, 2013)pp. 1693–1699 4. M. Rane, A. Patil, B. Barse, Real object detection using TensorFlow, in ICCCE 2019. Lecture Notes in Electrical Engineering, vol. 570, eds. by A. Kumar, S. Mozar (Springer, Singapore, 2020) 5. D. Pomerleau, RALPH: rapidly changing side role manager, in Proceedings of the Smart Vehicles, Symposium 95 (Detroit, MI, US, 2003), pp. 506–510 6. J. Annamalai, C. Lakshmikanthan, An optimized computer vision and image processing algorithm for unmarked road edge detection, in Soft Computing and Signal Processing (Springer, Singapore, 2019), pp. 429–437 7. H. Yuen, L. Mihaylova, H. Zhu, K.V. Yuen, L. Leung, Intelligent vehicle environmental vision. IEEE Intell. Trans. Syst. Trans. 18(10), 2584–2600 (2017)
Success of H1-B VISA Using ANN Priyadarshini Chatterjee, Muni Sekhar Velpuru, and T. Jagadeeswari
Abstract The proposed work predicts the outcome of H1-B VISAS that are applied by professionals belonging to different fields. This VISA is applied on temporary basis and to specialized workers only. People with B-Tech degree or equivalent can apply for the VISA if the requisite skills are required by the employers of USA H1-B VISA has a time limit of 3 years but can be extended up to 6 years. Being one of the most sought-after VISAs, its approval rate is pretty low. In the year 2019, out of 200,000 applicants only 85,000 applications for the VISA got approved. So, the rate of approval is only 42%. On yearly basis, due to increase in competition, the approval rate of the VISA is getting stringent. There are several factors on which the selection rate of the VISA depends. For predicting the success rate of VISA approval, we have considered a proposed system that works on the data set downloaded from Kaggle.com. The data set is then converted to numerical form using some encoding schemes. We also build an ANN model and use this data set to train the model. If the output is 0, then the VISA application is rejected, and if the output is 1, then the VISA application is accepted. This paper predicts the success of an individual in obtaining H1-B VISA. The proposed system in this paper obtains an accuracy of 94% in predicting the success of H1-B VISA. Keywords H1-B VISA · Artificial neural network · ReLU
P. Chatterjee (B) Department of Information Technology, Vardhaman College of Engineering, Hyderabad, India e-mail: [email protected] M. S. Velpuru · T. Jagadeeswari Department of Computer Science and Engineering, Vardhaman College of Engineering, Hyderabad, India e-mail: [email protected] T. Jagadeeswari e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 D. Swain et al. (eds.), Machine Learning and Information Processing, Advances in Intelligent Systems and Computing 1311, https://doi.org/10.1007/978-981-33-4859-2_48
1 Introduction The H1-B VISA is the most prestigious permission given to only small number of people. There are various factors on which the VISA is granted. The factors are scrutinized from the applications in text form like age, gender, caste, wages, etc. Out of large number of applicants only few are selected by the employer [1, 2]. This VISA is requested by highly skilled foreign nationals in the US embassy. If the applicant has one year of experience, it counts for one point. The user needs to get twelve points in order to qualify for the VISA. Even though a candidate gets 12 points, it is not assured that the applicant will get a fair chance for the VISA clearance. The chances of VISA acceptance are very thin. There are several restrictions imposed by the US Government [3, 4]. Dynamism in the rules and regulations of the US government makes it even more difficult to get the acceptance of the VISA. In this paper, we have used artificial neural network to predict the success of H1-B VISA. The model plays a crucial role in predicting the outcome of the VISA applications. Our proposed model takes into account all the factors stated above such as age, wages, and genders. We have selected artificial neural network as there are more than one layer that can be trained with appropriate data [5, 6]. We will be considering from Kaggle.com. We will be also considering the applications from the year 2013 to 2019 that consists of five million applications. The factors we consider are in text form. They will be converted to numeric form using an encoding scheme. There are various encoding schemes available of which we will be selecting one scheme. The scheme that will be selected should provide the maximum accuracy rate [6–8]. We intend to divide the five million data set into two parts. 70% of the data set will be used as an input for training the network and 30% will be considered as testing data. We will be cleaning the data oddity using bar histograms. We also intend to calculate the F1 score for testing the accuracy rate. We also plan to calculate the AUG and ROC values. This same model could have been created using support vector machine,Naïve Bayes, logistic regression, but artificial neural network proves to outperform all. The accuracy rate of an ANN is relatively more than the other domains [9–11].
2 Related Work There are different ways in which this work has been implemented. Mostly all have used data from Kaggle.com. Histograms were also used to find out the odd data that were eventually eliminated. During feature extraction, only the important features were considered and were labeled [12]. Different classification algorithms are used like RF, decision tree, and logit. In all the work, F score is calculated to get the final result. Another work on the same topic is done using machine learning algorithm. In machine learning algorithm, neural network has been used to do the prediction.
Neural network uses some important factors from the application to predict the outcome. These factors are in text forms and they are converted into numerical form with the help of any variable-length encoding algorithms. There are various variable-length encoding algorithms. Some paper has used one-hot encoding and binary encoding [13]. We intend to use sum encoding in our proposed work. The data set is normalized in all the works between 0 and 1. The final F1 score for logistic regression that we intend to achieve is 100%. As and on we progress, we will come to know how much we can achieve. What we may not achieve in this work we intend to fill the gap in our future work. There are papers in which the former machine learning technique is compared with newer machine learning techniques. In this work, the data sets extracted from OLFC contains forty attributes. Training and testing are also carried out in these data sets. Different results obtained using different machine learning algorithms are then compared. It is found that c5.0 has the highest accuracy rate [14, 15]. Another report that we came across has used K-means prediction algorithm to predict the algorithm of H1-B VISA. Decision tree was also used. This algorithm can work on small number of data set rather than on average one. Going through this report, we got good insight about the distribution of data set. Another report that we came across used random forest, AdaBoost, Naïve Bayes, and logistic regression to predict the outcome. During the preprocessing steps for feature extraction were implemented in a prudent manner, this preprocessing step was implemented in a way that one-hot k representation gave a very high success rate [16, 17]. There is also a study by Andrew Shakir that predicted the wages of the VISA applicants. This is performed by using text analysis of the attributes. The study concluded that job title and occupational classification are the two most important fields to predict the applicant’s wages as accurately as possible [18, 19]. There is another project done by the students of UC Berkley that predicts the waiting time to get the work VISA for a given job title and for a given employer. This study used K-nearest neighbor as a primary model [20, 21]. We end this section by giving a general diagram of a general ANN model (Fig. 1). The general methodology of ANN can be depicted using the following flowchart in Fig. 2.
3 Data Set In our proposed work, we have downloaded the data from Kaggle.com. Initially, we have considered a set of five million data. This data set is divided into parts. 70% of the data is used as an input to the layers and 30% of the data is used for testing the validity of the network. Relevant features in the data set that are in text form. They are converted to numerical values using some encoding schemes. Figure 3 is an example of the factors that are considered from the data set for our proposed work.
Fig. 1 Artificial neural network model
4 Proposed Work In our proposed work, we have considered 2013–2019 data to study the prediction outcome. The data set is downloaded from Kaggle.com that contains five million applications. The data set is divided into two parts containing two million and three million data. Histograms are used to segment the oddity. Sum encoding is used to convert the data into appropriate format. Finally, artificial neural network is used to train the data and to get the accurate prediction of the acceptance of the application. From the two sets of the data, the most important features are extracted like name, age, gender, wages, and caste. After truncating the data set and extracting the most relevant features, histograms are plotted to remove the deviation. Here, deviation means that some observations are at a particular distance that can give false predictions. We have used bar charts to remove the deviation. Then, we normalize the values 0 and 1. As we are dealing with data sets that are in text format, they have to convert to numeric form in order to be fed into the neural network. If there are values like “yes” or “no”, they are converted to 1 and 0 using sum encoding. After the preprocessing got completed, we got a result of 78,000 resultant data set. Of this 78,000 data set, 70% of the data trains the network and remaining 30% is used for testing. Next step, we build the model network with ANN algorithm. Algorithm is suitable for the proposed system as it contains layers. These layers are connected with each other and each layer has an activation function that converts input to output. Next step, we use multi-layer perceptron and rectified linear unit. The output layer is
[Residue of the Fig. 2 flowchart: input data → ANN algorithm → parameter optimisation → training process → evaluation; if optimisation has not ended, the training algorithm and its parameters are re-selected, otherwise the final estimation is made on the test set.]
Fig. 2 Flowchart of generalized ANN
Fig. 3 Snapshot of factors considered from the data set
Fig. 4 General methodology of an ANN
called sigmoid that predicts accurately the acceptance of a VISA application. The final output is normalized between 0 and 1. The output is taken as if the value is 0 then the application is rejected, and if the value is 1, the application is accepted. Any ANN algorithm has these general steps depicted by Fig. 4. Needless to say, our proposed work also followed the same methodology. The confusion matrix of our proposed system can be drawn as follows: Predict
                      Predicted: Accepted    Predicted: Not Accepted
Actual: Accepted      TN                     FP
Actual: Not Accepted  FN                     TP
TN = true negatives, FN = false negatives. TP = true positives, FP = false positives.
497
5 Result Analysis Firstly, we apply the input to the network, and using the ANN algorithm, we get the output. Next, we feed the testing data set into the network to compare this result with our previous output of 70% data set to find out the rate of accuracy. Then, we determine the final result. Figure 5 shows the training accuracy achieved using the artificial neural network. Figure 6 is the loss of plot for the model that is calculated on 100 epochs. The final accuracy came out to be 94%. Then, we plot receiver operating characteristics and area under the graph to find the probability of the curve and also the degree of separability. Figure 7 gives the ROC and the AUG.
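A minimal sketch of the kind of network described in Sect. 4 (our assumption of the architecture; layer sizes and file names are illustrative, only the 70/30 split, ReLU hidden layers, sigmoid output and 100 epochs follow the text):

import numpy as np
from tensorflow import keras

# X: encoded application features (age, gender, wages, etc.); y: 1 = accepted, 0 = rejected.
X = np.load("h1b_features.npy")          # hypothetical preprocessed arrays
y = np.load("h1b_labels.npy")

model = keras.Sequential([
    keras.layers.Dense(64, activation="relu", input_shape=(X.shape[1],)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),   # output normalised to [0, 1]
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
history = model.fit(X, y, validation_split=0.3, epochs=100, batch_size=64)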
Fig. 5 Snapshot of the graph of training accuracy
Fig. 6 Snapshot of the graph of loss plot of model
Fig. 7 Snapshot of the graph of ROC curve
Finally, we speak of the confusion matrix. As per the confusion matrix, the mean recall score is 0.91 and F1 score is 0.94. We can also conclude the false positive rate is 18.
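For completeness, these scores can be computed from the saved test predictions with scikit-learn; a sketch under the assumption that y_true holds the test labels and y_prob the predicted probabilities from the model above:

from sklearn.metrics import confusion_matrix, recall_score, f1_score, roc_auc_score

y_pred = (y_prob >= 0.5).astype(int)       # threshold the sigmoid output
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("recall:", recall_score(y_true, y_pred))
print("F1:", f1_score(y_true, y_pred))
print("ROC AUC:", roc_auc_score(y_true, y_prob))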
6 Conclusion and Future Work To conclude, we can predict the outcome of VISA application having certain attributes of the application. We can use ANN and machine learning algorithms to predict the outcomes successfully. Artificial neural network is the best model as it gave us a success rate of 94%. We have also received the final F1 score as 0.94. This F1 score was calculated on a balanced test data. Artificial neural networks are always better at describing data complexities. In future, we intend to develop a much coherent model that can give a better accuracy rate. We intend to achieve an accuracy rate of 100% in future. For that we may need to consider updated data set. We may also need to change encoding schemes and also introduce a much better approach for training the layers of the ANN. This paper may be considered as a basis for future reference for advanced research on the prediction of the outcome of H1-B VISA.
References 1. S. Karun, Predicting the outcome of H-1B VISA eligibility, in Advances in Computer Communication and Computational Sciences. (Springer, Singapore, 2019), pp. 355–364 2. A. Dombe, R. Rewale, D. Swain, A deep learning-based approach for predicting the outcome of H-1B VISA application, in Machine Learning and Information Processing (Springer, Singapore, 2020), pp. 193–202 3. P. Thakur, M. Singh, H. Singh, P.S. Rana, An allotment of H1B work VISA in USA using machine learning. Int. J. Eng. Technol. 7(2.27), 93–103 (2018) 4. B. Gunel, O.C. Mutlu, Predicting the outcome of H-1B VISA applications
5. D. Swain, K. Chakraborty, A. Dombe, A. Ashture, N. Valakunde, Prediction of H1B VISA using machine learning Al- gorithms, in 2018 International Conference on Advanced Computation and Telecommunication (ICACAT). (IEEE, 2018), pp. 1–7 6. D. Sundararaman, N. Pal, A.K. Misraa, An analysis of nonimmigrant work VISAs in the USA using machine learning. Int. J. Comput. Sci. Secur. (IJCSS) 6 (2017) 7. J. Lin. H-1B VISA Data Analysis and Prediction by using K-means Clustering and Decision Tree Al- gorithms. (Online) Available: https://github.com/Jinglin-LI/H1B-VISA-Predictionby-Machine-Learning-Algorithm/blob/master/H1B%20Prediction%20Research%20Report. pdf 8. K. Doran, A. Gelber, A. Isen, The effects of high-skilled immigration policy on firms: evidence from H-1B VISA lotteries (No. w20668). National Bureau of Economic Research (2014) 9. G. Peri, K. Shih, C. Sparber, STEM workers, H-1B VISAs, and productivity in US cities. J. Labor Econ. 33(S1), S225–S255 (2015) 10. H-1B Fiscal Year (FY) 2018 Cap season. USCIS. (Online). Available: https://www.uscis.gov/ working-united-states/temporary-workers/h-1b-specialty-occupations-and-fashion-models/h1b-fiscal-year-fy-2018-cap-season. Accessed 20 Oct 2017 11. High-skilled VISA applications hit record high. CNNMoney. (Online). Available: https:// money.cnn.com/2016/04/12/technology/h1b-cap-VISA-fy-2017/index.html. Accessed 20 Oct 2017 12. Using Text Analysis To Predict H-1B Wages, The Ocial Blog of BigML.com, 01 Oct 2013. (Online). Available: https://blog.bigml.com/2013/10/01/using-text-analysis-to-predicth1-b-wages/. Accessed: 20 Oct 2017 13. Predicting Case Status of H-1B VISA Petitions. (Online). Available: https://cseweb.ucsd.edu/ classes/wi17/cse258-a/reports/a054.pdf 14. H-1B VISA Data Analysis and Prediction by using K-means Clustering and DecisionTree Algorithms. (Online). Available:https://github.com/Jinglin-LI/H1B-VISA-Prediction-by-Mac hine-Learning-Algorithm/blob/master/H1B%20Prediction%20Research%20Report.pdf 15. H-1B VISA Petitions 2011–2016 | Kaggle. (Online). Available: https://www.kaggle.com/nsh aran/h-1b-VISA/data. Accessed: 20 Oct 2017 16. A. Ng, CS229 Lecture Notes (Online). Available: https://cs229.stanford.edu/notes/. Accessed 14 Dec 2017 17. W. McKinney, Data structures for statistical computing in python, in Proceedings of the 9th Python in Science Conference (2010), pp. 51–56 18. A. Kumar, Design of secure image fusion technique using cloud for privacy-preserving and copyright protection. Int. J. Cloud Appl. Comput. (IJCAC) 9(3), 22–36 (2019) 19. A. Kumar, S. Srivastava, Object detection system based on convolution neural networks using single shot multi-box detector. Procedia Comput. Sci. 171, 2610–2617 (2020) 20. A. Kumar, S.S.S.S. Reddy, V. Kulkarni, An object detection technique for blind people in realtime using deep neural network, in 2019 Fifth International Conference on Image Information Processing (ICIIP), Shimla, India (2019), pp. 292–297. doi: https://doi.org/10.1109/ICIIP4 7207.2019.8985965 21. A. Kumar, A review on implementation of digital image watermarking techniques using LSB and DWT, in 3rd International Conference on Information and Communication Technology for Sustainable Development (ICT4SD 2018), held during 30–31 Aug 2018 at Hotel Vivanta by Taj, Goa, India
ETL and Business Analytics Correlation Mapping with Software Engineering Bijay Ku Paikaray, Mahesh R. Dube, and Debabrata Swain
Abstract Big data analysis cannot be carried out effectively using conventional data-analysis techniques. Unstructured data instead requires specific data-modelling methods, tools, and frameworks to extract the insights and information that organizations need. Data science is a scientific approach that applies mathematical and statistical ideas, together with computing tools, to process big data. At present we are witnessing exceptional growth in the data created worldwide and on the web, which has given rise to the idea of big data. Data science is a challenging area because of the complexity involved in combining and applying different methods, algorithms, and programming techniques to perform intelligent analysis of huge volumes of data. Thus, the field of data science has grown out of big data, and the two are closely intertwined. In this article we attempt to build a bridge between the ETL process and software engineering. Keywords Data science · ETL process · Software engineering · Business analysis · SDLC
1 Introduction Data science and software engineering both involve programming skills. The difference is that data science is more concerned with gathering and B. K. Paikaray (B) Department of CSE, Centurion University of Technology and Management, Bhubaneswar, Odisha, India e-mail: [email protected] M. R. Dube Department of CSE, Vishwakarma Institute of Technology, Pune, India D. Swain Department of Data Science, Christ University, Lavasa, Pune, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 D. Swain et al. (eds.), Machine Learning and Information Processing, Advances in Intelligent Systems and Computing 1311, https://doi.org/10.1007/978-981-33-4859-2_49
analyzing data, while software engineering focuses more on developing applications, features, and functionality for end users. When you first start exploring the tech field, you are likely to come across a wide array of different specialties and careers. Two of the most common choices are data science and software engineering. Both of these fields come with their own sets of pros and cons, and between the two there are numerous similarities and differences. Although both are segments of the technology industry, they are definitely two very different paths to go down. This article should help clear up any confusion about which is the better fit for a given set of skills and interests [1, 2]. Data science is famously difficult to define precisely, but it can be thought of as "the use of algorithms and statistics to draw insights from structured and unstructured data". The goal of a data scientist depends a great deal on the problem they are examining. In business, a data scientist might be measuring the impact of changes in promotional material; in finance, a data scientist is probably trying to find what, if anything, accurately predicts returns in one of the major markets. The more skilled data scientists may also be tasked with creating new algorithms and frameworks for processing data sets. It is an important, growing field that offers plenty of opportunities to those with the right skills and experience [3, 4]. Software engineering is another of the major divisions of the tech industry. It has a fairly apt and simple title: it involves using programming and engineering skills to develop new software. In software development, the goal is to create new programs, applications, systems, and even computer games. While both are highly technical fields, and while both draw on similar skill sets, there are significant differences in the ways in which these skills are usually applied. So let us compare data science with software engineering to identify the most significant differences. Generally, a data scientist uses their skills to sift through data, interpret it in meaningful ways, discover patterns, and use what they have learned to help a business make a decision or become more efficient. To put it another way, data science in practice tends to be much more about analysis, with certain aspects of programming and development thrown in. Software engineering, on the other hand, tends to focus on creating systems and software that are easy to use and that serve a specific purpose. There is usually a substantial analytical component to this process, so it is easy to see how the two fields overlap. There is a variety of points at which a person could enter the data science pipeline. If they are gathering data, they are probably called a 'data engineer', and they will pull data from various sources, clean and process it, and store it in a database. This is generally referred to as the Extract, Transform, and Load (ETL) process. If they are using this data to build models and perform analysis, they are
most likely called a 'data analyst' or 'machine learning engineer'. The important concerns in this part of the data science pipeline are making sure that any models built are not violating their underlying assumptions and that they are actually producing useful insights [5, 6]. Another large difference between data science and software engineering is the approach they tend to use as projects evolve. Data science is a very process-oriented field: its practitioners ingest and analyze data sets in order to better understand a problem and arrive at a solution. Software engineering, on the other hand, is more likely to approach tasks with existing frameworks and methodologies. The Waterfall model, for example, is a popular methodology which maintains that each phase of the software development life cycle must be completed and reviewed before moving on to the next.
2 Related Work Data scientists and software engineers use a wide variety of specialized tools to do their jobs effectively and efficiently. Let us look at an example. A data scientist's toolkit contains tools for data analysis, data visualization, working with databases, machine learning, and predictive modelling; which of these they end up using depends on their role. If they are doing a great deal of data ingestion and storage, they will most likely be using Amazon S3, MongoDB, Hadoop, MySQL, PostgreSQL, or something comparable. Figure 1 illustrates the use of Amazon S3. For model building, there is a good chance they will be working with scikit-learn, and distributed processing of very large data sets requires Apache Spark [3, 7]. A software engineer uses tools for software design and analysis, software testing, programming languages, web application development, and much more. As with data science, a lot depends on what you are trying to accomplish. For writing code, Atom, TextWrangler, Visual Studio Code, Emacs, and Vim are all popular. In the world of backend web development, Ruby on Rails, Python's Django, and Flask see plenty of use. Vue.js has emerged in recent years as one of the better ways of building lightweight web applications, and the same could be said of AJAX for building dynamic, asynchronously updating site content (Fig. 2). Fig. 1 Utilization of Amazon S3 [7]
Fig. 2 Verification process flow of Python’s Django [8]
'Software developer' is a more general term for anyone who creates software. It frequently overlaps with, and is used interchangeably with, software engineering; the key distinction is that software engineers apply engineering principles explicitly. Since software engineering and software development overlap so heavily, most of the differences from data science noted above apply to both software engineering and software development [5, 8, 4]. Data engineers are in fact closer to software engineers than data scientists are. Data engineers differ from data scientists in that engineers concentrate on how data is handled, while scientists concentrate on the results obtained from that data. Data engineers primarily work on the software that gathers and handles the data that data scientists frequently use. Data engineers develop software just like software engineers, except that the software is focused exclusively on data; the relationship is similar to that of game developers, who also develop software like other engineers but focus only on video games. Data scientists use the ETL process, while software engineers use the SDLC process. Figure 3 illustrates how data scientists use the ETL process. Data science is more process oriented, whereas software engineering uses frameworks such as Waterfall, Agile, and Spiral [5, 9]. The skill sets of software engineers and data scientists are converging, at least when it comes to product-facing data science applications such as building recommender systems: data scientists are being asked to take care of deployment and product realization, and software engineers are being asked to expand their skill set to include modelling.
Fig. 3 Researchers utilize the ETL procedure [9]
3 ETL Process The ETL process is a three-step process that begins with extracting data from various data sources; the raw data then undergoes a series of transformations to make it suitable for storage in the data warehouse, and is finally loaded into the data warehouse in the required format, ready for analysis [9].
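The three stages can be previewed with a small end-to-end sketch. This is only an illustrative outline, not part of the original work: the source files, column names, and the SQLite staging target are assumptions made for the example, and pandas is used purely for convenience.

```python
# Minimal ETL sketch: extract from a CSV and a JSON source, apply a few
# transformations, and load the result into a staging table.
# File names, column names, and the SQLite target are illustrative assumptions.
import json
import sqlite3
import pandas as pd

def extract(csv_path: str, json_path: str) -> pd.DataFrame:
    """Extract: pull raw records from heterogeneous sources into one frame."""
    sales = pd.read_csv(csv_path)                    # flat-file source
    with open(json_path) as fh:
        customers = pd.DataFrame(json.load(fh))      # JSON source
    return sales.merge(customers, on="customer_id", how="left")

def transform(raw: pd.DataFrame) -> pd.DataFrame:
    """Transform: cleanse, filter, standardize, and de-duplicate."""
    df = raw.dropna(subset=["customer_id", "amount"]).copy()    # cleansing
    df = df[df["order_date"].str.startswith("2020")].copy()     # filtering (one year, assumed ISO strings)
    df["order_date"] = pd.to_datetime(df["order_date"]).dt.strftime("%m/%d/%Y")  # normalization
    return df.drop_duplicates(subset=["order_id"])              # de-duplication

def load(df: pd.DataFrame, db_path: str = "warehouse.db") -> None:
    """Load: write the transformed data into a warehouse staging table."""
    with sqlite3.connect(db_path) as conn:
        df.to_sql("sales_fact", conn, if_exists="append", index=False)

if __name__ == "__main__":
    load(transform(extract("sales.csv", "customers.json")))
```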
3.1 ETL Process Stage-1
a. Extract: This step refers to obtaining the required data from various sources that exist in different formats, for example XML, Hadoop files, flat files, JSON, and so on. The extracted data is stored in the staging area where further transformations are performed. The data is therefore checked thoroughly before moving it to the data warehouse, since otherwise it becomes a challenge to roll back changes in the data warehouse. A proper data map is required between sources and targets before data extraction takes place, as the ETL process has to interact with various systems, for example Oracle, mainframe hardware, real-time systems such as ATMs, and Hadoop, while fetching data from them [10].
b. Full Extraction: This is followed when the whole of the data from the sources is loaded into the data warehouse, which shows either that the data
warehouse is being populated for the first time or that no process has been put in place for tracking data extraction.
c. Partial Extraction (with update notification): This technique, also known as delta extraction, extracts only the data that has changed and uses it to update the data warehouse.
d. Partial Extraction (without update notification): This technique refers to extracting only the specific required data from the sources, according to the load on the data warehouse, rather than extracting the whole data set.
3.2 ETL Process Stage-2
a. Transform: This is the most important step of ETL. In this step many transformations are performed to prepare the data for loading into the data warehouse, by applying the transformations below [11].
b. Basic Transformations: These transformations are applied in every scenario, as they are a basic requirement when loading the data extracted from various sources into the data warehouse.
c. Data Cleansing or Enrichment: This refers to cleaning the unwanted data from the staging area so that incorrect data does not get loaded into the data warehouse.
d. Filtering: Here we filter out the required data from the mass of data present, according to business requirements. For example, to generate business figures one only needs the sales records for that specific year.
e. Consolidation: Extracted data is consolidated into the required format before loading it into the data warehouse.
f. Normalization: Data fields are converted into the same required format, e.g., a date field may have to be specified as MM/DD/YYYY.
g. Advanced Transformations: These kinds of transformations are specific to the business requirements [12, 13].
h. Joining: In this operation, data from two or more sources is combined to produce data with only the desired columns, with rows that are related to one another.
i. Data Threshold Validation: Values present in various fields are checked for correctness, for example that a bank account number is not null in the case of bank data.
j. Lookups to Consolidate Data: Various flat files or other documents are used to extract specific information by performing a lookup operation on them.
k. Complex Data Validation: Many complex validations are applied to extract only valid data from the source systems.
l. Calculated and Derived Values: Various calculations are applied to transform the data into some required information.
m. De-duplication: Duplicate data coming from the source systems is analyzed and removed before loading it into the data warehouse.
n. Key Restructuring: When capturing slowly changing data, appropriate surrogate keys need to be generated to structure the data in the required format. A small illustrative sketch of a few of these transformations is given below.
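The hypothetical sketch below applies a few of the advanced transformations above (joining, data threshold validation, derived values, and key restructuring) with pandas. The table layouts, column names, and the validation rule are assumptions introduced only for illustration.

```python
# Illustration of a few advanced transformations. Column names and the
# validation rule are assumed purely for the example.
import pandas as pd

accounts = pd.DataFrame({"account_no": ["A100", None, "A102"],
                         "branch_id": [1, 2, 1]})
branches = pd.DataFrame({"branch_id": [1, 2],
                         "branch_name": ["Pune", "Bhubaneswar"]})
txns = pd.DataFrame({"account_no": ["A100", "A102", "A102"],
                     "amount": [1200.0, 350.0, 80.0]})

# h. Joining: combine two sources, keeping only the desired columns.
joined = accounts.merge(branches, on="branch_id")[["account_no", "branch_name"]]

# i. Data threshold validation: reject records with a null account number.
valid = joined[joined["account_no"].notna()].copy()

# l. Calculated and derived values: derive a total-spend column per account.
spend = txns.groupby("account_no", as_index=False)["amount"].sum()
valid = valid.merge(spend, on="account_no", how="left").rename(columns={"amount": "total_spend"})

# n. Key restructuring: assign a surrogate key for slowly changing dimensions.
valid["surrogate_key"] = range(1, len(valid) + 1)

print(valid)
```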
3.3 ETL Process Stage-3
a. Load: This step refers to loading the transformed data into the data warehouse, from where it can be used to generate analytical decisions as well as reports [14, 15].
b. Initial Load: This type of load occurs when data is loaded into the data warehouse for the first time.
c. Incremental Load: This type of load is performed to refresh the data warehouse on a periodic basis with the changes occurring in the source system data.
d. Full Refresh: This type of load refers to the situation when the complete data of a table is deleted and reloaded with new data.
There are many ETL tools available in the market, but it is hard to choose the appropriate one for a project. Some ETL tools are described below:
e. Hevo: An efficient cloud data integration platform that brings data from various sources, for example cloud storage, SaaS applications, and databases, into the data warehouse in real time. It can handle very large data volumes and supports both ETL and ELT.
f. QuerySurge: A testing solution used to automate the testing of big data and data warehouses. It improves data quality and accelerates data delivery cycles, and supports testing on various platforms such as Amazon, Cloudera, IBM, and more.
g. Oracle: The Oracle data warehouse is a collection of data, and this database is used to store and retrieve data or information. It helps multiple users access the same data efficiently, supports virtualization, and allows connections to remote databases as well.
h. Array: A data warehouse that automates data collection, data transformation, and data storage. It can connect to tools such as Looker, Chartio, and so on.
i. MarkLogic: A data warehousing solution that uses an array of features to make data integration easier and faster. It specifies complex security rules for elements in the records, helps with importing and exporting configuration information, and also permits data replication for disaster recovery.
j. Amazon Redshift: A data warehouse tool that is cost-effective, simple, and easy to use. There is no installation cost, and it improves the reliability of the data warehouse cluster. Its data centers are fully equipped with climate control.
k. Teradata Corporation: The leading commercially available Massively Parallel Processing data warehousing tool. It can handle large amounts of data simply and efficiently, is as cost-effective as Amazon Redshift, and fully manages a parallel architecture (Fig. 4).
4 Software Engineering Dimensions Over the last fifty years, software development has witnessed multiple theories and experiments aimed at gaining control over several types of complexity and at achieving the desired quality. The evolving nature of software was as important as processes and methodologies in truly accommodating the practices, principles, and laws suitable for the software development arena. Software behavior was considered to be the basis for generations of automated tools supporting a variety of activities termed phases and workflows. Software engineering is still a developing discipline, even though it has been included as part of the formal training given to software development organizations. Software consists of a set of artifacts termed components, classes, objects, modules, or simply units, depending on the type of implementation analogy and paradigm used. A significant concern for practitioners and method experts has been software scaling: scaling can be of small, medium, or large size based on the domain from which the concepts come, and is typically related to the maintenance cycle that takes care of the current status of the software [1]. There are multiple knowledge segments, each emphasizing specific technique- or method-usage factors. Theoretical frameworks have their own assumptions and a limited scope of application to experimental setups. Industry practices and standards are of a bespoke type, and a great deal of customization is required, causing instability in approaches to software development. In order to reduce the impedance mismatch between concepts and their realization, it is necessary to fill the analysis-design gaps. Interoperability, interchangeability, portability, reusability, and maintainability are additional forces impacting software development. This means that multiple complications are involved in the software engineering discipline due to the mapping of the cognitive domain to the analysis-design domain, and making it transdisciplinary requires both theoretical and empirical investigation. Due to the interrelationships and dependencies of software development cycles, organizational processes undergo revisions. The theoretical aspects include abstract concepts represented through mathematics-based knowledge leading to formal inferences, whereas the empirical aspects include concrete concepts through data-based deductive knowledge leading to experimental validation. The differences between the application cycle, the domain engineering cycle, the software lifecycle, and process cycles are only vaguely delineated [2].
Fig. 4 Classification of ETL process
For computing applications with repetition, software serves as an intelligent artifact, enabling it to take the form of both a mathematical entity and a concrete product whose behavioral perspectives can be demonstrated. The software metaphor can be seen as an intermix of computing hardware and programs with logic. Natural languages are rich in context, descriptive, and ambiguous in their representation of phenomena or concepts. The mathematical form represents logic as a 'to be' context, whereas set theory describes relationships as a 'to have' context; process algebras, involving the dynamics of actions and events, describe a 'to do' context. Software expressed in the form of structural arrangements of elements, called architectures, together with its behavioral perspectives, is required to have both an abstract and a precise specification. Abstraction, digital representation, and meaningfulness are the three essential properties of information, and software has to convey such information to the variety of stakeholders involved in the software development cycle. This forms a conceptual relationship between the expected behavioral specifications and the applications running on platforms. Software engineering myths are formed on the basis that there is no link between mathematics and programming, that no engineering segment is involved in software engineering due to the absence of scientific laws, that programming can be done by anyone who knows a programming language, and that a working prototype is sufficient to express the real artifact as an outcome of software development. Software does not have a physical dimension and is hence considered to be flexible. A program is widely accepted as an artifact created by the application of human intelligence in the information society and the software development industry. Programs are written using programming languages and algorithms carrying logic that acts on particular pieces of data or information. The set of programs is required to be integrated with configuration and documentation support, leading to specifications related to the analysis, design, test, and maintenance phases of the software development lifecycle. The specifications are required to be reviewed, and rationales can be derived by applying justification [3].
5 Conclusion For computing applications with repetition, software serves as an intelligent artifact, enabling it to take the form of both a mathematical entity and a concrete product whose behavioral perspectives can be demonstrated. The software metaphor can be seen as an intermix of computing hardware and programs with logic. Natural languages are rich in context, descriptive, and ambiguous in their representation of phenomena or concepts. The mathematical form represents logic as a 'to be' context, whereas set theory describes relationships as a 'to have' context; process algebras, involving the dynamics of actions and events, describe a 'to do' context.
References 1. C.B.B.D. Manyika, Big Data: The Next Frontier for Innovation, Competition, and Productivity (McKinsey Global Institute, 2011) 2. J. Gantz, D. Reinsel, The digital universe in 2020: big data, bigger digital shadows, and biggest growth in the far east. IDC (2013) 3. T.H. Davenport, D.J. Patil, Data scientist: the sexiest job of the 21st century. Harvard Bus. Rev. (2012) 4. J. Manyika, M. Chiu, B. Brown, J. Bughin, R. Dobbs, C. Roxburgh, A.H. Byers, Big Data: The Next Frontier for Innovation, Competition, and Productivity (McKinsey Global Institute, 2011) 5. D.W. Hubbard, How to Measure Anything: Finding the Value of Intangibles in Business (Wiley, Hoboken, NJ, 2010 6. J. Cohen, B. Dolan, M. Dunlap, J.M. Hellerstein, C. Welton, MAD skills: new analysis practices for big data, Watertown, and MA (2009) 7. https://medium.com/radon-dev/redirection-on-cloudfront-with-lambda-edge-e72fd633603e 8. https://developer.mozilla.org/en-US/docs/Learn/Server-side/Django/Introduction. 9. https://en.wikipedia.org/wiki/Extract,_transform,_load 10. D. Wilson, Parallels and differences in the treatment of metaphor in relevance theory and cognitive linguistics. Intercultural Pragmatics 8, 177–196 (2011) 11. N. Riemer, Word Meaning, The Oxford Handbook of the Word (Oxford University Press, Oxford, 2015). 12. N. Chomsky, New Horizons in the Study of Language and Mind (Cambridge University Press, Cambridge, 2000). 13. L. Cysneiros, J. Leite, Nonfunctional requirements: from elicitation to conceptual models. IEEE Trans. Software Eng. 30(5), 328–350 (2004) 14. A. Gregoriades, A. Sutcliffe, Scenario-based assessment of nonfunctional requirements. IEEE Trans. Softw. Eng. 31(5), 392–409 (2005) 15. L.J. Shan, H. Zhu, Unifying the semantics of models and meta-models in the multi-layered UML meta-modelling hierarchy. Int. J. Softw. Inf. 6(2), 163–200 (2012)
A Novel Multilevel RDH Approach for Medical Image Authentication Jayanta Mondal and Madhusmita Das
Abstract Online healthcare is the next big thing, and proper security mechanisms with privacy-preservation techniques for sensitive data are the need of the hour. Handling sensitive data sets such as medical data needs the utmost security, covering confidentiality, integrity, authentication, and reversibility. This paper presents a novel approach for authentication using a reversible data hiding (RDH) technique. Traditional RDH methods provide adequate security for sensitive images. The proposed RDH technique uses a combination of reversible data-marking techniques at multiple levels to provide a robust authentication measure for medical images. Least significant bit (LSB) modification works as the base methodology for data marking to ensure complete reversibility. Keywords Reversible data hiding · Data marking · Least significant bit · Encryption · Sensitive data
1 Introduction Reversible data hiding has been the front runner among all techniques for providing security and privacy to sensitive images with maximum reversibility. Traditional encryption algorithms are not fruitful for medical imagery, as recovered image quality remains the most important concern. Almost all robust cryptographic algorithms use strong compression, which degrades the quality of the original image. This degradation is acceptable in normal images, where the amount of redundancy is very high, but that is not the case for medical images. RDH was proposed by Barton in 1997 [1]. In the past decade, RDH methods have developed very quickly and cover a wide range of services, which include data hiding in the uncompressed domain, J. Mondal (B) School of Computer Engineering, KIIT Deemed to be University, Bhubaneswar, India e-mail: [email protected] M. Das Central University of Odisha, Koraput, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 D. Swain et al. (eds.), Machine Learning and Information Processing, Advances in Intelligent Systems and Computing 1311, https://doi.org/10.1007/978-981-33-4859-2_50
data hiding in the compressed domain, contrast enhancement, and, most importantly, data hiding in the encrypted domain.
2 Types of RDH Based on the properties and types of images, RDH techniques and their utilization vary. In terms of services, RDH schemes in general provide authentication through data embedding and privacy preservation through data hiding. In some cases, RDH is used extensively for contrast enhancement. Recently, RDH schemes have mostly been used in the encrypted domain, which adds confidentiality to the original image. In this section, a brief description is given of the types of RDH methods that are operational. RDH methods can be broadly classified into three domains: (i) RDH in uncompressed images, (ii) RDH in compressed images, and (iii) RDH in encrypted images.
2.1 RDH in Uncompressed Image There are various applications of RDH in uncompressed images [2, 3]. In uncompressed images, there is a large amount of free space that can be used for hiding information. In spite of that, this method has not gained popularity, since the embedding capacity is low and high embedding may lead to degradation of the quality of the received image. Moreover, this method lacks confidentiality, which is highly desirable for sensitive images.
2.2 RDH in Compressed Image Various works have been carried out to date in which RDH has been used in compressed images [4, 5]. The embedding capacity can be enhanced in compressed images, but the main problem is that compression of images always leads to data loss. For sensitive images, such as medical and military images, even a very small data loss is not desirable, so RDH in compressed images cannot be applied to sensitive images.
2.3 RDH in Encrypted Image In 2011, Xinpeng Zhang first proposed a RDH technique for encrypted images [6]. RDH becomes a complete cryptographic process after it is applied into the encrypted images. Especially, for sensitive images RDH becomes the front runner.
RDH methods take full advantage of the property of sensitive imagery, i.e., minimum redundancy. The reversible processes that RDH uses in the data embedding phase work as a secondary security mechanism for sensitive images. This supports the use of a simple, reversible, and lightweight encryption technique at the beginning. Various works have been proposed based on RDH in encrypted images [7–11].
3 Methodologies of RDH 3.1 Difference Expansion In difference expansion [2, 3], the gap between two pixels is enlarged to create space for hiding data. This increases the difference between the two pixels, and the resulting change in pixel magnitude leads to noticeable degradation in the quality of the image. So, in general, this method is not suitable for sensitive images.
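For illustration, the sketch below embeds and recovers a single bit in a pixel pair using Tian-style difference expansion; it is a minimal example of the general idea rather than the exact notation of [2, 3], and overflow handling is omitted.

```python
# Sketch of difference-expansion embedding for one pixel pair (Tian-style).
# Overflow/underflow checks (values leaving [0, 255]) are omitted for brevity.

def de_embed(x: int, y: int, bit: int):
    """Embed one bit into the pair (x, y) by expanding their difference."""
    l = (x + y) // 2          # integer average (kept invariant)
    h = x - y                 # difference
    h2 = 2 * h + bit          # expanded difference carrying the bit
    return l + (h2 + 1) // 2, l - h2 // 2

def de_extract(x_marked: int, y_marked: int):
    """Recover the bit and the original pixel pair."""
    l = (x_marked + y_marked) // 2
    h2 = x_marked - y_marked
    bit = h2 & 1
    h = h2 >> 1               # undo the expansion
    return (l + (h + 1) // 2, l - h // 2), bit

xm, ym = de_embed(130, 127, 1)        # -> (132, 125)
original_pair, bit = de_extract(xm, ym)
assert original_pair == (130, 127) and bit == 1   # fully reversible
```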
3.2 Histogram Shift In histogram shifting [12, 13], data hiding is carried out through modification of histogram peak values. In this methodology, the lowest and highest points of the histogram are first determined: the lowest point refers to the intensity value with the minimum number of pixels, and the highest point to the intensity value with the maximum number of pixels. The additional data embedding capacity can be increased by exploiting the difference between these two points, and the peak points are used for additional data embedding. The most important disadvantage of the histogram shift process is that, to restore the image in its entirety, the minimum and maximum points must be embedded along with the rest of the image.
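A compact sketch of histogram-shift embedding on a greyscale array is given below. It illustrates the general idea rather than the exact schemes of [12, 13]; it assumes the peak bin lies below 255 and that the minimum bin above the peak is actually empty (otherwise a location map of those pixels must also be kept).

```python
# Histogram-shift embedding/extraction sketch for an 8-bit greyscale image.
# Assumes peak < 255 and an empty bin above the peak; capacity = peak-bin count.
import numpy as np

def hs_embed(img: np.ndarray, bits):
    hist = np.bincount(img.ravel(), minlength=256)
    peak = int(hist.argmax())                        # highest point (most pixels)
    zero = peak + 1 + int(hist[peak + 1:].argmin())  # lowest point above the peak
    out = img.astype(np.int16)
    out[(out > peak) & (out < zero)] += 1            # shift to empty the bin at peak+1
    for (r, c), b in zip(np.argwhere(img == peak), bits):
        out[r, c] = peak + int(b)                    # peak -> bit 0, peak+1 -> bit 1
    return out.astype(np.uint8), peak, zero

def hs_extract(marked: np.ndarray, peak: int, zero: int, n_bits: int):
    m = marked.astype(np.int16)
    positions = np.argwhere((m == peak) | (m == peak + 1))[:n_bits]
    bits = [int(m[r, c]) - peak for r, c in positions]
    m[m == peak + 1] = peak                          # undo the embedding
    m[(m > peak + 1) & (m <= zero)] -= 1             # undo the shifting
    return m.astype(np.uint8), bits
```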
3.3 LSB Modification LSB modification uses the difference in the significance of intensity bits. The least significant bits carry minimal information compared to the most significant bits. In this methodology, the LSBs are modified in different reversible and irreversible ways to carry out different types of activities, such as data hiding for authentication, hiding of additional bits, and so on [10, 11].
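A minimal sketch of the LSB operations used later in this paper is shown below; because an XOR-based bit flip is its own inverse, applying the same flip twice restores the original pixel, which is what makes LSB marking reversible.

```python
# Basic LSB operations on 8-bit pixel values.

def get_lsbs(pixel: int, n: int = 3) -> int:
    """Read the n least significant bits of a pixel."""
    return pixel & ((1 << n) - 1)

def flip_bit(pixel: int, position: int) -> int:
    """Flip the bit at the given position (0 = LSB); applying it twice undoes it."""
    return pixel ^ (1 << position)

p = 0b10110101                  # 181
marked = flip_bit(p, 3)         # flip the fourth LSB -> 0b10111101 (189)
assert flip_bit(marked, 3) == p # reversible
assert get_lsbs(p) == 0b101     # three LSBs of the pixel
```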
4 Literature Review 4.1 Zhang (2011) [6] Zhang first proposed an RDH technique that is being implemented in the encrypted domain. This work shows the idea that through RDH methods along with authentication, confidentiality can also be achieved. Besides, if used on sensitive imagery, the data embedding part can provide additional security. A 3- Actor architecture is proposed. Among the three actors, the first one is the sender, who sends the original data to the second actor, namely the data hider. Before sending the original image it is encrypted using reversible XOR operation at the senders’ site. The data hider does additional data hiding in a reversible way through different type of LSB modification. Finally, the marked encrypted image arrives to the receiver from the data hider. The receiver needs to have both the encryption key as well as the data hiding key for additional data extraction and image recovery.
4.2 Hong et al. [7] Hong et al. improved Zhang’s method [6] in terms of the extraction process of the additionally embedded bits. The encryption and data hiding procedures are similar to the previous method [6]. At the receivers’ site, two additional techniques namely, side matching and smoothness evaluation are used to minimize the error rate. The border area of the blocks are taken into account and the certain blocks are selected where data recovery is compromised.
4.3 Qin and Zhang [8] In 2015, Qin and Zhang proposed an improved RDH method that uses an adaptive judging function to choose the pixels or more specifically blocks for data embedding. In [6], half of the pixels were altered for additional data hiding. The visual quality of the recovered image is highly improved as lesser number of pixels are altered.
4.4 J Mondal et al. [9] Mondal et al. [9] proposed an improved reversible RDH technique that enhanced the recovered quality of the image. In the original image encryption part, a random matlab generated 512 × 512 bit key is used for encryption. In the data embedding part, the image is separated into same sized blocks and LSB modification is used for
providing authentication. At the receiver's site, both the encryption/decryption key and the embedding key are necessary to recover the image with its quality intact.
5 Proposed Method The proposed method is specifically designed to improve the data marking part of the RDH system. The proposed method uses the LSB modification technique as its core methodology. It is implemented in the encrypted domain and has three main subprocesses, namely encryption, data marking, and recovery. In the data marking phase a two-level LSB modification technique is implemented: at one level data marking is conducted through XOR operations, left rotation, and bit flipping; at the second level LSB swapping takes place. The architecture involves three actors: the content owner, the data marker, and the receiver. Figure 1 shows the proposed architecture.
Encryption Algorithm
Step 1: The plaintext image I of size M × N is converted to grey-scale by
I(i, j) = I_grey(i, j) × (a + b)    (1)
where 2a = M, 2b = N, and I_grey(i, j) is the greyscale weight generated from the RGB scale.
Step 2: The encrypted image is generated using the XOR operation with the private key:
I_E(i, j) = I(i, j) ⊕ K(i, j)    (2)
Fig. 1 Proposed RDH architecture
where K(i, j) is the key of size M × N.
Data Marking Algorithm
Step 1: Divide I_E into n (1 to n) same-sized blocks of order S × S.
1st Level: For alternate blocks from 1 to n − 1, i.e., the odd-numbered blocks.
Step 2: The first row of pixels is kept unchanged.
Step 3: XOR the three LSB bits of the first row with those of the second row.
Step 4: If the result is zero, no modification is done; else, flip the fourth LSB.
Step 5: Continue the same XOR operation of the first, unchanged row with all the remaining pixels.
Step 6: Repeat Steps 4 and 5 for all the remaining odd blocks.
2nd Level: For alternate blocks from 2 to n, i.e., the even-numbered blocks.
Step 7: Swap the last three LSBs of every pixel with those of the next even block, i.e., block no. 2 with block no. 4, block no. 6 with block no. 8, and so on.
Step 8: Continue Step 7 until all even-numbered blocks are covered.
Step 9: Finally, generate the marked image I_M by combining all odd- and even-numbered blocks.
Recovery Algorithm
Step 1: Divide I_E into n (1 to n) non-overlapping image blocks of order S × S.
Step 2: Re-perform the data marking process to generate I_E.
Step 3: The original image is generated using the private key:
I = I_E ⊕ K    (3)
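Equations (2) and (3) rely on the fact that XOR is its own inverse. The sketch below shows this encrypt/recover round trip with NumPy; the random image and key are stand-ins generated only for illustration.

```python
# Stream-style XOR encryption of an image with a same-sized key (Eqs. (2)-(3)).
# XOR with the same key recovers the image exactly.
import numpy as np

rng = np.random.default_rng(seed=42)
image = rng.integers(0, 256, size=(512, 512), dtype=np.uint8)  # stand-in for I
key = rng.integers(0, 256, size=image.shape, dtype=np.uint8)   # K, same size as I

encrypted = image ^ key        # Eq. (2): I_E = I xor K
recovered = encrypted ^ key    # Eq. (3): I  = I_E xor K

assert np.array_equal(recovered, image)   # lossless recovery
```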
6 Experimental Result Analysis To prove the worth of the proposed scheme, experiments were conducted on two 512 × 512 images, i.e., a standard test image (Lena) and a medical test image (a CT scan image). Figures 2 and 3 show the different stages of the implementation. Tables 1 and 2 show the PSNR and SSIM comparisons between [6–9] and the proposed method, in terms of the directly decrypted image and the recovered image. The tables clearly depict the efficiency of the proposed method: the PSNR and SSIM values of the directly decrypted image show that its quality is degraded, and the quality is restored completely once the image goes through the unmarking process.
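For reference, the PSNR values reported below can be computed with the standard definition sketched here (generic code, not taken from the paper); identical images give an infinite PSNR, matching the entry reported for the proposed method in Table 1. SSIM is typically taken from a library such as scikit-image.

```python
# PSNR between two 8-bit greyscale images.
import numpy as np

def psnr(original: np.ndarray, processed: np.ndarray, peak: float = 255.0) -> float:
    mse = np.mean((original.astype(np.float64) - processed.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")        # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```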
Fig. 2 Lena image, a original image b encrypted image c marked encrypted image d directly decrypted image, e recovered image
Fig. 3 CT scan image, a original image b encrypted image c marked encrypted image d directly decrypted image, e recovered image

Table 1 Comparison table of PSNR values of directly decrypted CT scan image and recovered CT scan image when block size is 4 × 4

Papers          | Directly decrypted image | Recovered image
[6]             | 36.94                    | 44.86
[7]             | 36.94                    | 44.86
[8]             | 38.59                    | 45.57
[9]             | 38.98                    | 61.78
Proposed method | 36.37                    | ∞ (Infinity)

Table 2 Comparison table of SSIM values of directly decrypted CT scan image and recovered CT scan image when block size is 4 × 4

Papers          | Directly decrypted image | Recovered image
[6]             | 0.96303                  | 0.99632
[7]             | 0.96303                  | 0.99713
[8]             | 0.97653                  | 0.99825
[9]             | 0.98688                  | 0.99899
Proposed method | 0.96298                  | 1
7 Conclusion In this paper, a novel two-level data-marking process is introduced for sensitive images. The proposed method provides a robust authentication mechanism which not only helps secure integrity but also provides adequate security. The process goes through an early encryption step which makes the content unintelligible. The experimental results clearly show that the data marking adds an extra layer of security, which is considerable given the nature of the images. The method has been tested with medical images and compared with pre-existing methods, and the results are very satisfactory.
References 1. J.M. Barton, U.S. Patent No. 5,646,997. Washington, DC: U.S. Patent and Trademark Office, 1997 2. J. Tian, Wavelet-based reversible watermarking for authentication, in Security and Watermarking of Multimedia Contents IV, vol. 4675. (International Society for Optics and Photonics, 2002), pp. 679–691 3. J. Tian, Reversible data embedding using a difference expansion. IEEE Trans. Circuits Syst. Video Technol. 13(8), 890–896 (2003) 4. J. Fridrich, M. Goljan, R. Du, Invertible authentication watermark for jpeg images, in Proceedings of International Conference on Information Technology: Coding and Computing, 2001 (IEEE, 2001), pp. 223–227 5. G. Xuan, Y. Q. Shi, Z. Ni, P. Chai, X. Cui, X. Tong, Reversible data hiding for jpeg images based on histogram pairs, in International Conference Image Analysis and Recognition (Springer, 2007), pp. 715–727 6. X. Zhang, Reversible data hiding in encrypted image. IEEE Signal Process. Lett. 18(4), 255–258 (2011) 7. W. Hong, T.S. Chen, H.Y. Wu, An improved reversible data hiding in encrypted image using side match. IEEE Sign. Process. Lett. 19(5), 199–203 (2012) 8. C. Qin, X. Zhang, Effective reversible data hiding in encrypted image with privacy protection for image content. J. Visual Commun. Image Representat. 31, 154–164 (2015) 9. J. Mondal, D. Swain, D.P. Singh, S. Mohanty, An improved lsb-based RDH technique with better reversibility. Int. J. Electron. Secur. Digit. Forensics 9(3), 254–268 (2017) 10. J. Mondal, D. Swain, and D.D. Panda, An improved RDH model for medical images with a novel EPR embedding technique, in International Conference on Advances in Computing and Data Sciences. (Springer, Singapore, 2018) 11. J. Mondal, D. Swain, D.D. Panda, A novel LSB-based RDH with dual embedding for encrypted images. Int. J. Electron. Secur. Digit. Forensics 11(3), 281–293 (2019) 12. M. Fallahpour, High capacity lossless data hiding based on histogram modification. IEICE Electr. Express 4(7), 205–210 (2007) 13. G. Xuan, Y. Q. Shi, P. Chai, X. Cui, Z. Ni, X. Tong, Optimum histogram pair based image lossless data embedding, in International Workshop on Digital Watermarking (Springer, 2007), pp. 264–278
Copy-Move Forgery Detection Using Scale Invariant Feature Transform Bandita Das, Debabala Swain, Bunil Kumar Balabantaray, Raimoni Hansda, and Vishal Shukla
Abstract In the era of digital technology, digital images play an important role in day-to-day applications, from medical diagnosis to courts of law, where they serve as evidence in crime investigations. Nowadays it has become easy to tamper with an image using low-cost software tools, and hence, by visual perception alone, one cannot guarantee the authenticity and integrity of its originality. Copy-move image forgery (CMIF) is one such tampering technique, where a region of the image itself is copied and pasted into another region of the same image. This is done either to hide important information or to pass irrelevant data to others. This image forgery (IF) appears realistic and is hard to detect, as the forged region has structural characteristics similar to the original one. This paper presents a key-point based method to detect CMIF using the Scale Invariant Feature Transform (SIFT). The experimental results and analysis of this method on publicly available datasets are discussed in detail. The method can detect CMIF even when the image is distorted by intermediate attacks such as scaling and rotation, and by post-processing attacks such as image blurring, noise adding, contrast adjustment, color reduction, brightness changes, JPEG compression, etc. The method is also able to detect multiple CMIF. Keywords Image forgery · Copy-Move image forgery · SIFT
B. Das · D. Swain (B) Department of Computer Science, Rama Devi Women’s University, Bhubaneswar, India e-mail: [email protected] B. K. Balabantaray · R. Hansda Department of Computer Science and Engineering, National Institute of Technology, Meghalaya, India V. Shukla Aviz Networks Inc, San Jose, USA © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 D. Swain et al. (eds.), Machine Learning and Information Processing, Advances in Intelligent Systems and Computing 1311, https://doi.org/10.1007/978-981-33-4859-2_51
1 Introduction Nowadays, the digital image is a powerful medium of communication and a conveyer of information; it speaks more than words. Images have been used as proof of incidents and past events. Digital images are used everywhere, for example in surveillance, the military, medical diagnosis, art, courts of law as evidence, photography, social media, etc. [1]. Moreover, telehealth provides health care services remotely, directing the treatment of disease in a proper way using digital information and communication technologies. Hence, it becomes indispensable to verify the authenticity of digital images before they are used in any application. However, many low-cost image editing tools such as Corel Draw, Photoshop, GIMP, etc. are easily available, which can edit or tamper with the originality of an image effortlessly and generate forged images. An image is forged in the sense of adding or removing an object from the original image, which changes the meaning of that image. The foremost aim of IF is to hide important information or to spread wrong information through images. Social media has now become a main platform for disturbing society by spreading wrong information through manipulated images, and it has become very difficult to find out which image is original and which is forged. We cannot simply stop using images because of IF; hence, it is necessary to examine the authenticity and originality of images. To detect CMIF, several methods have been introduced. Based on the way features are extracted, CMIF detection techniques are divided into two types, i.e., block-based and key-point based techniques. This work uses the key-point based technique to detect CMIF, and shows that it can detect CMIF in the presence of some intermediate attacks (scaling, rotation) and some post-processing attacks (image blurring, noise adding, contrast adjustment, color reduction, brightness changes, JPEG compression, etc.); the method is also able to detect multiple CMIF. In the rest of this paper, Sect. 2 gives a brief introduction to CMIF, Sect. 3 describes the methodology to detect CMIF in detail, Sect. 4 analyzes the experimental results of the method on publicly available datasets, and Sect. 5 addresses the future work plan and concludes the current discussion.
2 Copy-Move Image Forgery In CMIF, to make an image forged, a piece of an image is copied and pasted in the same image at another location. Figure 1 shows a typical example of the copy-move forgery image. In this figure, A represents the original image and B represents the forged image. By observing two images we can notice the forged part clearly. But if the original image A is not known, then it becomes difficult to find whether the given image is original or forged. This is performed in order to hide some information. As the forge region is copy-pasted from the same image, it looks very realistic and
Fig. 1 Example of copy-move image forgery [4]
quite difficult to detect. Textured areas, like grass, sky, foliage, gravel, water, etc., are appropriate for this objective because the copied area will likely blend with the background, so that the human eye cannot easily perceive suspicious artifacts [2]. It is not possible to detect CMIF by finding inconsistencies using statistical measures, because the noise component, color palette, and most other significant features of the copied part of an image will be coherent with the remaining part of that image. Some intermediate attacks (scaling, rotation, etc.) and some post-processing attacks (color reduction, JPEG compression, noise addition, blurring, brightness change, etc.) may be present in the forged image [3]. CMIFD becomes more challenging in the presence of multiple copy-move regions and/or copy-move regions of multiple sizes in the forged image.
3 Methodology Since the copied and pasted region comes from the same image in CMIF, its color palette, noise component, dynamic range, etc., are the same as the rest of the image, which makes it difficult to detect CMIF by statistical measures [5]. This type of forged image can be detected in two ways: (i) block-based and (ii) key-point based. In this paper, the key-point based method is used to detect CMIF. The block diagram of the method is given in Fig. 2. Each step of the block diagram of the CMIF detection technique is described as follows.
Fig. 2 Block diagram of copy-move image forgery detection technique
3.1 Key-point Based Feature Extraction A key-point based method is used to extract features from the tested image. A key-point is also known as an interest point or anchor point of the image; these points are invariant to rotation and scaling. There are various key-point detectors and descriptors. In this work, SIFT (introduced by Lowe in 2004) [6] is used as the key-point detector because, in addition to being scale- and rotation-invariant, it is more accurate than other key-point detectors. SIFT works as both a key-point detector and a descriptor. There are mainly four steps involved in detecting key-points and generating a descriptor for each key-point from an image. These steps are given below.
3.1.1 Scale-space Extrema Detection
The first step of computation searches over all scales and image locations. It is implemented efficiently by using a difference-of-Gaussian function to identify potential interest points that are invariant to scale and orientation.
3.1.2 Key-point Localization
At each candidate location, a detailed model is fit to determine location and scale. Key-points are selected based on measures of their stability.
3.1.3 Orientation Assignment
One or more orientations are assigned to each key-point location based on local image gradient directions. All future operations are performed on image data that has been transformed relative to the assigned orientation, scale, and location for each feature, thereby providing invariance to these transformations.
3.1.4 Key-point Descriptor
The local image gradients are measured at the selected scale in the region around each key-point. These are transformed into a representation that allows for significant levels of local shape distortion and change in illumination. In SIFT the descriptor size is 128.
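In OpenCV, the four steps above are wrapped in a single detect-and-compute call. The sketch below assumes a build in which SIFT is available: in OpenCV 3.4.2, as used in this work, SIFT lives in the contrib module cv2.xfeatures2d (and may require the non-free flag at build time), while recent releases expose cv2.SIFT_create directly. The image path is a placeholder.

```python
# Detect SIFT key-points and compute their 128-dimensional descriptors.
import cv2

img = cv2.imread("forged.png", cv2.IMREAD_GRAYSCALE)  # test image (assumed path)

try:
    sift = cv2.SIFT_create()                 # OpenCV >= 4.4
except AttributeError:
    sift = cv2.xfeatures2d.SIFT_create()     # OpenCV 3.x contrib builds

keypoints, descriptors = sift.detectAndCompute(img, None)
print(len(keypoints), descriptors.shape)     # N key-points, descriptors of shape (N, 128)
```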
3.2 Feature Matching For feature matching, a brute-force matching strategy is utilized here: each feature (a key-point together with its descriptor) is matched against every other feature. The Euclidean distance between each pair of descriptors is calculated to measure their similarity, and the spatial distance between the corresponding key-points in the image is also calculated. Threshold values are fixed for both the similarity and the distance measures: if the descriptor distance between two features is below the similarity threshold and the spatial distance between the key-points is above the distance threshold, these key-points are stored as similar key-points for further processing.
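A sketch of this matching stage is given below. The descriptor-distance and spatial-distance thresholds are illustrative assumptions, not the values used in the experiments; when a descriptor set is matched against itself, the nearest neighbour of each key-point is the key-point itself, so the second-nearest match is the one examined.

```python
# Brute-force matching of SIFT descriptors with a descriptor-distance threshold
# and a minimum spatial distance between key-points (to avoid trivial
# self-matches). Threshold values are assumptions for illustration only.
import numpy as np
import cv2

def match_keypoints(keypoints, descriptors, desc_thresh=120.0, min_dist=10.0):
    bf = cv2.BFMatcher(cv2.NORM_L2)
    # Two nearest neighbours per descriptor; the first is the self-match.
    matches = bf.knnMatch(descriptors, descriptors, k=2)
    pairs = []
    for m_self, m_other in matches:
        i, j = m_other.queryIdx, m_other.trainIdx
        p1, p2 = np.array(keypoints[i].pt), np.array(keypoints[j].pt)
        if m_other.distance < desc_thresh and np.linalg.norm(p1 - p2) > min_dist:
            pairs.append((i, j))
    return pairs
```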
3.3 Localization After obtaining all matched features, the corresponding key-points are depicted in the image for the purpose of localizing the copied and pasted regions present in the forged image. Here, the white color in the output image represents the pixels detected as belonging to the copied and pasted regions.
4 Experimental Results and Analysis For the simulation, the Python OpenCV library, version 3.4.2, is used in Anaconda Navigator. All experimental work is done on a Windows 10 (64-bit) operating system with a 1.60 GHz processor and 8 GB RAM. For evaluation of the CMIF detection method, the publicly available dataset created by [7], the GRIP dataset [8], and the CoMoFoD dataset [9] are used.
The first and second datasets consist of 50 and 80 original images, respectively, together with their corresponding forged and binary masking images. All forged images are plain copy-move images (the image is forged simply by copying an image region and pasting it into another region of the same image) and are not distorted by any kind of attack (post-processing or intermediate). To test whether the method can detect the forgery in the presence of attacks, some images are taken from the CoMoFoD dataset; these images are distorted by intermediate (scale, rotation) and post-processing (brightness change (BC), contrast adjustment (CA), color reduction (CR), image blurring (IB), JPEG compression (JC), and noise adding (NA)) attacks. The image size of the GRIP dataset and the dataset created by [7] is 1024 × 768, and the image size of the CoMoFoD dataset is 512 × 512. Two images containing multiple copy-move regions are taken from the CoMoFoD dataset to test whether the method can detect multiple copy-move forgeries. The evaluation can be done at two different levels: image level and pixel level. Image-level evaluation determines whether the image as a whole is forged or not; if an image contains three or more matched key-points or blocks, it is considered a forged image. In pixel-level evaluation, each pixel of an image is categorized as copy-move or authentic. When evaluated at image level, most of the test images are detected correctly, i.e., forged images are detected as forged and original images as original. Here, however, more attention is given to pixel-level evaluation. Figures 3 and 4 depict the results for some images taken from the dataset created by [7] and the GRIP dataset, respectively. The average predicted values obtained after evaluating the method on these two datasets are given in Table 1. The evaluation is done using the metrics Recall (R)/True Positive Rate (TPR), Precision (P), False Positive Rate (FPR), and detection accuracy (ACC), computed as follows:
TPR = R = TP / (TP + FN)    (1)
P = TP / (TP + FP)    (2)
FPR = FP / (FP + TN)    (3)
ACC = (TP + TN) / (TP + TN + FP + FN)    (4)
where, True Positive (TP): Forged pixels are classified as forged pixels. False Positive (FP): Original pixels are classified as forged pixels. True Negative (TN): Original pixels are classified as original pixels False Negative (FN): Forged pixels are classified as original pixels
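Given a ground-truth binary mask and a predicted mask (1 = forged, 0 = authentic), the four metrics can be computed directly as in the sketch below, which simply restates Eqs. (1)-(4).

```python
# Pixel-level evaluation from binary masks (1 = forged, 0 = authentic).
import numpy as np

def pixel_metrics(gt: np.ndarray, pred: np.ndarray):
    gt, pred = gt.astype(bool), pred.astype(bool)
    tp = np.sum(gt & pred)        # forged pixels classified as forged
    fp = np.sum(~gt & pred)       # original pixels classified as forged
    tn = np.sum(~gt & ~pred)      # original pixels classified as original
    fn = np.sum(gt & ~pred)       # forged pixels classified as original
    recall = tp / (tp + fn)                        # Eq. (1), TPR
    precision = tp / (tp + fp)                     # Eq. (2)
    fpr = fp / (fp + tn)                           # Eq. (3)
    accuracy = (tp + tn) / (tp + tn + fp + fn)     # Eq. (4)
    return recall, precision, fpr, accuracy
```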
Fig. 3 Results of the CMIFD method on dataset created by [7]
To test whether the method can detect the forgery in the presence of various attacks, one image is taken from the CoMoFoD dataset. The original image, forged image, and binary masking image of this test image are given in Fig. 5. Figure 6 shows, in pictorial form, the results on the test images in the presence of intermediate attacks, and the corresponding numerical results are given in Table 2. Tables 3 and 4 show the results on the test images in the presence of post-processing attacks. Each image is distorted with different parameters. The distortion parameters applied during forgery for the different types of post-processing attacks are given as follows (Fig. 7). “BC”: brightness change, (lower bound, upper bound) = [(0.01, 0.95), (0.01, 0.9), (0.01, 0.8)] “CA”: contrast adjustments, (lower bound, upper bound) = [(0.01, 0.95), (0.01, 0.9), (0.01, 0.8)]
Fig. 4 Results of the CMIFD method on GRIP dataset
Table 1 Detection results on dataset created by [7] and GRIP dataset

Dataset name           | Recall/TPR (%) | Precision (%) | FPR (%) | Accuracy (%)
Dataset created by [7] | 92.39          | 89.35         | 0.42    | 98.59
GRIP                   | 71.358         | 91.70         | 0.40    | 97.96
“CR”: color reduction, intensity levels per color channel = [32, 64, 128] “IB”: image blurring, μ = 0, σ² = [0.009, 0.005, 0.0005] “JC”: JPEG compression, quality factor = [20, 30, 40] “NA”: noise adding, averaging filter = [3 × 3, 5 × 5, 7 × 7] Figure 8 depicts, in pictorial form, the detection of the multiple forgeries present in an image; the corresponding numerical results are given in Table 5.
Fig. 5 a Original, b Forged and c Binary masking image taken from CoMoFoD dataset
Fig. 6 Detection results of the CMIFD method in presence of intermediate attacks on Co-MoFoD dataset
Table 2 Detection results on CoMoFoD dataset images distorted by intermediate attack

Image name         | Recall/TPR (%) | Precision (%) | FPR (%) | Accuracy (%)
Translation        | 97.27          | 91.81         | 1.05    | 98.77
Scale              | 66.42          | 94.24         | 0.36    | 96.93
Rotation           | 92.69          | 91.94         | 0.99    | 98.32
Rotation + Scaling | 84.19          | 96.20         | 0.43    | 97.81
Table 3 Detection results on CoMoFoD dataset images distorted by post-processing attack

Image name | Parameters of distortion | Recall/TPR (%) | Precision (%) | FPR (%) | Accuracy (%)
BC1        | (0.01, 0.95)             | 97.42          | 91.63         | 1.08    | 98.75
BC2        | (0.01, 0.9)              | 97.21          | 91.63         | 1.07    | 98.74
BC3        | (0.01, 0.8)              | 97.37          | 91.63         | 1.07    | 98.75
CA1        | (0.01, 0.95)             | 97.57          | 91.21         | 1.14    | 98.72
CA2        | (0.01, 0.9)              | 97.26          | 90.34         | 1.26    | 98.58
CA3        | (0.01, 0.8)              | 97.60          | 92.26         | 0.99    | 98.85
CR1        | 32                       | 96.67          | 92.09         | 1.00    | 98.74
CR2        | 64                       | 98.15          | 90.12         | 1.30    | 98.63
CR3        | 128                      | 97.92          | 91.82         | 1.06    | 98.83
Table 4 Detection results on CoMoFoD dataset images distorted by post-processing attack

Image name | Parameters of distortion | Recall/TPR (%) | Precision (%) | FPR (%) | Accuracy (%)
IB1        | 0.009                    | 96.82          | 92.74         | 0.92    | 98.83
IB2        | 0.005                    | 76.34          | 99.47         | 0.04    | 97.39
IB3        | 0.0005                   | 32.49          | 1.0           | 0.0     | 92.69
JC1        | 20                       | 2.77           | 1.0           | 0.0     | 89.47
JC2        | 30                       | 13.24          | 75.62         | 0.52    | 90.15
JC3        | 40                       | 17.15          | 86.34         | 0.33    | 90.74
NA1        | (3 × 3)                  | 8.20           | 99.10         | 8.98    | 90.05
NA2        | (5 × 5)                  | 7.97           | 93.16         | 0.07    | 89.97
NA3        | (7 × 7)                  | 32.45          | 91.04         | 0.39    | 92.34
The results in pictorial and tabular form show that when the forged region is a plain copy-move, this key-point based method detects it with high accuracy. However, detection performance degrades gradually under the various attacks, depending on the distortion parameters applied. The method can detect the forgery when the forged region is scaled, rotated, or both, and it also works in the presence of post-processing attacks such as brightness change, contrast adjustment, color reduction, and image blurring, but it performs poorly in the presence of JPEG
Fig. 7 Detection result of BC, CA, CR, IB, JC, and NA
Fig. 8 Results of the multiple copy-move forgery detection
Table 5 Results of multiple copy-move forgery detection

Image name | Recall/TPR (%) | Precision (%) | FPR (%) | Accuracy (%)
Image 1    | 72.43          | 94.08         | 0.08    | 99.41
Image 2    | 93.35          | 84.07         | 0.92    | 98.79
compression and noise addition in the forged image. The method can also detect multiple copy-move forgeries. In all cases the FPR is very low (Fig. 7).
5 Conclusion This paper presents a key-point based CMIF detection method using SIFT. The method can detect copy-move forgery with reasonable accuracy in the presence of intermediate and post-processing attacks, and it can also detect multiple copy-move forgeries. However, the method fails to detect copy-move forgery when the forged part is placed in smooth regions. In future work, this problem will be tackled by generating more key-points in the smooth regions.
References
1. C.N. Bharti, P. Tandel, A survey of image forgery detection techniques, in 2016 International Conference on Wireless Communications, Signal Processing and Networking (WiSPNET) (IEEE, 2016), pp. 877–881
2. J. Fridrich, D. Soukal, J. Lukas, Detection of copy-move forgery in digital images, in Proceedings of the Digital Forensic Research Workshop (Cleveland, OH, USA, 2003)
3. O.M. Al-Qershi, B.E. Khoo, Evaluation of copy-move forgery detection: datasets and evaluation metrics. Multimedia Tools Appl. 77(24), 31807–31833 (2018)
4. B. Mahdian, S. Saic, Detection of copy–move forgery using a method based on blur moment invariants. Forensic Sci. Int. 171(2–3), 180–189 (2007)
5. N. Muhammad, M. Hussain, G. Muhammad, G. Bebis, Copy-move forgery detection using dyadic wavelet transforms, in 2011 Eighth International Conference on Computer Graphics, Imaging and Visualization (IEEE, 2011), pp. 103–108
6. D.G. Lowe, Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vision 60(2), 91–110 (2004)
7. E. Ardizzone, A. Bruno, G. Mazzola, Copy–move forgery detection by matching triangles of keypoints. IEEE Trans. Inf. Forensics Secur. 10(10), 2084–2094 (2015)
8. D. Cozzolino, G. Poggi, L. Verdoliva, Copy-move forgery detection based on PatchMatch, in 2014 IEEE International Conference on Image Processing (ICIP) (IEEE, 2014), pp. 5312–5316
9. D. Tralic, I. Zupancic, S. Grgic, M. Grgic, CoMoFoD: new database for copy-move forgery detection, in Proceedings ELMAR-2013 (IEEE, 2013), pp. 49–54
SP-EnCu: A Novel Security and Privacy-Preserving Scheme with Enhanced Cuckoo Filter for Vehicular Networks Righa Tandon
and P. K. Gupta
Abstract Security and privacy of vehicular communication are the two major challenges in a vehicular network. In this paper, we have proposed a system model that focuses on security and privacy preservation of vehicles in the network. In order to preserve privacy, Pseudonyms are assigned to each individual vehicle in the network. Reduced Round-Advanced Encryption Standard (RR-AES) has been used for ensuring message security in the vehicular network. Furthermore, an enhanced cuckoo filter has been implemented for verifying the authorised vehicles in the network. Results obtained for performance evaluation represent that the proposed scheme successfully reduces the time of encryption/decryption. Also, the proposed scheme minimises the false positive rate by 50% when compared with the existing scheme. Keywords Vehicular network · Authentication · Privacy preservation · Enhanced Cuckoo filter · Vehicles
1 Introduction The recent breakthroughs in the communication technologies have focused on intelligent transportation system (ITS) in which vehicular networks are gaining much attention. The major aspects of vehicular networks are the network architecture, network security, vehicle-to-vehicle communication and vehicle security. In the past, extensive work has been done on network architecture and vehicle-to-vehicle communication. With the increase in the number of vehicles, it becomes essential to
R. Tandon (B) · P. K. Gupta Department of Computer Science and Engineering, Jaypee University of Information Technology, Solan, Himachal Pradesh 173234, India e-mail: [email protected] P. K. Gupta e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 D. Swain et al. (eds.), Machine Learning and Information Processing, Advances in Intelligent Systems and Computing 1311, https://doi.org/10.1007/978-981-33-4859-2_52
address and enhance the security and privacy of the vehicles. Thus, the main focus of this paper is on the security and privacy preservation of the vehicles. In the proposed work, a vehicular network has been considered in which only registered and verified vehicles are allowed. Vehicles have been registered using their real identities, after which pseudo-identities (Pseudonyms) are assigned to preserve their real identities. This ensures the privacy preservation of the vehicles. We have also implemented RR-AES for securing the messages of the vehicles during communication in the network. Vehicles verify other vehicles before communication in the network by using the enhanced cuckoo filter. This has been done to check their authenticity and to enhance the system performance. The overall contribution of this paper can be summarised as:
• Proposes a system model for the vehicular network.
• Provides privacy preservation for vehicles using Pseudonyms.
• Provides security to the messages in the network using RR-AES.
• Implementation of the enhanced cuckoo filter for vehicle-to-vehicle verification during communication.
Section 2 discusses the existing work. Section 3 gives the system model in which the registration and verification process of vehicles is discussed. In Sect. 4, the detailed process of security and privacy preservation of vehicles along with the enhanced cuckoo filter is described. Section 5 shows the performance evaluation and comparison with existing schemes. Section 6 concludes the work.
2 Related Work The existing work related to the proposed scheme is discussed in this section. This section is further divided into three sub-sections that categorize the previous work done.
2.1 Privacy-Preserving Schemes In [1], a privacy-preserving authentication scheme is proposed for vehicular network which focuses on security related problems without using bi-linear pairing. This scheme is capable of enhancing the security and improving the performance of the system. This scheme is also suitable for real-time applications as it reduces computational and communication cost. Further, an authentication scheme is proposed in [2] that uses Pseudonymous privacy preservation for vehicular environment. This scheme satisfies all the privacy requirements and minimizes the revocation overhead. A privacy-preserving protocol is used in [3] for secure communication among vehicles in the network. Here, identity-based signatures are used which reduce the
certificate management overhead. The results of this scheme show that the communication overhead is reduced and the overall performance of the system is further enhanced. For communication among vehicles, a privacy-preserving framework is introduced in [4] that is based on the group signature technique. This framework helps in achieving integrity of data, authentication, and accountability. It also helps in detecting any alteration in the messages sent by an unauthorised vehicle during communication in the network. In [5], Pseudonym-based authentication is proposed in which Pseudonyms are generated by maintaining the security of the system. A mechanism is designed in this paper that minimises the security overhead and also maintains the robustness of the network in case of heavy traffic. In [6], a secure framework is proposed based on LIAU authentication and the LSMB protocol. The proposed framework can handle various security attacks.
2.2 Encryption Schemes In [7], a lightweight encryption scheme is proposed for vehicle communication. This scheme can overcome the security issues of the controller area network very efficiently. In this scheme, an AES 128-bit key is used for encryption. Also, it is much faster and does not require excessive memory during communication. Diffie-Hellman and AES are used in [8] because of their much better performance for preventing any security breach during communication. A hybrid scheme of elliptic curve cryptography and AES is used in [9] for ensuring safe delivery of messages to the particular vehicle. This scheme is secure, effective, and efficient. In [10], the SHA-1 algorithm is used for maintaining information integrity. This helps in transferring the data securely among vehicles.
2.3 Cuckoo Filter In [11], a privacy-preserving scheme is proposed using cuckoo filter for vehicular networks. This cuckoo filter helps in batch verification phase and also satisfies the message authentication requirement. This proposed scheme is very efficient when compared with other similar schemes and meets the security requirements of the network. Privacy-conserving authentication scheme along with Cuckoo filter is used in [12]. Positive and negative cuckoo filter is used with binary search algorithm to validate the keys. This provides more efficiency in batch verification phase of the network. In [13], Pseudonym root with cuckoo filter based authentication scheme is proposed. This scheme is capable of handling security attacks and also provides anonymity for preserving vehicle privacy in the network. Further this scheme has low communication and computational overhead.
3 System Model In the system model, vehicular network consists of three entities: the regulatory authority (RA), roadside units (RSU) and vehicles as shown in Fig. 1. • Regulatory Authority (RA): RA is the primary authority regulating the vehicular environment. It is the central storehouse of all the information related to the vehicles that create a network. The vehicles that form a network have to get themselves registered with RA. To preserve the privacy of the registered vehicles, the RA assigns pseudo-identities known as Pseudonyms to each vehicle along with a key which can be used to encrypt the message that is to be sent by the vehicle. Further, the RA relays the Pseudonyms of the registered vehicles to the RSU wirelessly. • Roadside Unit (RSU): RSU checks the validity of the vehicles that want to communicate in the network by verifying their Pseudonyms with the Pseudonyms received from RA. After the verification, vehicles can now communicate in the network. RSU acts as mediator between the vehicles and RA. • Vehicles: Each vehicle in the network is equipped with an on-board unit (OBU) using which the vehicles can communicate with each other and also the RSU.
Fig. 1 System model
3.1 Registration Process The first step for a vehicle to become part of the vehicular network is to register itself with the RA. The vehicles use their real identities for the registration process, after which the RA authenticates the vehicles individually and assigns Pseudonyms to each authenticated vehicle. A copy of the Pseudonyms assigned to each vehicle is also shared with the RSU for verification purposes. While assigning the Pseudonyms, the RA also generates a key for each vehicle with which the vehicle can encrypt a message that it wants to send in the network. This is done so as to secure the message against any integrity attack.
3.2 Verification Process After the registration process, the RSU verifies the vehicles present in the network. This is done by cross-checking the Pseudonyms of the vehicles with the copy of the Pseudonyms received from the RA. If the vehicle's identity matches, it is allowed to be a part of the network; otherwise the particular vehicle is revoked from the network. The various notations used in the proposed work are shown in Table 1.
Table 1 Notations used

Notation                         Illustration
V_i                              ith vehicle
RA                               Regulatory authority
RSU                              Roadside unit
K_RA                             Key generated by RA
K_Vi                             Key generated using RR-AES
R_id                             Real identity of the vehicle
P_id                             Pseudo identity of the vehicle
RSU(P_id)                        Copy of P_id with RSU
Sig_KVi(P_id)                    Signature of ith vehicle
ξ_KRA(P_id, Sig_KVi(P_id))       Encrypted message
f_n                              fingerprint(X_n)
4 Detailed Security and Privacy-Preserving Process with Enhanced Cuckoo Filter This section elaborates the proposed security and privacy preservation process for vehicles in the vehicular network. This section is further divided into the following subsections.
4.1 Registration and Pseudonym Assignment The proposed system model focuses on privacy preservation of vehicles. This is achieved by assigning Pseudonyms to each vehicle at the time of registration, by which their real identity is hidden and hence preserved. The process for vehicle registration and verification is shown in Algorithm 1. When a vehicle enters the network, it is verified by the RSU by collating the Pseudonym received from the RA and the Pseudonym of the vehicle. If both Pseudonyms match, the vehicle is allowed to enter and communicate in the network; otherwise it is revoked from the network.
Algorithm 1 Registration and Verification Algorithm
1. Registration of vehicle:
   1.1. Register with RA using R_id.
   1.2. Get P_id and K_RA.
   1.3. Relay the P_id to RSU.
2. Verification of vehicle by RSU:
   2.1. Collate the P_id.
   2.2. IF RSU(P_id) == V(P_id) THEN
   2.3.   V_i is allowed into the network.
   2.4. ELSE V_i is revoked.
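A minimal sketch of the registration and verification flow of Algorithm 1 is given below; the pseudonym format, key length, and data structures are illustrative assumptions rather than part of the proposed scheme.

```python
# Hypothetical sketch of Algorithm 1: RA-side registration and RSU-side verification.
import secrets

class RA:
    def __init__(self):
        self.registry = {}                              # real identity -> (pseudonym, K_RA)

    def register(self, real_id: str) -> tuple[str, bytes]:
        pseudonym = "PSN-" + secrets.token_hex(8)       # hides the real identity
        k_ra = secrets.token_bytes(16)                  # key later used for message encryption
        self.registry[real_id] = (pseudonym, k_ra)
        return pseudonym, k_ra

    def relay_pseudonyms(self) -> set[str]:
        # Copy of all assigned pseudonyms shared with the RSU.
        return {psn for psn, _ in self.registry.values()}

class RSU:
    def __init__(self, pseudonyms_from_ra: set[str]):
        self.known_pseudonyms = pseudonyms_from_ra

    def verify(self, vehicle_pseudonym: str) -> bool:
        # Vehicle is admitted only if its pseudonym matches the RA's copy.
        return vehicle_pseudonym in self.known_pseudonyms
```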
4.2 Reduced Round-AES The Advanced Encryption Standard (AES) is a symmetric encryption algorithm that uses the same 128-bit key for both the encryption and decryption processes. The overall encryption process is carried out in 10 rounds, and each round consists of the following operations:
• Byte substitution: In this step every input byte is converted into a value that is substituted from the S-box. The resulting output is a 4 × 4 matrix.
• Shifting of rows: Each row of the 4 × 4 matrix is shifted to the left by 0, 1, 2, and 3 bytes, respectively.
SP-EnCu: A Novel Security and Privacy-Preserving …
539
• Column mixing: The four bytes of every column are processed to form a new column using a mathematical function.
• Add round key: The result of the column mixing is XORed with the 128-bit round key. The output of this step is the encrypted text if it is the last round.
In the proposed model, we have used Reduced Round-AES, which is very similar to AES with the main difference being the reduced number of rounds. Instead of using the traditional 10-round approach for encryption, we use 7 rounds, which reduces the computational time complexity and enhances the system performance. The main aim of the proposed model is to maintain message integrity and authentication. This is done by using Reduced Round-AES, in which vehicles can sign their respective messages to maintain vehicle authenticity and message integrity. Algorithm 2 shows the encryption and message signing process that is carried out by the vehicle. After successful verification of the vehicle, it generates a signature by signing its P_id using K_Vi, that is, the key generated from Reduced Round-AES. Then the vehicle encrypts the message that it wants to send by using K_RA, received from the RA, along with its signature. This encrypted message is then sent to the RA via the RSU, where the RA decrypts the message and checks the P_id of the vehicle. It also verifies the vehicle's signature using K_RA.
Algorithm 2 Encryption and Message Signing Algorithm
1. Vehicle enters the network.
2. Vehicle signs the P_id using K_Vi as: Sig_KVi(P_id)
3. Vehicle encrypts the message and sends it to RA via RSU as: a = ξ_KRA(P_id, Sig_KVi(P_id))
4. RA receives a, decrypts it and checks the correctness of P_id.
5. RA verifies the vehicle's signature using K_RA.
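The sign-then-encrypt flow of Algorithm 2 could look roughly as sketched below. Since reduced-round AES is not exposed by mainstream libraries, the sketch substitutes standard AES-GCM from the `cryptography` package for RR-AES and an HMAC for the symmetric signature Sig_KVi; these substitutions, the nonce handling, and all function names are assumptions for illustration only.

```python
# Hypothetical sketch of Algorithm 2 (sign-then-encrypt), with standard AES-GCM
# standing in for the paper's 7-round RR-AES and HMAC-SHA256 as the signature.
import os, hmac, hashlib
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def sign_pid(pid: bytes, k_vi: bytes) -> bytes:
    # Vehicle signs its pseudo-identity with its own key K_Vi.
    return hmac.new(k_vi, pid, hashlib.sha256).digest()

def encrypt_for_ra(pid: bytes, signature: bytes, k_ra: bytes) -> tuple[bytes, bytes]:
    # Vehicle encrypts (P_id, Sig_KVi(P_id)) under the RA-issued key K_RA (16/24/32 bytes).
    nonce = os.urandom(12)
    ciphertext = AESGCM(k_ra).encrypt(nonce, pid + signature, None)
    return nonce, ciphertext

def ra_verify(nonce: bytes, ciphertext: bytes, k_ra: bytes, k_vi: bytes) -> bool:
    # RA decrypts the message, splits P_id from the signature, and re-checks the signature.
    plaintext = AESGCM(k_ra).decrypt(nonce, ciphertext, None)
    pid, signature = plaintext[:-32], plaintext[-32:]
    return hmac.compare_digest(signature, sign_pid(pid, k_vi))
```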
4.3 Enhanced Cuckoo Filter The cuckoo filter is used for storing and retrieving elements using hashing techniques. We have used this filter because of its better search performance. It consists of a number of buckets (B), where every bucket can contain several items. The cuckoo filter stores only a hash value of each data item, known as its fingerprint, which is calculated using a hash function. A fingerprint can be stored in either of two hash locations, calculated using Eqs. 1 and 2.

H1 = hash(X) mod B                                   (1)
H2 = (H1 ⊕ hash(fingerprint(X))) mod B               (2)
In the proposed work, we have enhanced the traditional cuckoo filter by adding an index cell to each bucket that keeps track of each fingerprint added. The process
of inserting the elements in the enhanced cuckoo filter is very similar to that of the traditional cuckoo filter. In the proposed network, when the vehicles communicate with each other, they have to verify each other's signatures using the cuckoo filter. The input data item (X) for the enhanced cuckoo filter is the P_id of each vehicle. The first and second hash locations are calculated using Eqs. 1 and 2.
Algorithm 3 Verification of vehicles' signatures using enhanced cuckoo filter
1. Vehicle V_i verifies f_n of V_n using EnCuckooFilter
2. IF f_n is in PositiveFilter THEN
3.   V_i verifies f_n for NegativeFilter
4.   IF f_n is not in NegativeFilter THEN
5.     Fingerprint is verified
6. ELSE
7.   V_i declares V_n as unauthorised
In the vehicular network, if any vehicle V_i wants to communicate with any other vehicle V_n, it first verifies the fingerprint of that vehicle using the enhanced cuckoo filter. It then checks for the presence of the fingerprint in the positive and negative cuckoo filters. The positive filter contains the values that are surely present in the enhanced cuckoo table, whereas the negative filter contains those values that may be present in the enhanced cuckoo table. The overall process is given in Algorithm 3.
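A compact sketch of a cuckoo filter using the two candidate buckets of Eqs. (1) and (2) and supporting the membership test used in Algorithm 3 is given below. The fingerprint width, bucket size, eviction limit, and the reduction of the per-bucket index cell to a simple insertion counter are illustrative assumptions, not the paper's exact design.

```python
# Hypothetical cuckoo-filter sketch; num_buckets should be a power of two so that
# the XOR-based alternate-bucket computation of Eq. (2) is self-inverse.
import hashlib, random

class CuckooFilter:
    def __init__(self, num_buckets: int = 1024, bucket_size: int = 4, max_kicks: int = 500):
        self.b = num_buckets
        self.bucket_size = bucket_size
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]
        self.index = [0] * num_buckets                  # "index cell": insertions per bucket

    def _hash(self, data: bytes) -> int:
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

    def _fingerprint(self, item: bytes) -> int:
        return self._hash(b"fp" + item) & 0xFFFF        # 16-bit fingerprint

    def _buckets_for(self, item: bytes) -> tuple[int, int, int]:
        fp = self._fingerprint(item)
        h1 = self._hash(item) % self.b                                  # Eq. (1)
        h2 = (h1 ^ self._hash(fp.to_bytes(2, "big"))) % self.b          # Eq. (2)
        return fp, h1, h2

    def insert(self, item: bytes) -> bool:
        fp, h1, h2 = self._buckets_for(item)
        for h in (h1, h2):
            if len(self.buckets[h]) < self.bucket_size:
                self.buckets[h].append(fp)
                self.index[h] += 1
                return True
        # Both buckets full: evict a resident fingerprint and relocate it.
        h = random.choice((h1, h2))
        for _ in range(self.max_kicks):
            slot = random.randrange(len(self.buckets[h]))
            fp, self.buckets[h][slot] = self.buckets[h][slot], fp
            h = (h ^ self._hash(fp.to_bytes(2, "big"))) % self.b        # partner bucket
            if len(self.buckets[h]) < self.bucket_size:
                self.buckets[h].append(fp)
                self.index[h] += 1
                return True
        return False                                                    # filter considered full

    def lookup(self, item: bytes) -> bool:
        fp, h1, h2 = self._buckets_for(item)
        return fp in self.buckets[h1] or fp in self.buckets[h2]
```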
5 Performance Evaluation and Comparison In this section, we have evaluated the performance of the proposed work by comparing it with the existing schemes. Table 2 shows the comparison of the proposed encryption algorithm with existing encryption algorithms on the basis of their key size, encryption/decryption time and number of rounds. In the proposed work, we have considered 1000 vehicles whose fingerprints are stored in the enhanced cuckoo filter. 5000 look-ups are performed on these 1000
Table 2 Comparison of different encryption algorithms

Algorithm   Key size (in bits)   Encryption/decryption time (in ms)   Number of rounds
DES         56                   1.11                                 16
TDES        168                  2.33                                 48
RSA         1024                 5.53                                 1
AES         128                  1.03                                 10
Proposed    128                  0.60                                 7
Fig. 2 False positive rate comparison of traditional with enhanced cuckoo filter

Table 3 Lookup throughput comparison of enhanced cuckoo filter (thousands of lookup operations per second)

Proportion of positive queries (%)   Bloom filter   3D Bloom filter   Cuckoo filter   Enhanced cuckoo filter
0                                    8.2            9.5               11.3            14.1
25                                   6.1            7.4               9.4             13.5
50                                   5.7            8.0               9.8             15.7
75                                   5.3            7.1               8.2             14.9
100                                  5.0            6.7               7.6             15.2
fingerprints and the false positive rate is calculated. We have compared the proposed enhanced cuckoo filter with the traditional one on the basis of the false positive rate for the same values. The false positive rates for the traditional and enhanced cuckoo filters come out to be 0.0514 and 0.0258, respectively, as shown in Fig. 2. Furthermore, we have also compared the lookup throughput (thousands of lookup operations performed per second) of the enhanced cuckoo filter with the bloom filter [14] and the traditional cuckoo filter [15]. We have gradually increased the proportion of positive queries (the number of queries successfully found in the enhanced cuckoo filter) from 0% to 100%. The result is shown in Table 3.
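The reported false positive rate could be reproduced with a small experiment of the kind sketched below, following the 1000-fingerprint / 5000-lookup setting described above; the random item encoding and the filter object (any object exposing insert/lookup, e.g. the cuckoo-filter sketch earlier) are assumptions.

```python
# Hypothetical FPR experiment: insert 1000 vehicle fingerprints, then look up
# 5000 pseudonyms that were never inserted and count the false hits.
import secrets

def measure_false_positive_rate(filter_obj, num_members: int = 1000, num_lookups: int = 5000) -> float:
    for _ in range(num_members):
        filter_obj.insert(secrets.token_bytes(16))      # random stand-ins for P_id values
    false_hits = sum(filter_obj.lookup(secrets.token_bytes(16)) for _ in range(num_lookups))
    return false_hits / num_lookups
```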
6 Conclusion In this paper, we have proposed a system model for secure privacy preservation of the vehicles that form the vehicular network. For ensuring the privacy of the vehicles, Pseudonyms are assigned. Also, for security purpose we have used RR-AES which encrypts the message that is to be sent in the network. During message sharing in the network, vehicle-to-vehicle verification is done using enhanced cuckoo filter. Performance of the system model is evaluated and compared with other schemes. Comparisons show that the proposed system model has faster encryption/decryption time. Further, the false positive rate is reduced by 50% and has a better lookup throughput.
References 1. D. He, S. Zeadally, B. Xu, X. Huang, An efficient identity-based conditional privacy-preserving authentication scheme for vehicular ad hoc networks. IEEE Trans. Inf. Forensics Secur. 10(12), 2681–2691 (2015) 2. Y. Sun, R. Lu, X. Lin, X. Shen, J. Su, An efficient Pseudonymous authentication scheme with strong privacy preservation for vehicular communications. IEEE Trans. Veh. Technol. 59(7), 3589–3603 (2010) 3. X. Lin, X. Sun, P.H. Ho, X. Shen, GSIS: A secure and privacy-preserving protocol for vehicular communications. IEEE Trans. Veh. Technol. 56(6), 3442–3456 (2007) 4. J. Guo, J.P. Baugh, S. Wang, A group signature based secure and privacy-preserving vehicular communication framework, in Mobile Networking for Vehicular Environments (MOVE), pp. 103–108 (2007) 5. G. Calandriello, P. Papadimitratos, J.P. Hubaux, A. Lioy, Efficient and robust Pseudonymous authentication in VANET, in Proceedings of the Fourth International Workshop on Vehicular ad hoc networks (MOBICOM), Canada (ACM, 2007), pp. 19–28 6. R. Tandon, P.K. Gupta, SV2VCS: a secure vehicle-to-vehicle communication scheme based on lightweight authentication and concurrent data collection trees. J Ambient Intell Human Comput (2021). https://doi.org/10.1007/s12652-020-02721-5 7. Z. Lu, Q. Wang, X. Chen, G. Qu, Y. Lyu, Z. Liu, Leap: A lightweight encryption and authentication protocol for in-vehicle communications, in IEEE Intelligent Transportation Systems Conference (ITSC), Auckland, pp. 1158–1164 (2019) 8. P.V. Vivek Sridhar Mallya, A. Ajith, T.R. Sangeetha, A. Krishnan, G. Narayanan, Implementation of differential privacy using Diffie-Hellman and AES Algorithm, in Inventive Communication and Computational Technologies. Lecture Notes in Networks and Systems, vol. 89, ed. by Ranganathan G., Chen J., Rocha Á. (Springer, Singapore, 2020), pp. 143–152 9. S.A. Shah, C. Gongliang, L. Jianhua, Y. Glani, A dynamic privacy preserving authentication protocol in VANET using social network, in Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing. SNPD 2019. Studies in Computational Intelligence, vol. 850, ed. by Lee R. (Springer, Cham, 2020), pp. 53–65 10. A.M. Abdelgader, F. Shu, Exploiting the physical layer security for providing a simple user privacy security system for vehicular networks, in International Conference on Communication, Control, Computing and Electronics Engineering (ICCCCEE), Khartoun, pp. 1–6 (2017) 11. J. Cui, J. Zhang, H. Zhong, Y. Xu, SPACF: A secure privacy-preserving authentication scheme for VANET with cuckoo filter. IEEE Trans. Veh. Technol. 66(11), 10283–10295 (2017)
12. A. Rengarajan, M.M. Thaha, SPCACF: Secured privacy-conserving authentication scheme using Cuckoo Filter in VANET. Scalable Comput.: Pract. Exp. 21(1), 101–105. (2020) https:// doi.org/10.12694/scpe.v21i1.1626 13. M.A. Alazzawi, H. Lu, A.A. Yassin, K. Chen, Robust conditional privacy-preserving authentication based on Pseudonym root with Cuckoo filter in vehicular ad hoc networks. KSII Trans. Int. Inf. Syst. (TIIS) 13(12), 6121–6144 (2019) 14. A. Malhi, S. Batra, Privacy-preserving authentication framework using bloom filter for secure vehicular communications. Int. J. Inf. Secur. 15(4), 433–453 (2016) 15. B. Fan, D.G. Andersen, M. Kaminsky, M.D. Mitzenmacher, Cuckoo filter: Practically better than bloom, in Proceedings of the 10th International on Conference on emerging Networking Experiments and Technologies (CoNEXT’14), ACM, Sidney, pp. 75–88 (2014)
Reversible Region-Based Embedding in Images for Secured Telemedicine Approach Prachee Dewangan, Bijay Ku Paikaray, Debabala Swain, and Sujata Chakravarty
Abstract In the current era, telemedicine is one of the well-known techniques worldwide for providing quality healthcare service remotely. The prescribed diagnosis and disease details are shared with the doctor for faster diagnosis over private and public channels. Medical information needs to be secured over an unsecured internet network, as images like CT-scan, X-ray, and MRI carry personal information. Securing medical data raises challenges of privacy, confidentiality, and integrity of patient records. After deep investigation, existing techniques like encryption and digital watermarking are found to be not always practical in real time. This research work analyzes the problem and delivers a security solution for recovering medical images. The basic approach for safe image communication is to preserve those selective regions of the image which carry sensitive medical diagnosis information, called the Region of Interest (ROI). The ROI of the medical image is chosen arbitrarily and preserves the essential information. This research work gives a broad overview of data hiding in the ROI of medical images by embedding in its selective blocks. The performance parameters are verified, which proves the preservation of sensitive data in medical images. Keywords Telemedicine · Sensitive region · Region-Based approach · Image embedding · ROI · RONI
1 Introduction In our day-to-day life, healthcare service is as essential as food, clothing, and shelter. In every geography, providing healthcare service is still a challenge. In rural and remote areas, it
P. Dewangan · D. Swain Dept. of Computer Science, Rama Devi Women’s University, Bhubaneswar, India B. K. Paikaray (B) · S. Chakravarty Dept. of CSE, Centurion University of Technology and Management, Sitapur, Odisha, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 D. Swain et al. (eds.), Machine Learning and Information Processing, Advances in Intelligent Systems and Computing 1311, https://doi.org/10.1007/978-981-33-4859-2_53
is very difficult to provide quality healthcare service. Especially in developing countries, the physician-to-patient ratio is poor. To solve this problem, different telemedicine applications like telediagnosis, telesurgery, and so on serve as advanced tools. These applications are refining and delivering healthcare services to patients at any location. With the saturation of network communication, virtual healthcare is not only crucial for remote and rural areas, but also helps doctors diagnose using modern technology. Telemedicine is cheaper than traditional hospital services [1, 2]. The patient's private information, like the Electronic Patient Record (EPR) and diagnosis images, must be kept secret. The EPR needs to be hidden in the original image using different security techniques. At the receiver end, the reverse security techniques are used to recover the original EPR without any loss. Reversible Data Hiding (RDH) techniques are another possibility to preserve the sensitive and private information of the patient. Embedding in medical images is performed using region-based techniques (using ROI and RONI) for privacy and sensitive data preservation. This paper proposes a reversible region-based approach for data hiding and its retrieval. For high security of medical images, different steganography techniques like watermarking and RDH have been applied, which need to be improved in terms of security, reversibility, authenticity, integrity, etc. [3–5]. In the proposed method, the original image is segmented into ROI and RONI. Experimental results reveal that the proposed method offers complete reversibility, authenticity, integrity, and privacy preservation with better performance compared with the cutting-edge methods. The rest of the paper is organized as follows: Sect. 2 discusses the recent related works. Section 3 defines the proposed reversible selective embedding scheme. Section 4 presents the experimental results with performance analysis. Finally, the paper is concluded with future work in Sect. 5.
2 Related Works In this section, we have specified the techniques behind four recent research works, namely, Bakshi et al.'s work [6], Zhaoxia et al.'s technique [9], Qasim et al.'s proposed work [10], and Liu et al.'s method [11]. They have implemented different authentication and recovery techniques in digital medical images like the integrity check value (ICV), Reversible Data Hiding (RDH) in Encrypted Images (RDHEI), EPR hiding, Message Authentication Code (MAC), or Digital Signature (DS). In addition, extra security care should be taken for the authenticity and protection of the header information. Various techniques for medical image authentication and integration are discussed in Sects. 2.1 to 2.4.
2.1 Secure Telemedicine Using RONI Halftoned Visual Cryptography Without Pixel Expansion [6] In 2019, Bakshi et al. proposed a method to secure telemedicine using RONI halftoned visual cryptography without pixel expansion. The proposed approach is explained with five algorithms. The First algorithm is the embedding process; it takes the test image as input and outputs the ROI coordinates. The Second algorithm works on a slice of the ROI and returns an integrity check value (ICV) [7]. The Third algorithm implements halftoning: it takes a grayscale image, converts it to binary with the given rows and columns, calculates the threshold value, and returns a halftone. The Fourth algorithm, GenerateShare, takes the test image as input and produces three shares of the input image without pixel expansion. The Fifth algorithm represents the whole approach and calls Algorithms 1–4 as per the required operation. In this research, Fig. 1 shows the original test MRI image of a patient from the NEURO MRI database. The performance of the proposed algorithm has been evaluated using the metrics peak signal-to-noise ratio (PSNR), structural similarity index map (SSIM), and accuracy [8]. There are four major process steps: First, the embedding information is processed as a consequence of RONI to be used for the integrity check; in the Second process, some signature information (patient ID) calculates the ICV and
Fig. 1 MR Image a Original image b Embedded image c, d, e Share 1, 2, and 3 f Share 1, 2, and 3 output overlapping [6]
embeds it; the Third process reduces the complexity and improves the quality of the halftoned embedded RONI region; and lastly, in the Fourth process, the embedded RONI generates the shares using VC. As a result, the scheme is robust against man-in-the-middle attacks. During the integrity check, it is verified whether the attacker has modified the shares or not. If so, the VC output appears as noise.
2.2 Reversible Data Hiding in Encrypted Images Based on Multi-MSB Prediction and Huffman Coding [9] In 2020, Zhaoxia et al. proposed a technique for RDH (Reversible Data Hiding) in Encrypted Images (RDHEI) based on Huffman coding and multi-most significant bit (MSB) prediction. In this method, data extraction achieves higher capacity and is error-free, and decryption is based on Huffman coding and multi-MSB prediction. Three phases are defined in the proposed technique, as shown in Fig. 2. In the First phase, the image is encrypted by the content owner. The embedding process then calculates a label map, and each pixel is mapped to an adaptive multi-MSB replacement. In the Second phase, the data is hidden using the encryption key or the data hider key. The Third phase is image decryption and data extraction at the receiver end; the receiver recovers the image using the data hider key. The experiments are done on three databases: BOSSBase, BOWS-2, and UCID. Man, Airplane, Baboon, Lena, Jetplane, and Tiffany are used as test images. The test results are measured through the embedding rate (ER) in bpp (bits per pixel), PSNR (peak signal-to-noise ratio), and SSIM (structural similarity).
Fig. 2 Three phases model [9]
Fig. 3 a Original image, b Watermarking image, c Extracted image, d Difference between original and extracted image [10]
2.3 ROI-Based Reversible Watermarking Scheme for Ensuring the Integrity and Authenticity of DICOM MR Images [10] In 2019, Qasim et al. proposed a reversible watermarking method on the ROI for the authenticity and integrity of DICOM MR images. The experiment comprises watermark creation, embedding, and extraction. Watermark creation covers watermark authentication and watermark integrity. In the watermark authentication phase, the watermark holds the metadata (diagnosis device, diagnosis result, and image parameters) about the patient's key information. In the watermark integrity phase, the MD5 algorithm is used to send and retrieve the MAC. The embedding process segments the MR image to find smooth blocks from the entire brain region as the ROI. The identified smooth blocks are encoded with the generated watermark. In the verification process of data extraction, the threshold value T and the watermark length are derived from the RONI to identify the smooth blocks of the ROI (Fig. 3).
2.4 A Novel Robust Reversible Watermarking Scheme for Protecting Authenticity and Integrity of Medical Images [11] In 2019, Liu et al. proposed a robust reversible watermarking technique for medical image authentication and integration. Watermark generation, embedding, extraction, and security verification are the four phases in this method. A hospital logo is used for authentication and integration to generate the watermark [12]. The most significant values are found by using the Slantlet Transform (SLT) and Singular Value Decomposition (SVD) in the watermark embedding phase, explained in Fig. 4. The secret key K and non-overlapping blocks are used for extraction. In the verification phase, the received image is authenticated. The test images are a CT image, MR image, X-ray image, ultrasound image, fundus image, and hospital logo.
Fig. 4 a–c Original images, d–f Watermark images [11]
3 Proposed Technique The proposed reversible embedding architecture can be divided into four phases: image segmentation, image embedding, image authentication, and image extraction. The proposed architecture is described in Fig. 5.
Fig. 5 The proposed embedding architecture
3.1 Image Segmentation and Embedding DICOM images are used as test images in this research. In the image segmentation phase, ROI and RONI are the two regions of the test image. The ROI is the user-selected diagnosis region of size 128 × 128. The RONI blocks are selected from the four corners of the image, each of size equal to one fourth (1/4) of the ROI; in the proposed technique, four RONI blocks of size 64 × 64 are segmented from the image. In the image embedding phase, a smooth block in the ROI is defined as the block that has the least significant difference with the RONI. In the experiment, we have divided the ROI into four equal quadrant blocks; as the ROI is selected with a block size of 128 × 128, each quadrant holds 64 × 64 pixels. Next, four RONI blocks of size 64 × 64 are selected from the four corners of the DICOM image. The relative PSNR between each ROI quadrant and its RONI block is computed, and the ROI quadrant with the optimal PSNR is identified as the smooth block and selected for embedding. After identification of the smooth block, the embedding algorithm performs a bitwise XOR operation between the pixels of the smooth region and the selected RONI.
Step 1: Let the original image be I_{n×n}.
Step 2: Crop the ROI segment I_ROI of size [m, m]. The remaining image is known as I_RONI.
Step 3: Divide I_ROI into four equal quadrants of size [m/2, m/2], denoted ROI_1, ROI_2, ROI_3, and ROI_4.
Step 4: Select the four RONI blocks from the four corners of the image with size [m/2, m/2], denoted RONI_1, RONI_2, RONI_3, and RONI_4.
Step 5: Find PSNR_optimal between the respective ROI and RONI quadrants as follows:

PSNR_optimal = max_{i=1..4} {PSNR(ROI_i, RONI_i)}                 (1)

Step 6: Identify the smooth block ROI_smooth [m/2, m/2] as the ROI_i for which PSNR_optimal was evaluated.
Step 7: To obtain ROI_smooth, perform a bitwise ⊕ between ROI_i [m/2, m/2] and RONI_i [m/2, m/2] for the value of i at which PSNR_optimal is obtained:

for each p ∈ P, for each q ∈ Q:
ROI_smooth[p, q] = ROI_i[p, q] ⊕ RONI_i[p, q]                     (2)
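A compact sketch of Steps 1–7 and of the inverse mapping used later in Eq. (3) is given below, assuming an 8-bit grayscale image held as a NumPy array, a user-supplied ROI position, and that the ROI does not overlap the corner RONI blocks; the function names are illustrative.

```python
# Hypothetical sketch of ROI/RONI segmentation, smooth-block selection, and XOR embedding.
import numpy as np

def psnr(a: np.ndarray, b: np.ndarray) -> float:
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(255.0 ** 2 / mse)

def corner_roni_blocks(img: np.ndarray, s: int):
    h, w = img.shape
    return [img[:s, :s], img[:s, w - s:], img[h - s:, :s], img[h - s:, w - s:]]

def embed(img: np.ndarray, roi_top: int, roi_left: int, m: int = 128) -> tuple[np.ndarray, int]:
    out, s = img.copy(), m // 2
    roi = out[roi_top:roi_top + m, roi_left:roi_left + m]
    quads = [roi[:s, :s], roi[:s, s:], roi[s:, :s], roi[s:, s:]]
    ronis = corner_roni_blocks(out, s)
    # Steps 5-6: the quadrant with the highest PSNR w.r.t. its RONI block is "smooth".
    i = int(np.argmax([psnr(q, r) for q, r in zip(quads, ronis)]))
    quads[i][...] = quads[i] ^ ronis[i]          # Step 7 / Eq. (2): bitwise XOR, in place
    return out, i

def extract(embedded: np.ndarray, roi_top: int, roi_left: int, i: int, m: int = 128) -> np.ndarray:
    out, s = embedded.copy(), m // 2
    roi = out[roi_top:roi_top + m, roi_left:roi_left + m]
    quads = [roi[:s, :s], roi[:s, s:], roi[s:, :s], roi[s:, s:]]
    ronis = corner_roni_blocks(out, s)           # RONI is unchanged by embedding
    quads[i][...] = quads[i] ^ ronis[i]          # Eq. (3): XOR is its own inverse
    return out
```

Because XOR is self-inverse and the RONI corner blocks are left untouched by embedding, applying the same operation again restores the original ROI exactly, which is the source of the scheme's complete reversibility.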
3.2 Image Authentication and Extraction This phase extracts RONI_i from the embedded image and also authenticates the received image. If there is a difference in RONI_i between the sender's and receiver's embedded images, it signifies that the sent image has been tampered with in RONI_i; in this case, the embedded image needs to be resent. If RONI_i is identical, the image is authenticated and used in the next phase of image extraction.
Step 1: If there is a difference between the actual and received RONI_i, then
(a) it signifies that the RONI has been tampered with in the received image, so the sender needs to resend the image. Otherwise,
(b) the image is authenticated and Steps 2 and 3 are followed for extraction.
Step 2: ROI_i can be extracted from ROI_smooth as follows:

for each p ∈ P, for each q ∈ Q:
ROI_i[p, q] = ROI_smooth[p, q] ⊕ RONI_i[p, q]                     (3)

Step 3: Integrate ROI_i into I_ROI to recover the original image I_{n×n}.
4 Experimental Results and Discussion The test images are taken from the idimages and OsiriX databases. The test MR DICOM images are of size 512 × 512 [17, 18]. A sensitive ROI region of size 128 × 128 was adopted, and the RONI regions of size 64 × 64 were selected from the four corners of the original image. The ROI is also divided into four symmetric quadrants of size 64 × 64. After segmentation, the smooth ROI blocks are identified using the optimal PSNR function with respect to the RONI segments; the ROI quadrant with the greatest similarity to its RONI segment is selected as the smooth block for embedding. Then the bitwise XOR is performed for embedding, and the image is sent to the receiver. The output of the above experimental operations is demonstrated in Fig. 6a–f. The embedded image cannot be visually differentiated from the original image, so it becomes difficult to identify the ROI. After receiving the embedded image, the receiver has to check its authenticity by verifying the RONI block used for embedding. If it is verified, then the actual ROI can be recovered from
Fig. 6 a Original image, b Embedded image c Recovered image d Cropped ROI image e Embedded ROI f Reverse ROI
Table 1 Performance review of the proposed scheme with existing reversible schemes

Reversible schemes   [6]        [10]       [13]       [14]       [15]      [16]       Proposed
PSNR                 28.85 dB   99.94 dB   49.01 dB   23.06 dB   76.5 dB   72.28 dB   ∞
SSIM                 0.8309     1          0.9999     0.9047     1         NA         1
the embedded image using reverse embedding in the ROI_smooth blocks. At last, the retrieved ROI can be integrated into the received image to obtain the recovered image. The reverse embedding and recovery processes are shown in Fig. 6. The PSNR and SSIM parameters are compared with existing reversible algorithms in Table 1. It illustrates that the proposed technique outperforms other techniques in terms of reversibility and quality recovery.
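The PSNR and SSIM figures of Table 1 could be computed as sketched below using scikit-image; the choice of this particular library and the data range are assumptions. A bit-exact recovery gives zero error and hence an infinite PSNR, matching the ∞ entry for the proposed scheme.

```python
# Hypothetical sketch: PSNR and SSIM between the original and the recovered image,
# both 8-bit grayscale NumPy arrays of the same shape.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def quality_report(original: np.ndarray, recovered: np.ndarray) -> tuple[float, float]:
    psnr = peak_signal_noise_ratio(original, recovered, data_range=255)
    ssim = structural_similarity(original, recovered, data_range=255)
    return psnr, ssim
```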
5 Conclusion and Future Scope In telemedicine scenarios, real-time diagnosis of the disease is very essential. It depends on the visual quality of the medical image, so complete reversibility of the embedded image is highly essential. As the image includes sensitive blocks, it
should be well hidden and not visually noticeable. The proposed algorithm is a reversible selective embedding technique on DICOM medical images for ensuring integrity and authenticity. Specifically, the proposed method is superior in terms of smooth block selection, data hiding, and its complete reversibility. The quality level of the received image is much better, with minimal computational time due to its simple data hiding operations. The PSNR and SSIM verify the visual quality of the recovered image. In future work, the main focus will be on tamper detection and recovery methods in the ROI with complete reversibility.
References
1. G. Coatrieux, H. Maitre et al., Relevance of watermarking in medical imaging, in Proceedings of IEEE EMBS International Conference on Information Technology Applications in Biomedicine (2000), pp. 250–255
2. L.O.M. Kobayashi, S.S. Furuie, P.S.L.M. Barreto, Providing integrity and authenticity in DICOM images: a novel approach. IEEE Trans. Inf. Technol. Biomed. 13, 582–589 (2009)
3. O.S. Pianykh, Digital Imaging and Communications in Medicine (DICOM): A Practical Introduction and Survival Guide, vol. 50(8), 2nd edn. (Springer, Berlin, 2009)
4. S.M. Mousavi, A. Naghsh, S. Abu-Bakar, Watermarking techniques used in medical images: a survey. J. Digit. Imag. 27, 714–729 (2014)
5. A.F. Qasim, F. Meziane, R. Aspin, Digital watermarking: applicability for developing trust in medical imaging workflows state of the art review. Comput. Sci. Rev. 27, 45–60 (2018)
6. A. Bakshi, A.K. Patel, Secure telemedicine using RONI halftoned visual cryptography without pixel expansion. J. Inf. Secur. Appl. 46, 281–295 (2019)
7. C.A. Hossain, M.S.R. Zishan, D.R. Ahasan, A review on the security issues of telemedicine network. Int. J. Innov. Res. Electr. Electron. Instrum. Control Eng. 2(11) (2014)
8. S. Maheshkar, Region-based hybrid medical image watermarking for secure telemedicine applications. Multimedia Tools Appl. 76(3), 3617–3647 (2017)
9. Z. Yin, Y. Xiang et al., Reversible data hiding in encrypted images based on multi-MSB prediction and Huffman coding. IEEE Trans. Multimedia 22(4), 874–884 (2020)
10. A.F. Qasim, R. Aspin, F. Meziane et al., ROI-based reversible watermarking scheme for ensuring the integrity and authenticity of DICOM MR images. Multimed. Tools Appl. 78, 16433–16463 (2019)
11. X. Liu et al., A novel robust reversible watermarking scheme for protecting authenticity and integrity of medical images. IEEE Access 7, 76580–76598 (2019)
12. A. Roček, K. Slavíček, O. Dostál, M. Javorník, A new approach to fully-reversible watermarking in medical imaging with breakthrough visibility parameters. Biomed. Signal Process. Control 29, 44–52 (2016)
13. K. Balasamy, S. Ramakrishnan, An intelligent reversible watermarking system for authenticating medical images using wavelet and PSO. Cluster Comput. 1–12 (2018)
14. Y. Yang, W. Zhang, D. Liang, N. Yu, A ROI-based high capacity reversible data hiding scheme with contrast enhancement for medical images. Multimed. Tools Appl. 77, 18043–18065 (2018)
15. W. Pan, D. Bouslimi, M. Karasad, M. Cozic, G. Coatrieux, Imperceptible reversible watermarking of radiographic images based on quantum noise masking. Comput. Methods Prog. Biomed. 160, 119–128 (2018)
16. Atta-ur-Rahman, K. Sultan, N. Aldhafferi, A. Alqahtani, M. Mahmud, Reversible and fragile watermarking for medical images. Comput. Math. Methods Med. 1–7 (2018)
17. https://www.osirix-viewer.com/resources/dicom-image-library/
18. https://www.idimages.org/images/
A Spatial Domain Technique for Digital Image Authentication and Tamper Recovery Monalisa Swain and Debabala Swain
Abstract Nowadays, digital watermarking techniques are used to protect the integrity and authenticity of digital images and to provide capabilities for self-recovery of tampered locations. In this paper, a blind watermarking technique for image authentication and recovery of tampered areas is proposed. Here the watermarking is done in a block-wise manner by dividing the original image into non-overlapping blocks of size 4 × 4. The watermark data consists of authentication data and recovery data. Authentication data for each block is embedded in the same block, and recovery data is embedded in the mapped block. Watermark data is generated using self-embedding techniques. The proposed scheme is checked against different types of attacks and different percentages of content modification of the original image. The experimental results represent accurate detection and localization of tampering and high-quality recovery. Keywords Blind watermarking · Fragility · Tamper detection · Image recovery · Spatial domain · Mapping block
1 Introduction In today’s world, with easy availability and accessibility of Internet, communication over Internet has increased exponentially. Most of the information exchanges are happening through digital images. It becomes more critical and challenging to maintain authenticity and integrity of these digital images due to the wide availability of image modification tools [1]. A technique called watermarking is used to address the problems like authentic image identification, tampered area localization, and recovery of modified image content to original value [2]. In watermarking technique, watermark data can be generated either from the original image or can be taken from outside. It also can be some meaningful information. Watermarking is applied on two different domains: spatial domain [3] and frequency domain [4]. Out M. Swain (B) · D. Swain Department of Computer Science, Rama Devi Women’s University, Bhubaneswar, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 D. Swain et al. (eds.), Machine Learning and Information Processing, Advances in Intelligent Systems and Computing 1311, https://doi.org/10.1007/978-981-33-4859-2_54
of different watermarking techniques available, fragile watermarking is highly recommended for checking the authenticity of an image because of its sensitivity to very small changes [5]. Most image watermarking schemes use a secret key to embed watermark data, which can be used by valid users to extract the watermark data [6]. In this paper, a blind watermarking technique for image authentication and recovery of tampered areas is proposed. In this scheme, the original image is divided into 4 × 4 image blocks, and the authentication watermark and recovery watermark are generated block-wise. Here we have used a prime number as a secret key for generating the mapping block number where the recovery data is embedded for the respective block. This proposed scheme has high fragility due to the use of the self-embedding watermarking procedure.
2 Related Work In the past, many related works were presented for maintaining integrity and authenticity and providing self-recovery capabilities whenever any alteration is detected. In 2017, Chuan Qin et al. [7] presented a fragile image watermarking scheme with pixel-wise recovery based on an overlapping embedding strategy. Here, a block-based mechanism is used for tamper localization and a pixel-based mechanism for recovery. The LSB technique was used for watermark embedding, and the central pixel was chosen for authentication data embedding. Compared with other methods, the scheme achieves superior tampering recovery performance for larger tampering rates; for higher percentages of content modification, this method presents good results. In 2019, Bin Feng et al. [8] presented a semi-fragile watermarking technique for maintaining the wholeness and genuineness of digital images along with tamper detection and recovery capability. Watermarking is done on the 2 LSBs of the pixels of the original image using Torus mapping blocks. The three-layer detection proposed in [8] is used for tampered area identification, and recovery is achieved by the embedded recovery information. With this scheme, the tampered area is identified accurately, and the recovered image quality is satisfactory. In 2020, Omer Hemida [9] presented a self-recovery fragile watermarking scheme. The original image was divided into blocks of two different sizes, and block truncation coding was used for generating the watermark data. Here a quantum chaos map was used to enlarge the key space and to select the area in the image for embedding the recovery watermark. In this scheme, the imperceptibility of the watermarked image is high because of LSB embedding, and the recovery results are also satisfactory for different types of attack.
3 Proposed Method The watermarking scheme proposed has three main phases: (1) Embedding watermark data, (2) Image authentication checking, and (3) Tampered image recovery.
3.1 Embedding Watermark Data The original image I is a grayscale image of size 512 × 512. In this phase, the original image is first divided into blocks of size 4 × 4 and the watermark data is calculated block-wise using the original image pixel intensity values. This watermark data is used for both authentication and recovery of the image. The watermark data is the union of authentication data and recovery data: image authenticity is checked using the authentication data, and if authentication fails, recovery of the tampered image is achieved using the recovery data. For each block in the original image, a mapping block number is generated using a secret key. The authentication data for each block is embedded in the block itself, and the recovery data is embedded in the mapped block. Figure 1 displays the block diagram of the proposed procedure for embedding watermark data. The watermarking steps are as follows:
Fig. 1 Block diagram of embedding watermark data
Step 1 The original image I (M × M) is divided into non-overlapping blocks of 4 × 4 pixels. Each block is assigned an integer number B as its block number, B ∈ {1, 2, 3, …, Z}, in a row-wise manner, where Z is the total number of blocks generated.
Step 2 For each block B, a mapping block number B′ is generated as given below:

B′ = (key1 × B) mod Z                                 (1)
where key1 is the secret key, a prime number, and key1 ∈ [1, Z].
Step 3 Authentication data generation process: Authentication data is generated block-wise.
3.1 (a) Each pixel of the block B is converted to binary format and the X-OR operation is applied among the 7 bits of the pixel, discarding LSB1, resulting in a single bit for each pixel. (b) After performing the X-OR operation for all pixels in the block, 16-bit data is generated.
3.2 On the above 16-bit data, the X-OR operation is again applied between bits 1–8 and bits 9–16, resulting in 8-bit data.
3.3 The column position of each block is taken as key2. Here key2 is converted to binary form and X-ORed with the above 8-bit data, resulting in the 8-bit data used as the authentication data for block B.
Step 4 Recovery data generation process: For each block, 8 bits of recovery data are calculated by the following steps.
4.1 LSB1 of each pixel is set to zero.
4.2 The average intensity of the block is calculated by the following equation:

Avg = ( Σ_{i=1}^{16} Pixel_Value_i ) / 16,  where Avg ∈ [0, 255]

4.3 Now the average value of the block is converted to 8-bit binary form, which is used as the recovery data for the respective block.
Step 5 Embedding of watermark data: During the embedding process, authentication data and recovery data are embedded in different locations. Embedding for each block B is done by the following steps:
5.1 All 16 pixels of block B are converted to binary format and LSB1 of the first 8 pixels, taken in a column-wise manner, is replaced by the 8-bit authentication data. Thus the authentication data is embedded into block B itself.
5.2 Similarly, all 16 pixels of block B′ are converted to binary format and LSB1 of the last 8 pixels, taken in a column-wise manner, is replaced by the 8-bit recovery data. Thus the recovery data is embedded into the mapping block B′.
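A sketch of the block-wise watermark generation and embedding of Steps 1–5 is given below, assuming a 512 × 512 8-bit grayscale NumPy array. The bit ordering, the encoding of key2 as the block's column index, and the handling of a zero result in the mapping formula are illustrative assumptions.

```python
# Hypothetical sketch of authentication/recovery watermark generation and LSB embedding.
import numpy as np

def auth_bits(block: np.ndarray, key2: int) -> list[int]:
    # One bit per pixel: parity (XOR) of bits 7..1, LSB discarded; 16 bits in total.
    bits = [bin(p >> 1).count("1") & 1 for p in block.flatten(order="F")]
    folded = [bits[i] ^ bits[i + 8] for i in range(8)]          # fold 16 bits -> 8 bits
    k2 = [(key2 >> (7 - i)) & 1 for i in range(8)]              # key2 as 8-bit binary
    return [f ^ k for f, k in zip(folded, k2)]

def recovery_bits(block: np.ndarray) -> list[int]:
    avg = int(np.mean(block & 0xFE))                            # LSB zeroed, block average
    return [(avg >> (7 - i)) & 1 for i in range(8)]

def embed_watermark(img: np.ndarray, key1: int) -> np.ndarray:
    out = img.copy()
    n = img.shape[0] // 4
    Z = n * n                                                   # total number of blocks
    blocks = {b + 1: ((b // n) * 4, (b % n) * 4) for b in range(Z)}   # row-wise numbering
    for b, (r, c) in blocks.items():
        blk = out[r:r + 4, c:c + 4]
        a = auth_bits(blk, key2=c // 4)
        rec = recovery_bits(img[r:r + 4, c:c + 4])
        # Authentication bits -> LSB of the first 8 pixels (column-wise) of the same block.
        flat = blk.flatten(order="F")
        flat[:8] = (flat[:8] & 0xFE) | np.array(a, dtype=flat.dtype)
        blk[...] = flat.reshape(4, 4, order="F")
        # Recovery bits -> LSB of the last 8 pixels (column-wise) of the mapped block B'.
        bp = (key1 * b) % Z or Z                                # assumed: map a 0 result to Z
        r2, c2 = blocks[bp]
        blk2 = out[r2:r2 + 4, c2:c2 + 4]
        flat2 = blk2.flatten(order="F")
        flat2[8:] = (flat2[8:] & 0xFE) | np.array(rec, dtype=flat2.dtype)
        blk2[...] = flat2.reshape(4, 4, order="F")
    return out
```

Because both the authentication bits (computed from the 7 MSBs) and the recovery average (computed with LSBs zeroed) ignore the LSB plane, the later LSB substitutions do not invalidate data that was generated earlier in the loop.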
3.2 Image Authentication Checking Image authentication is checked at the recipient end. The received image is first divided into non-overlapping blocks of size 4 × 4. Each block is processed and checked to find whether any change has been made; if any change is found, that block is marked as a tampered block. After processing each block, if any tampered block is found, the image is not authentic and needs recovery of the original content. Figure 2 shows the block diagram of the proposed tamper detection and recovery process. Image authentication checking is done by the following steps:
Step 1 For each block of the received image, the embedded authentication data is extracted from LSB1 of the first 8 pixels, taken in a column-wise manner, where the embedding took place.
Step 2 Again, for each block of the received image, authentication data is generated by the procedure explained in Step 3 of the embedding stage.
Step 3 The generated and extracted authentication data are matched; if a match is found, the block is marked as authentic, else it is marked as a tampered block.
Step 4 In order to visualize the tampered area, the pixel values of tampered blocks are changed to 255 and the pixels of authentic blocks are changed to zero. These are temporary changes made only for visualization.
3.3 Tampered Image Recovery During the tampered image recovery process, only the tampered blocks are recovered; authentic blocks remain unchanged. The marked tampered blocks are recovered by the following steps:
Step 1 Starting from the first block, check whether the block is tampered or authentic; if authentic, Step 1 is repeated for the next block.
Step 2 Otherwise, the mapping block B′ of the tampered block B is calculated as per Step 2 of the embedding phase.
Step 2.1 If the mapping block is not marked as tampered during image authentication checking, then the recovery data stored during the embedding phase is extracted and the 8-bit recovery data is converted to decimal form. Each pixel value of the tampered block B is replaced by this recovery value, and the block is marked as authentic.
Step 2.2 Otherwise, the mapping block B′ is tampered, and recovery is achieved by taking the average of the pixel values of only the authentic blocks among the 8 neighborhood blocks. All the pixels of the tampered block are replaced by the calculated average value. After recovery, block B is marked as authentic.
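The authentication check of Sect. 3.2 and the recovery of Sect. 3.3 could then be sketched as below; it reuses auth_bits and the block numbering from the previous sketch, and the fallback value used when no authentic neighbour exists is an assumption.

```python
# Hypothetical sketch of tamper detection and block-wise recovery at the receiver.
import numpy as np

def is_authentic(block: np.ndarray, key2: int) -> bool:
    extracted = [int(p & 1) for p in block.flatten(order="F")[:8]]   # embedded auth bits
    return extracted == auth_bits(block, key2)                       # recomputed from 7 MSBs

def recover(received: np.ndarray, key1: int) -> np.ndarray:
    out = received.copy()
    n = received.shape[0] // 4
    Z = n * n
    blocks = {b + 1: ((b // n) * 4, (b % n) * 4) for b in range(Z)}
    tampered = {b for b, (r, c) in blocks.items()
                if not is_authentic(out[r:r + 4, c:c + 4], key2=c // 4)}
    for b in tampered:
        r, c = blocks[b]
        bp = (key1 * b) % Z or Z                                     # block holding recovery data
        if bp not in tampered:
            r2, c2 = blocks[bp]
            bits = [int(p & 1) for p in out[r2:r2 + 4, c2:c2 + 4].flatten(order="F")[8:]]
            avg = sum(bit << (7 - i) for i, bit in enumerate(bits))
        else:
            # Mapped block also tampered: mean of the authentic 8-neighbour blocks.
            ri, ci = (b - 1) // n, (b - 1) % n
            vals = []
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if (dr, dc) == (0, 0) or not (0 <= ri + dr < n and 0 <= ci + dc < n):
                        continue
                    nb = (ri + dr) * n + (ci + dc) + 1
                    if nb not in tampered:
                        rr, cc = blocks[nb]
                        vals.append(out[rr:rr + 4, cc:cc + 4].mean())
            avg = int(np.mean(vals)) if vals else 128                # assumed fallback value
        out[r:r + 4, c:c + 4] = avg
    return out
```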
Fig. 2 Block diagram of tamper detection and recovery phase
4 Experimental Results The proposed algorithm is tested on grayscale images of size 512 × 512 for image authentication and recovery. Here PSNR and MSE are used as the measuring parameters.
Table 1 PSNR and MSE of watermarked image w.r.t. original image

Cover image (512 × 512)   PSNR (dB)   MSE
Lena                      55.44       0.19
Truck                     55.13       0.20
Cycle                     54.21       0.25
Lake                      55.37       0.19
Fig. 3 a Original image, b Watermarked image
This proposed procedure is evaluated using four standard test images: Lena, Truck, Cycle, and Lake. The PSNR measures the peak signal-to-noise ratio between two images; it expresses the quality of the recovered or reconstructed image with respect to the original image, and a higher PSNR represents better image quality. The MSE measures the cumulative squared error between two images; a lower MSE represents less error. The PSNR and MSE values of the watermarked images with respect to the original images are listed in Table 1. In Table 1, the average values of PSNR and MSE are 55.04 dB and 0.20, which implies high visual quality and low error. Figure 3 shows an original image and its respective watermarked image; the watermarked image is visually indistinguishable from the original image. The performance of this scheme for tamper detection and recovery is evaluated using these four test images for copy paste attack, content removal attack, and different percentages of original content modification. Figure 4 presents the performance of this scheme for copy paste attack and content removal attack. Here tamper detection and localization are accurate, and the recovered image is also satisfactory. Table 2 shows the PSNR and MSE of the recovered image for copy paste attack and content removal attack. In Table 2, the average values of PSNR and MSE are 49.17 dB and 1.23, which implies satisfactory results of this scheme. The result for the content removal attack is lower compared to the copy paste attack.
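The MSE and PSNR measures quoted in Tables 1–3 could be computed as in the short sketch below for 8-bit grayscale images; the function names are illustrative.

```python
# Hypothetical sketch: MSE and PSNR between two 8-bit grayscale NumPy arrays.
import numpy as np

def mse(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2))

def psnr(a: np.ndarray, b: np.ndarray) -> float:
    err = mse(a, b)
    return float("inf") if err == 0 else 10.0 * np.log10(255.0 ** 2 / err)
```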
Fig. 4 a Watermarked image, b Tampered image, c Tampered localized image, d Recovered image (shown for copy paste from an external image, the content removal attack, and copy paste from the same image)
Table 2 PSNR and MSE of recovered image w.r.t. watermarked image

Cover image (512 × 512) | PSNR (dB) | MSE
Lena  | 50.95 | 0.52
Truck | 50.25 | 0.61
Cycle | 42.68 | 3.51
Lake  | 52.78 | 0.34
Figure 5 presents the performance of this scheme for different percentages of original image content modification. Here tamper detection and localization are accurate, and the recovered image is also satisfactory. The test is carried out on the Lena image. Table 3 shows the PSNR and MSE of the recovered image for different percentages of attack. The PSNR of the recovered image with 50% tampering is 33.01 dB, and tampering below 50% gives satisfactory results.

Fig. 5 a Watermarked image, b Tampered image, c Tampered localized image, d Recovered image (shown for 5%, 10%, 20% and 50% tampering)
Table 3 PSNR and MSE of recovered image for different percentages of attack

Tampering of cover image (512 × 512) (%) | PSNR (dB) | MSE
5  | 44.60 | 2.25
10 | 42.20 | 3.92
20 | 38.79 | 8.85
50 | 33.01 | 32.55
5 Conclusion
The proposed method presents a watermarking scheme for image authentication and tamper recovery in the spatial domain. The use of the X-OR operation keeps the scheme simple while achieving high accuracy in tamper localization and satisfactory recovery results. The algorithm is secured by the use of keys. At the receiver end, recovery is blind: only the keys are required and the original image is not needed.
References
1. O. Benrhouma, H. Hermassi, S. Belghith, Tamper detection and self-recovery scheme by DWT watermarking. Nonlinear Dynam. 79, 1817–1833 (2014)
2. I.J. Cox, J. Kilian, F.T. Leighton, T. Shamoon, Secure spread spectrum watermarking for multimedia. IEEE Trans. Image Process. 6(12), 1673–1687 (1997)
3. I.G. Karybali, K. Berberidis, Efficient spatial image watermarking via new perceptual masking and blind detection schemes. IEEE Trans. Inf. Forens. Secur. 1(2), 256–274 (2006)
4. B.C. Mohan, S.S. Kumar, Robust digital watermarking scheme using contourlet transform. Int. J. Comput. Sci. Netw. Secur. 8(2), 43–51 (2008)
5. D. Singh, S. Shivani, S. Agarwal, Self-embedding pixel-wise fragile watermarking scheme for image authentication, in Intelligent Interactive Technologies and Multimedia (Springer, Berlin, Heidelberg, New York, 2013), pp. 111–122
6. L. Blum, M. Blum, M. Shub, A simple unpredictable pseudorandom number generator. SIAM J. Comput. 15(2), 364–383 (1986)
7. C. Qin, P. Ji, X. Zhang, J. Dong, J. Wang, Fragile image watermarking with pixel-wise recovery based on overlapping embedding strategy. Signal Process. 138, 280–293 (2017)
8. B. Feng, X. Li, Y. Jie, C. Guo, H. Fu, A novel semi-fragile digital watermarking scheme for scrambled image authentication and restoration. Mobile Netw. Appl. (2019). https://doi.org/10.1007/s11036-018-1186-9
9. O. Hemida, Y. Huo, H. He et al., A restorable fragile watermarking scheme with superior localization for both natural and text images. Multimedia Tools Appl. 78, 12373–12403 (2019)
Empowering the Visually Impaired Learners with Text-to-Speech-Based Solution Debabala Swain and Sony Snigdha Sahoo
Abstract Attending classes and completing education is a particularly daunting task for students with visual disability. Their challenges have been further escalated during the prevailing pandemic situation, so a mere transition to a digital platform may not suffice. Educational organizations that are moving to online mode overnight need to keep this in mind: although online mode may help in accessing content at one's own pace, the content must also be read out aloud, slowly and clearly, for these students to follow. An educational app designed keeping in view the needs of a visually challenged student is thus the need of the hour. This would empower students by freeing them of dependencies. This study is aimed at highlighting the features and limitations of existing apps of this kind and laying down the foundation of an app with an easy-access facility, i.e., text-to-speech, for visually impaired learners. A brief comparison among education-based apps for the visually impaired is also summarized. Keywords Digital platform · Online education · Text-to-speech · Visually impaired learners (VI learners)
1 Introduction A regular curriculum is usually designed for fully sighted children and is delivered largely through sight-related tasks. The majority of visually oriented and visually complex concepts and information in science classrooms pose significant challenges to visually impaired students. Without systematic instructional attention, this material may seem inaccessible to many students with visual impairment, as their educational needs are different and can be fulfilled only with the team efforts of parents, D. Swain (B) Department of Computer Science, Rama Devi Women's University, Bhubaneswar, India e-mail: [email protected] S. S. Sahoo Department of CSA, DDCE, Utkal University, Bhubaneswar, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 D. Swain et al. (eds.), Machine Learning and Information Processing, Advances in Intelligent Systems and Computing 1311, https://doi.org/10.1007/978-981-33-4859-2_55
teachers, students, and professionals. To fulfill their unique educational needs, they require specialized services and instructional material such as audio material, Braille books, software, headphones, computers/laptops, and other digital devices. Colleges and other educational institutions should therefore provide Braille, audio material, software, and other digital devices for visually impaired students. Having a visual impairment does not mean that a person cannot use a phone; in fact, smartphones have been serving as a lifeline for many blind and visually impaired people. Visually challenged individuals have been able to use mobile phones just like sighted people with the help of accessibility features. Such accessibility features are available on iPhones and Android devices and are currently very popular among the blind and visually impaired community; other models of phones provide these features as well. They enable visually impaired individuals to complete a range of tasks on their phones, like keeping in contact with friends and family, reading and responding to emails, reading books, browsing the Internet, online shopping, booking train tickets and checking bus and train times, online banking, using social media, getting to where they need to be, playing games, listening to music, and much more. The important question, however, is whether the same phone can provide any educational assistance for students who are visually challenged. VoiceOver on Apple products such as the iPhone and TalkBack on Android devices are screenreaders that are built into these devices. The screenreader reads out loud everything that is on the screen. VoiceOver and TalkBack work with thousands of apps, meaning that visually impaired users can use iPhones or Android devices just like sighted people. From texting, calling, scrolling through social media, managing money, online shopping, and responding to important work emails to reading out text from the Internet, blind and visually impaired people can use a mobile phone even if they have no useful vision. Although huge progress has been made in this field, it is still difficult to come across any smartphone application specifically designed for a particular course for visually challenged students. This work is an effort toward discussing such existing applications, comparing their features, and laying down the foundation of one such novel education-based application for visually challenged individuals. The rest of the paper is organized as follows. Section 2 summarizes a few of the research works carried out in this field along with the features of similar apps. Section 3 describes the TTS approach used as the foundation of our work. Section 4 states the proposed approach. Section 5 provides a brief comparison between the existing applications and the proposed approach. Section 6 concludes this work with possible future scope.
2 Related Solutions As technology is establishing its roots deeper into the education sector, things are changing. And one such change has been introduced in the form of education-based mobile applications. A comparison summarizing how effectively webpages can be
accessed by visually impaired individuals has been provided in [1, 2]. Many studies have been conducted focusing on testing or developing the usability of specific devices or apps for the visually impaired [2–6]. As the ultimate goal is to provide students with effective educational tools, an educational software accessibility checklist has been proposed in [7], meant for teachers and students alike who have no, or scarce, experience of low vision. Tetsuya et al. [8] show that persons with visual impairments frequently use apps specifically designed for them to accomplish daily activities. Furthermore, this population is satisfied with mobile apps and would like to see further improvements and additional new apps of this kind. There are many more instances where customized applications [9] have been developed to suit the needs of individuals with visual impairment. A few of the apps already on the Play Store and being used by visually impaired students include the Remind app, which is basically a communication app that allows students to send real-time messages, photos, handouts, and other files to a class, a group, or an individual person, and the Photomath app, which assists students in learning maths and includes a camera calculator that can be used to identify mathematical problems. Duolingo is another app that comes with an effective curriculum and engaging games. Some of the education-specific apps include the following.
2.1 Job Access With Speech (JAWS) [10]
JAWS is a general-purpose application, essentially a screenreader, aimed at computer users whose limited or absent vision inhibits their visualization of screen content or navigation with a mouse. JAWS provides output both as speech and as braille for a number of applications on a personal computer: reading documents, emails, websites, and apps; navigating with the mouse; scanning and reading out documents, including PDFs; filling out web forms; DAISY-formatted basic training which is easy to use; time-saver features with skim reading and a text analyzer; and surfing the net with web-browsing keystrokes. The JAWS application needs to be purchased by the user, while the proposed VI App will be open source and freely available on the Google Play Store, so anyone with an Android phone may make use of the contents flexibly.
2.2 E-Learning App [11]
E-Learning App is a social networking platform for education which connects students, teachers, parents, and the school or institution in a safe and unique network. This app is just like other prominently used social networks but does not allow users to divert from the topic and helps them stick to it. It provides sufficient tools for students and teachers. It helps students to submit assignments, clear their doubts, post their talents (visible to the whole classroom), chat with teachers and classmates, and it
solves most of students’ problems and provides access to tools like Notepad, Paint, translator, ask me question, quiz, media, inspirational quotes, guides for subjects, Bluetooth chat, dictionary, study planner, and online quiz. If any other materials other than study materials or talents are posted, students or teachers can report the issue and that will be removed. It helps teachers to save time, keep classes organized, and improve communication with students and parents, giving students assignments, conduct classes, monitor students’ progress, and can know how students responded to teaching of particular class. E-Learning App can be practically used in classroom which replaces classroom tools into single tablet or gadget. It helps parents to keep updated with their child’s progress and communicate with teachers’ E-Learning App which, in turn, improves system of electronic learning in safe and advanced way while compared to other educational apps or networks. E-Learning App also provides School Management Material for FREE.
2.3 Lekatha [12]
Lekatha is a Text-To-Speech (TTS) project which is still in its infancy. The word Lekatha does not mean anything by itself; it is constructed from two Odia-language words, lekha (meaning text) and katha (meaning voice), and refers to voice constructed from text by using a TTS engine. Mobile learning has become very popular in the past few years owing to the advantages it has to offer. It can be accessed from anywhere in the world at any time, irrespective of one's location, so a VI learner can access the same content. Moreover, online assessment helps in keeping track of one's progress. There are, however, a few points on the flip side: physical mobile devices can wear out after long usage, and lack of an Internet connection or electricity can be a deterrent. This can specifically be a problem in rural areas where the usage of the Internet and electricity is not yet prevalent.
3 TTS Theory
Text-To-Speech (TTS) is a type of assistive technology that reads digital text aloud. It is sometimes called "read aloud" technology. The voice in TTS is computer-generated, and the reading speed can usually be sped up or slowed down. Voice quality varies, but some voices sound human. It is also termed speech synthesis, where a computer or other machine reads words out loud in a real or simulated voice played through a speaker. With the advent of smartphone agents like Siri, Cortana, and "OK Google," people are slowly getting used to the idea of speaking commands to a computer and getting back spoken replies. The TTS synthesis procedure consists of two main phases. In the first phase, raw input text is
Fig. 1 A typical TTS system [13]
imported by the software for text analysis. This process is also termed text normalization, pre-processing, or tokenization. The second phase is the Natural Language Processing (NLP) phase: it produces a phonetic transcription of the text together with its prosody, referring to the speech database so that words are processed correctly. The other part of the software, the Digital Signal Processing (DSP) module, transforms the symbolic information it receives from the NLP phase into audible and intelligible speech. The flow is shown in Fig. 1 [13]. The same technique has been adopted in our proposed approach, which is discussed briefly in the next section.
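The paper does not name the speech engine used on the device. Purely as an illustration of the TTS phase described above, the sketch below uses the open-source pyttsx3 library (an assumed choice, not the authors' implementation) to read a piece of lesson text aloud at a slightly reduced speaking rate, matching the requirement that content be read out slowly and clearly.

```python
import pyttsx3

def read_aloud(text, rate_offset=-40):
    """Read the given lesson text aloud, slightly slower than the default rate."""
    engine = pyttsx3.init()                       # initialise the platform TTS driver
    rate = engine.getProperty('rate')             # default speaking rate (words per minute)
    engine.setProperty('rate', rate + rate_offset)
    engine.say(text)                              # queue the text for synthesis
    engine.runAndWait()                           # block until speaking finishes

# Example: read out a chapter title (hypothetical content).
read_aloud("Chapter 1: Introduction to Computer Science.")
```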
4 Proposed Approach
The use of technology in the modern era is gaining momentum because of its naturalness and ease of use. For visually impaired persons, TTS-based automated reading can likewise prove a boon in accessing various technological advancements. To accomplish this, an Android-based mobile application named VI Learner, developed in Java, is proposed. Previously, chat bot systems had been used extensively for this purpose; in the current scenario, technology strives to be a helping hand for human beings and for society, and to carry this forward we have proposed the VI Learner mobile app. The proposed application can be used to maximize the accessibility of written content to the visually impaired. The entire app can be controlled with gestures (Fig. 2). There are three specific gestures for handling it: (a) swipe up-to-down and (b) swipe down-to-up to shift to the next menu, and (c) swipe left-to-right to select any menu on the current page, as shown in Fig. 3.
Fig. 2 Proposed text-to-speech approach in the application
5 Test Outputs
The proposed application intends to surpass the limitations posed by the previous education-based apps. Based on the above comparison, it can be said that once this VI Learner app is up on the Play Store, it will arm visually impaired students with a lot of educational resources. Moreover, if the app is customized according to the syllabus prescribed by schools and colleges, it will be the most sought-after solution during
Fig. 3 a, b Swipe up-to-down or down-to-up to shift to next menu, c Swipe left-to-right to select any menu
the prevailing pandemic situation also, not just for VI learners but for any regular student. Figure 4 demonstrates the different phases of the proposed system, from the initial launching page to the questionnaire page. The launching page directs the user to the list of subjects. After a subject is selected, the app forwards to the different chapters or topics inside it. The learner may then select one of the chapters or go to the questionnaire page. On selecting a chapter, the system starts TTS execution so the learner can listen to the topic. After completing the topic, the learner may move to the questionnaire page by swiping down. The system asks the questions one by one and records the learner's options through gesture input. At the end of the questionnaire, the system reads out the score obtained.
Fig. 4 App snapshots: a App launching page, b Subject page, c Individual chapter selection, d Questionnaire page
6 Conclusion and Future Aspects
Educational applications need to incorporate speech technologies as proposed in this paper. This will increase the reach of any educational app that would otherwise have little effect on improving the way of life of VI learners. Moreover, constant feedback from students on the Play Store itself can go a long way in improving the quality of the app by providing the developers with prompt input. The success of this process can only be measured in terms of the acceptance of the educational application by the students and their progress in solving the questionnaires, which require them to memorize the content in order to answer the questions correctly. This work will be upgraded further by adding a speech-to-text feature as well, so that the user's voice can be taken as an input command.
Acknowledgements This work was supported by Odisha State Open University, Sambalpur, India under the Minor Research Project Fund at Rama Devi Women's University.
References
1. M. Jennifer, F. Holly, T. Tu, Is your web page accessible? A comparative study of methods for assessing web page accessibility for the blind, in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (2005), pp. 41–50
2. J.P. Bigham, A.C. Cavender, J.T. Brudvik, J.O. Wobbrock, R.E. Ladner, WebinSitu: a comparative analysis of blind and sighted browsing behavior, in ASSETS'07 (Tempe, Arizona, USA, Oct 2007), pp. 15–17
3. J. Wagner, G.C. Vanderheiden, M.E. Sesto, Improving the usability of a mainstream cell phone for individuals with low vision. J. Visual Impairment Blindness 100(11), 687–692 (2006)
4. B. Frey, C. Southern, M. Romero, Brailletouch: Mobile texting for the visually impaired, in Paper presented at the Sixth International Conference on Universal Access in Human-Computer Interaction: Context Diversity, Proceedings UAHCI 2011 (Orlando, Florida, 2011), pp. 19–25. Retrieved from https://pdfs.semanticscholar.org/2c45/8bab0614a7b32b72e8ab7bddddbacf65b6c1.pdf
5. G. Venugopal, Android Note Manager application for people with visual impairment. Int. J. Mobile Netw. Commun. Telematics 3(5), 13–18 (2013)
6. N.K. Dim, X. Ren, Designing motion gesture interfaces in mobile phones for blind people. J. Comput. Sci. Technol. 29(5), 812–824 (2014)
7. S. Dini, L. Ferlino, A. Gettani, C. Martinoli, M. Ott, Educational software and low vision students: evaluating accessibility factors. Universal Access in the Information Society (Springer, Berlin, Heidelberg, 2006), ISSN 1615-5289 (Print) 1615-5297 (Online)
8. W. Tetsuya, M. Manabi, M. Kazunori, N. Hideji, A survey on the use of mobile phones by visually impaired persons in Japan, 1081–1084 (2008). https://doi.org/10.1007/978-3-54070540-6_162
9. E. Ghidini, W.D.L. Almeida, I.H. Manssour, M.S. Silveira, Developing apps for visually impaired people: lessons learned from practice, in 2016 49th Hawaii International Conference on System Sciences (HICSS) (Koloa, HI, 2016), pp. 5691–5700. https://doi.org/10.1109/hicss.2016.704
10. https://www.freedomscientific.com/products/software/jaws/
11. https://play.google.com/store/apps/details?id=appinventor.ai_mnizamudeen6.HelderTechELearningApp&hl=en
12. https://commons.wikimedia.org/wiki/OpenSpeaks/toolkit/Lekatha
13. https://en.wikipedia.org/wiki/File:TTS_System.svg
Seismic Data Analytics for Estimating Seismic Landslide Hazard Using Artificial Accelerograms Aadityan Sridharan and Sundararaman Gopalan
Abstract Accelerograms record the energy released by earthquakes as accelerations. Earthquakes and their repercussions such as tsunamis and landslides are a huge threat to human life. Earthquake-induced landslides are voluminous and rapid by nature. To critically evaluate the possibility of landslides due to earthquakes, accelerograms recorded at the landslide site can be used to model slope displacement. Large repositories of accelerograms that store data from earthquakes around the world are available online and can be accessed openly. However, site-specific data for slope failures during earthquakes is rare and in most cases impractical to obtain. To address this drawback in strong motion datasets for the analysis of earthquake-induced landslides, artificial accelerograms that closely represent the ground motion can be substituted. In this work, a comparison of recorded accelerograms and artificial accelerograms is presented. The ability of the artificial accelerogram to retain the ground motion parameters (GMP) is investigated by generating site-specific accelerograms. Physical and statistical models of earthquake-induced landslides are used as case studies to validate the application of the generated accelerograms. Results indicate that artificial accelerograms are able to retain most of the GMPs. Their application to the physical and statistical models shows promise in predicting and evaluating future hazards. Keywords Artificial accelerogram · Seismic data analytics · Earthquake-induced landslides · Ground motion parameters
A. Sridharan (B) Department of Physics, Amrita Vishwa Vidyapeetham, Amritapuri, India e-mail: [email protected]; [email protected] S. Gopalan Department of Electronics and Communication Engineering, Amrita Vishwa Vidyapeetham, Amritapuri, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 D. Swain et al. (eds.), Machine Learning and Information Processing, Advances in Intelligent Systems and Computing 1311, https://doi.org/10.1007/978-981-33-4859-2_56
1 Introduction Earthquakes are a release of stress energy that builds up in geological discontinuities known as faults in various layers of planet earth. The energy reaches the surface of our planet in seconds after the initial release of the stress. Seismologists measure this energy release in various ways such as acceleration, thermal mapping, and velocity profiles [1]. In the context of the surface effects of earthquakes, accelerations play an important role, as thermal mapping cannot be directly used to predict the possible damage by the event. Although velocity profiles can map the effect of ground shaking, they do not efficiently record the effects of strong ground motion [2]. Accelerations recorded by accelerometers are an important source of strong ground motion datasets and can aid in evaluating seismic landslide susceptibility [3]. Earthquake-induced landslides are among the major repercussions of a large earthquake event (Magnitude (MW) > 4.5). Earthquake events that cause widespread landsliding have triggered at least 500–1000 landslides each [4]. The data required to evaluate the risk posed by the landslide process are voluminous by nature. Important datasets required for the assessment are the recorded accelerograms, slope displacement measurements, geotechnical strength of underlying materials, and topographic variables [5]. Similarly, actual ground motion records measured exactly at the landslide area are usually unavailable, as it is impractical to place an accelerometer on an unstable slope [6]. To organize the available data and to obtain reasonable estimates of slope stability, statistical modeling has gained prominence in predicting slope displacement from recorded accelerograms [7]. On the contrary, with modern techniques of generating artificial accelerograms, physical models can replace the statistical models to give a deterministic seismic landslide hazard assessment [8]. Artificial accelerograms that can retain the actual ground motion parameters (GMP) are a valuable asset in hazard modeling. Whether the hazard we deal with is earthquakes or the landslides that follow them, these accelerations are representative of the seismic source characteristics [2]. From the initial models of seismic ground motion by Kanai and Tajimi to the recent models of artificial accelerogram generation, the generated accelerograms have evolved to better retain GMPs [9]. Stochastic ground motion simulation, wavelet-based spectral matching, and power spectral density function matching are a few models that are consistently referred to in the literature. These models are derived from a large amount of recorded accelerograms [10]. Internet repositories of recorded accelerograms maintained by various organizations are of main interest in this context. Platforms like the PESMOS strong motion library of IIT Roorkee, PEER strong motion data, the COSMOS global repository of earthquake data, and the KiK-net strong motion data repository of Japan are some of the main contributors of open-source accelerograms. Such large datasets, when processed with the right tools, can improve our understanding of how various GMPs contribute to seismic slope stability. A comprehensive study comparing GMPs in artificial and recorded accelerograms and their ability to predict slope displacements is still unaddressed
[11]. Although there are a few attempts in this direction, most of them are used to assess the risk earthquakes pose to buildings [12]. Newmark's rigid block algorithm has gained prominence in predicting slope stability during earthquakes [7]. Due to the applicability and predictive ability of this model, it can be adapted to verify the strong motion data used to predict seismic slope stability [8]. In this work, we compare the effect of artificial and recorded accelerograms in estimating Newmark's displacement (DN), a slope displacement estimate, using two statistical models and the physical Newmark's rigid block analysis (one of the prominent models for estimating seismic slope stability). The retention of important GMPs of the artificial accelerogram, such as peak ground acceleration (hereafter PGA), Arias intensity (Ia), predominant period (TD), and mean period (TM), is presented.
2 Methodology
Ground motion parameters represent various factors in a terrain that is susceptible to seismic shaking, such as variation in topography, seismic wave velocity, and geotechnical strength parameters. The main GMPs important for estimating seismic landslide susceptibility considered in this study are PGA, PGV, Ia, TD, and TM. PGA is the maximum acceleration recorded at the accelerometer site. As velocities and displacements are calculated from accelerations, PGA contributes to the maximum possible ground displacement in a region of interest, which makes it an indispensable factor for estimating seismic landslide susceptibility [13]. Similarly, peak ground velocity (PGV) is an equally important parameter of an accelerogram that characterizes ground motion; PGV has been observed to have strong statistical significance in relation to Newmark's predicted displacement [14]. Ia is a quantitative intensity measure based on the integral of the square of the acceleration and is an estimate of shaking that is directly related to slope instability during earthquakes. Both TD and TM are time periods characterizing the duration and frequency content of the shaking experienced at the place where the accelerometer is located: TD is the period that contains most of the spectral response and contributes the major frequency content of the earthquake, while TM characterizes the frequency content as a simple ratio of Fourier amplitudes obtained from the Fourier transform [15]. The variation in the values of these parameters among the recorded accelerograms of various earthquake events is shown in Table 1.
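As a rough illustration of how the amplitude-related GMPs in Table 1 can be obtained from a record, the sketch below computes PGA, PGV, and Arias intensity from an acceleration time history. It assumes accelerations in m/s² at a uniform time step and uses the standard definition Ia = π/(2g) ∫ a(t)² dt; the units and function names are assumptions, not taken from the paper.

```python
import numpy as np

def ground_motion_parameters(acc, dt, g=9.81):
    """Compute PGA, PGV and Arias intensity from an acceleration time history.

    acc : 1D numpy array of accelerations in m/s^2
    dt  : sampling interval in seconds
    """
    pga = np.max(np.abs(acc))                              # peak ground acceleration
    vel = np.cumsum(acc) * dt                              # velocity by simple numerical integration
    pgv = np.max(np.abs(vel))                              # peak ground velocity
    arias = np.pi / (2.0 * g) * np.trapz(acc ** 2, dx=dt)  # Arias intensity (m/s)
    return pga, pgv, arias
```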
2.1 Artificial Accelerogram Generation For generating reliable artificial accelerograms, we start with generation of synthetic accelerograms by Gaussian white noise method based on earthquake characteristics such as magnitude, hypocentral distance, and source mechanism. The artificial
Table 1 GMPs of five earthquake events from all over the world

EQ          | Mw  | PGA     | PGV      | Ia      | Tm      | Td
Loma Prieta | 6.9 | 0.367   | 44.69354 | 1.34797 | 0.22    | 0.67979
Chi-Chi     | 7.6 | 0.361   | 21.54766 | 0.37522 | 0.49862 | 0.06
Friuli      | 6.4 | 0.35    | 22.019   | 0.78    | 0.4     | 0.26
Northridge  | 6.7 | 0.5683  | 51.68398 | 2.7321  | 0.55026 | 0.26
Uttarkashi  | 5   | 0.00363 | 0.15966  | 0.00036 | 0.27533 | 0.22
accelerogram is generated by converging the synthetic accelerogram to the envelope shape (as shown in Fig. 1) and by matching the GMPs of the recorded accelerogram. There are various methods to converge artificial accelerograms, as mentioned earlier. In this work, spectral matching by the power spectral density function (PSDF) method, shown in Eq. 1, which has been proved to improve the retention of GMPs, is used:

G(ω)_{n+1} = G(ω)_n [ S_V(ω) / S_V^{(n)}(ω) ]    (1)
where S_V(ω) is the target spectrum value and S_V^{(n)}(ω) is the response value computed at iteration n. G(ω) is the PSDF updated step by step through the iterations of convergence of the artificial accelerogram. The PSDF represents the energy as a function of frequency, which can then be matched, through Fourier transforms, in the time domain of the synthetic accelerogram. The PSDF for a particular iteration is defined by matching the velocity target spectrum
Fig. 1 Spectral envelope of the recorded accelerogram for spectral matching. The red line is the outline of the intensity plot and blue lines show the variation in intensity of acceleration
of the input real accelerogram. The Seismoartif software was used to generate the accelerograms in this study [16].
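The accelerograms in this study were generated with Seismoartif. Only to illustrate the general idea of an envelope-shaped Gaussian white-noise record, a simplified sketch is given below; the trapezoidal envelope and the single amplitude-scaling step are assumptions for brevity and do not reproduce Seismoartif's iterative PSDF matching of Eq. 1.

```python
import numpy as np

def synthetic_accelerogram(duration=20.0, dt=0.01, target_pga=0.35 * 9.81, seed=0):
    """Generate a simple envelope-modulated white-noise accelerogram (m/s^2)."""
    rng = np.random.default_rng(seed)
    t = np.arange(0.0, duration, dt)
    noise = rng.standard_normal(t.size)          # Gaussian white noise
    # Trapezoidal intensity envelope: build-up, strong-motion phase, decay.
    envelope = np.interp(t, [0.0, 0.15 * duration, 0.5 * duration, duration],
                         [0.0, 1.0, 1.0, 0.0])
    acc = noise * envelope
    acc *= target_pga / np.max(np.abs(acc))      # scale the record to the target PGA
    return t, acc
```

In the full procedure, the final scaling line would be replaced by repeated frequency-dependent corrections of the PSDF (Eq. 1) until the response of the synthetic record matches the target velocity spectrum of the recorded accelerogram.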
2.2 Newmark’s Rigid Block Model In 1965, in the Fifth Rankine Lecture, Newmark presented a permanent displacement model for slope failure in dams that are subjected to seismic tremor [17]. The slope that underwent failure was modeled as a rigid block on an inclined plane. Newmark’s model includes a term called yield acceleration (k y ) beyond which the rigid block starts sliding during an earthquake [18]. The formula for k y is shown in Eq. 2: k y = g sin(α)
(2)
where g is acceleration due to gravity and α is slope angle. The accelerations in the accelerogram are double integrated to obtain the displacement of the block on the inclined plane. Many regional as well as localized models of earthquake-induced landslides have been developed based on this physical model [19]. The real and artificial accelerograms for events mentioned in Table 1 are used to arrive at the Newmark’s displacement for ky values in the range 0.02 g–0.2 g [20]. An in-house Python code was used to calculate the displacement values. These values were compared with displacement values predicted by the statistical models, which are explained in the following section.
2.3 Statistical Models Used to Validate GMPs
For validation of the artificial accelerograms, two general statistical models developed by Saygili and Rathje (2008) have been used here. This particular set of models incorporates more than 2000 individual accelerogram records from PEER, an online database. The following general equation was used to statistically predict Newmark's displacement (DN) [21]:

ln D = a1 + a2 (ky/PGA) + a3 (ky/PGA)^2 + a4 (ky/PGA)^3 + a5 (ky/PGA)^4 + a6 ln(PGA) + a7 ln(GMP2) + a8 ln(GMP3) + σ_lnD    (3)

In the above equation, ky represents the yield acceleration mentioned in Sect. 2.2. For this study, the values used are in the range 0.02–0.2 g, as this is the usual range of accelerations observed in earthquakes of magnitudes 4.5–8 [20]. The a's are the coefficients of the regression equation. PGA is already included in Eq. 3, and hence there is a choice of two more GMPs as GMP2 and GMP3. From Table 1, we use two sets of GMPs as predictors for the statistical model. First I a
Table 2 Coefficients of the regression of Eq. 3 for different combinations of GMPs

GMPs         | a1    | a2    | a3     | a4    | a5     | a6   | a7   | a8   | σ lnD
Ia and TM/TD | 4.27  | −4.62 | −21.49 | 46.53 | −31.66 | −0.5 | 1.14 | 0.86 | 0.252
PGV and Ia   | −0.74 | −4.99 | −19.91 | 43.75 | −30.12 | −1.3 | 1.04 | 0.67 | 0.265
and T M or T D , second case PGV and I a . In both cases, the values of a’s and the coefficients of regression are given in Table 2. The values of predicted displacement for both the models were computed using real and artificial accelerograms. The variation in the displacement and the GMPs has been shown as plots in the following section.
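As a small worked illustration of Eq. 3, the sketch below evaluates the median predicted displacement with the Table 2 coefficients for the (PGV, Ia) case. Following Saygili and Rathje's convention it assumes PGA in g, PGV in cm/s, Ia in m/s, and D in cm; the unit handling and function names are assumptions rather than code from this study.

```python
import math

# Table 2 coefficients (a1..a8) for the (PGV, Ia) combination of Eq. 3.
COEF_PGV_IA = (-0.74, -4.99, -19.91, 43.75, -30.12, -1.3, 1.04, 0.67)

def predicted_displacement(ky, pga, gmp2, gmp3, coef=COEF_PGV_IA):
    """Median Newmark displacement from the regression of Eq. 3 (exp of ln D)."""
    a1, a2, a3, a4, a5, a6, a7, a8 = coef
    r = ky / pga                                   # ratio of yield to peak acceleration
    ln_d = (a1 + a2 * r + a3 * r ** 2 + a4 * r ** 3 + a5 * r ** 4
            + a6 * math.log(pga) + a7 * math.log(gmp2) + a8 * math.log(gmp3))
    return math.exp(ln_d)

# Example with the Loma Prieta values of Table 1 (PGA = 0.367, PGV = 44.69, Ia = 1.348):
# predicted_displacement(ky=0.05, pga=0.367, gmp2=44.69, gmp3=1.348)
```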
3 Results and Discussion The correlation among the GMPs of the artificial and recorded accelerograms is shown in Fig. 2. The PGA, PGV, and T M correlate linearly while the T D values do
Fig. 2 The correlation among the four GMPs in artificial and real accelerogram records
Fig. 3 Estimated displacement of both the statistical models in comparison with physical model. a Artificial-accelerogram-based displacement. b Real-accelerogram-based displacement
not. One of the reasons for this might be that matching the frequency content through the PSDF does not fully account for the spectral response represented by TD. The frequency content, however, is represented by the accelerations (the denser the accelerogram, the higher the frequency content), and this is captured by the target velocity spectrum. Hence, the artificial accelerograms are able to retain the majority of the GMPs. The values of predicted displacement from the statistical models are compared in Fig. 3. It can be seen that for real accelerograms (Fig. 3b), the displacements predicted by the statistical models are larger than the displacements obtained from the physical model: values near 150 cm in the physical model correspond to about 190 cm in statistical model 1, while model 2 underestimates them, with 150 cm in the physical model corresponding to about 120 cm. Apart from the characteristics of the records themselves, a reason for this observation can be the standard deviation of the statistical model [22]. The records are usually pre-processed for seismic and instrumental noise before being made available in the repositories. The acceleration values obtained in the field may also reflect the geology under the accelerometers, and this can also cause fluctuations in the obtained displacements [20]. Although DN has been proven to be a good estimate of slope displacement, a rigid block on an inclined plane is still an approximation of an actual landslide [14]. For points of low displacement, the estimates of both statistical models seem to be linearly correlated. The displacement values obtained from the artificial accelerograms (Fig. 3a) show better correlation for high as well as low values. It can be noted from Fig. 3 that both parts a and b show overestimation in the middle range of displacements. The displacement range from 60 to 140 cm corresponds to extreme cases of slope failure; in areas where the value of DN increases beyond 50 cm, widespread slope failures are possible [20]. These middle values, though overestimated, do not affect the physical significance of the landslide distribution in the terrain, i.e., when DN > 60 cm slope failures are imminent. Hence, both the real accelerograms and the artificial accelerograms
estimate similar displacements in the middle range, which indicates good agreement between the two types of records. Based on the above observations, artificial accelerograms are able to arrive at a good estimate of strong ground motion and displacement while retaining the GMPs. For modeling seismic slope failures, artificial accelerograms can therefore be substituted in the absence of actual records [23]. Although there are models that can better approximate the actual slope displacements during an earthquake, the rigid block model remains one of the successful algorithms for modeling regional slope failures [24]. Future work will involve a critical review of more accurate methodologies to generate artificial accelerograms that can aid in better hazard assessment.
4 Conclusion This work presents a comprehensive analysis of generating artificial accelerograms and their application in the estimation of seismic landslide hazard. The study underlines the importance of data analytics and data validation in hazard assessment. The hazard estimated by validated accelerogram records can help in better prediction and assessment of disasters. Acknowledgements The authors would like to show immense gratitude to the Chancellor of Amrita University, and Mata Amritanandamayi Devi for her constant guidance and for being the inspiration behind this work.
References 1. Y. Ren, J. Ma, P. Liu, S. Chen, Experimental study of thermal field evolution in the shortimpending stage before earthquakes. Pure. Appl. Geophys. 175, 2527–2539 (2018). https:// doi.org/10.1007/s00024-017-1626-7 2. C. Medel-Vera, T. Ji, A stochastic ground motion accelerogram model for Northwest Europe. Soil Dyn. Earthq. Eng. 82, 170–195 (2016). https://doi.org/10.1016/j.soildyn.2015.12.012 3. A. Khansefid, A. Bakhshi, A. Ansari, Development of declustered processed earthquake accelerogram database for the Iranian plateau: including near-field record categorization. J. Seismol. 23, 869–888 (2019). https://doi.org/10.1007/s10950-019-09839-w 4. C. Tang, C.J. Van Westen, H. Tanyas, V.G. Jetten, Analysing post-earthquake landslide activity using multi-temporal landslide inventories near the epicentral area of the 2008 Wenchuan earthquake. Nat. Hazards Earth Syst. Sci. 16, 2641–2655 (2016). https://doi.org/10.5194/nhess16-2641-2016 5. F. Wang, X. Fan, A.P. Yunus, S. Siva Subramanian, A. Alonso-Rodriguez, L. Dai, Q. Xu, R. Huang, Coseismic landslides triggered by the 2018 Hokkaido, Japan (Mw 6.6), earthquake: spatial distribution, controlling factors, and possible failure mechanism. Landslides 16, 1551– 1566 (2019). https://doi.org/10.1007/s10346-019-01187-7 6. V.M. Ramesh, N. Vasudevan, The deployment of deep-earth sensor probes for landslide detection. Landslides 9, 457–474 (2012). https://doi.org/10.1007/s10346-011-0300-x
7. R.W. Jibson, H. Tanya¸s, The influence of frequency and duration of seismic ground motion on the size of triggered landslides—a regional view. Eng. Geol. 273, 105671 (2020). https://doi. org/10.1016/j.enggeo.2020.105671 8. J. Song, Q. Fan, T. Feng, Z. Chen, J. Chen, Y. Gao, A multi-block sliding approach to calculate the permanent seismic displacement of slopes. Eng. Geol. 255, 48–58 (2019). https://doi.org/ 10.1016/j.enggeo.2019.04.012 9. R. Ramkrishnan, K. Sreevalsa, T.G. Sitharam, Development of new ground motion prediction equation for the north and central himalayas using recorded strong motion data. J. Earthq. Eng. 0, 1–24 (2019). https://doi.org/10.1080/13632469.2019.1605318 10. H. Chaulagain, H. Rodrigues, H. Varum, V. Silva, D. Gautam, Generation of spectrumcompatible acceleration time history for Nepal. Comptes. Rendus. Geosci. 349, 198–201 (2017). https://doi.org/10.1016/j.crte.2017.07.001 11. X. Fan, G. Scaringi, O. Korup, A.J. West, C.J. Westen, H. Tanyas, N. Hovius, T.C. Hales, R.W. Jibson, K.E. Allstadt, L. Zhang, S.G. Evans, C. Xu, G. Li, X. Pei, Q. Xu, R. Huang, Earthquakeinduced chains of geologic hazards: patterns, mechanisms, and impacts. Rev. Geophys. 57, 421–503 (2019). https://doi.org/10.1029/2018RG000626 12. H. Zafarani, Y. Jafarian, A. Eskandarinejad, A. Lashgari, M.R. Soghrat, H. Sharafi, M. Afraz-e Haji-Saraei, Seismic hazard analysis and local site effect of the 2017 Mw 7.3 Sarpol-e Zahab, Iran, earthquake. Nat. Hazards. (2020). https://doi.org/10.1007/s11069-020-04054-0 13. T.G. Sitharam, S. Kolathayar, N. James, Probabilistic assessment of surface level seismic hazard in India using topographic gradient as a proxy for site condition. Geosci. Front. 6, 847–859 (2015). https://doi.org/10.1016/j.gsf.2014.06.002 14. R.W. Jibson, Methods for assessing the stability of slopes during earthquakes-aa retrospective. Eng. Geol. 122, 43–50 (2011). https://doi.org/10.1016/j.enggeo.2010.09.017 15. A. Seismoartif, A.A. Simulations, F. Seismoartif, Artificial accelerograms generation synthetic accelerogram generation and adjustment random set of phase angles with amplitudes calculated by power density function [Gasparini and Vanmarcke, 1976] (2019), 1–13 16. Seismosoft: Seismoapps Technical Information Sheet., Piazza Castello, 19 27100 Pavia (PV)Italy (2018) 17. N.M. Newmark, Effects of earthquakes on dams and embankments. Géotechnique 15, 139–160 (1965). https://doi.org/10.1680/geot.1965.15.2.139 18. S.-Y. Hsieh, C.-T. Lee, Empirical estimation of the newmark displacement from the Arias intensity and critical acceleration. Eng. Geol. 122, 34–42 (2011). https://doi.org/10.1016/j.eng geo.2010.12.006 19. S. Ma, C. Xu, Assessment of co-seismic landslide hazard using the newmark model and statistical analyses: a case study of the 2013 Lushan, China, Mw6.6 earthquake. Nat. Hazards. 96, 389–412 (2019). https://doi.org/10.1007/s11069-018-3548-9 20. R.W. Jibson, E.L. Harp, J.A. Michael, A method for producing digital probabilistic seismic landslide hazard maps: an example from the Los Angeles, California, USA. Eng. Geol. 58, 271–289 (2000). https://doi.org/10.4323/rjlm.2014.63 21. G. Saygili, E.M. Rathje, Empirical predictive models for earthquake-induced sliding displacements of slopes. J. Geotech. Geoenviron. Eng. 134, 790–803 (2008). https://doi.org/10.1061/ (ASCE)1090-0241(2008)134:6(790) 22. B. Tiwari, B. Ajmera, S. Dhital, Characteristics of moderate- to large-scale landslides triggered by the M w 7.8 2015 Gorkha earthquake and its aftershocks. 
Landslides 14, 1297–1318 (2017). https://doi.org/10.1007/s10346-016-0789-0 23. A. Joshi, P. Kumari, S. Singh, M.L. Sharma, Near-field and far-field simulation of accelerograms of Sikkim earthquake of September 18, 2011 using modified semi-empirical approach. Nat. Hazards 64, 1029–1054 (2012). https://doi.org/10.1007/s11069-012-0281-7 24. A. Sridharan, S. Gopalan, Prediction studies of landslides in the mangan and singtam areas triggered by 2011 Sikkim earthquake. in Communications in Computer and Information Science (2019), pp. 609–617. https://doi.org/10.1007/978-981-13-9942-8_57
Impact of Presence of Obstacles in Terrain on Performance of Some Reactive Protocols in MANET Banoj Kumar Panda, Prasant Kumar Pattnaik, and Urmila Bhanja
Abstract In the current era, due to the fast growth of telecommunication technologies, there is a drastic expansion in the number of subscribers using communication networks. In a mobile ad hoc network, the mobility of nodes and the obstacles present in the terrain strongly affect the efficiency of the network. When mobility changes and obstacles are present in the terrain, links between neighbour nodes break very often, as a result of which the network performance degrades. Many authors have investigated the performance degradation of the network due to changes in mobility for various reactive protocols, but they have not considered the presence of obstacles. This paper gives a thorough investigation of the performance degradation due to the variation in node mobility in the presence of irregular-sized obstacles in the terrain at different traffic conditions. The performance of two popular reactive routing protocols, AODV and DSR, is compared on the basis of different network parameters. Keywords Normalised routing overhead · Packet delivery ratio · Average delay · MANET and obstacles
1 Introduction
Mobile Ad Hoc Network (MANET) [1] is an infrastructure-less mobile communication network created with the help of a set of wireless transceiver devices having
B. K. Panda Department of Electronics and Telecommunication Engineering, Utkal University, Bhubaneswar, India e-mail: [email protected] P. K. Pattnaik (B) School of Computer Engineering, KIIT Deemed to Be University, Bhubaneswar, India e-mail: [email protected] U. Bhanja Departments of Electronics and Telecommunication Engineering, IGIT, Sarang, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 D. Swain et al. (eds.), Machine Learning and Information Processing, Advances in Intelligent Systems and Computing 1311, https://doi.org/10.1007/978-981-33-4859-2_57
mobility. Here each and every node behaves as a source node, a sink node, or an intermediate node. In a MANET, nodes move freely and randomly, and hence the network topology varies continuously. The mobility of nodes therefore plays a key role in the efficiency of routing protocols. Many authors [2, 3] have done detailed analyses of the performance of MANETs in varying mobility scenarios. Some of them [4] have suggested new techniques to reduce energy consumption in the network, and some [5] have suggested the optimum node density to get better performance. But all of them have studied the effect of mobility without considering obstacles in the network terrain. In real terrain, obstacles like buildings, rivers, mountains, and walls are present, which restrict the communication between nodes. Hence, the presence of obstacles should be considered during performance evaluation through network simulation. In this paper, we have taken seven irregular-sized polygonal obstacles in the terrain area during simulation. With varying mobility, the performance parameters of AODV and DSR, like packet delivery ratio, average delay, average energy consumption, and normalised routing overhead, are compared in the presence of obstacles in the network at different network traffic levels.
2 Literature Review
In paper [6], the authors have proposed a new protocol with a more effective route search procedure which avoids congestion along the path. The protocol chooses a route based on the network traffic at the mobile nodes and rearranges the route as the network topology alters. It does not transmit the whole data using a single route; new, more efficient routes are searched for infrequently during data transmission. It is a more efficient method for data transmissions that involve a link for a long time period. However, the protocol is not analysed considering obstacles in the terrain during simulation. In paper [7], the authors have compared the performance of MANET routing protocols by changing the mobility of the mobile nodes in terms of pause time. They have taken different traffic conditions during their simulation; however, all the investigation is done without considering obstacles in the network terrain. Paper [8] investigates the problem from a different angle by simulating the collective effect of mobile node density, data length, and node mobility on the efficiency of a MANET. In a realistic situation, a MANET's mobile nodes join and leave the network frequently. Based on the quality of service (data delay, throughput), routing packet load, and data packet retransmissions, the paper compares the efficiency of QoS-based MANET routing techniques. Here all analyses are done on plain terrain. In paper [9], the authors discuss thoroughly the output efficiency of two popular protocols, AODV and DSR, under changing mobility conditions. The paper presents a thorough examination of how network efficiency is affected by dynamic conditions. Here parameters like the number of node-link breaks, the amount of data packets received at the sink
and end-to-end data delays are used for performance analysis. However, the network simulation is done without considering obstacles in the MANET. Paper [10] proposes an Obstacle-Avoiding Connectivity Restoration Strategy (OCRS) technique utilising a fully exploring node mobility method, without considering any incoming convex-type obstacle environment, and uses a backup selection technique. However, it does not compare any reactive routing algorithms in the presence of obstacles. Paper [11] proposes a new technique based on the Ad Hoc On-demand Multipath Distance Vector (AOMDV) reactive routing procedure which reduces the energy consumption in a MANET utilising a fitness parameter and the required multipath distance. The ad hoc on-demand multipath distance vector-fitness function, i.e. AOMDV-FF, reactive routing procedure selects the shortest route that expends the least energy. Two techniques are presented in that paper, AOMDV and AOMDV-FF, which are compared using a few network parameters like average energy consumption, network lifetime, and network routing load on the basis of the data delivery rate, data packet size, total simulation span, etc. The outcomes indicate improved performance in comparison to the AOMDV and AOMR-LM techniques.
3 Simulation Environment and Performance Metrics
3.1 Simulation Environment
Performance analysis of the MANET is done using NS-2 as the network simulation software. Here, AODV and DSR are used as the routing protocols, and the Random Waypoint model is utilised for node movement. The other parameters utilised for the network simulation are given in Table 1. Each network simulation runs for a span of 500 s. Each individual simulation run uses a single seed, and five different seeds are simulated for a given set of parameters. To plot a point on a graph, the average over the five seeds is calculated.
3.2 Performance Metrics
The following parameters are selected for studying the outcomes of the MANET using the two reactive routing techniques:
Packet Delivery Ratio (PDR): PDR is defined as the ratio of the total number of data packets received at all destination nodes to the total number of packets sent by all the source nodes.
Average delay: Average delay is defined as the ratio of the total time taken by all data packets to reach all destinations to the total number of nodes.
Table 1 Parameters used for simulation

Parameters | Value/specification
Terrain size | 1500 × 300 M
Number of mobile nodes | 50
Mobility model used | Random waypoint
Number of sources | 5, 15 and 25
Maximum node speed | 20 M/S
Pause time | 0–500 s
Simulation time | 500 s
Mac protocol used | 802.11
Routing protocol | AODV, DSR
Transmission range | 250 M
Data packet size | 512 bytes
Data rate | 2 Mbps
Type of data traffic | CBR (constant bit rate)
Average energy consumption: Average energy consumption is defined as the ratio of the total energy consumed by all the nodes to the total number of nodes.
Normalised Routing Overhead (NRO): NRO is defined as the ratio of the total number of control packets transmitted by all network nodes to the total number of data packets received at all the sink nodes.
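The four metrics are simple ratios of quantities counted during a simulation run (for example, parsed from the NS-2 trace file). A brief sketch using the definitions above is given below; all counter names are illustrative assumptions.

```python
def pdr(data_packets_received, data_packets_sent):
    # Packet delivery ratio: fraction of generated data packets delivered.
    return data_packets_received / data_packets_sent

def average_delay(total_delay, num_nodes):
    # Average delay as defined above: total end-to-end delay over the number of nodes.
    return total_delay / num_nodes

def average_energy(total_energy, num_nodes):
    # Average energy consumption per node.
    return total_energy / num_nodes

def nro(control_packets_sent, data_packets_received):
    # Normalised routing overhead: control packets per delivered data packet.
    return control_packets_sent / data_packets_received
```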
4 Performance Analysis
4.1 Low Traffic Density
Analysis of the variations of the performance parameters in the low traffic condition is given below. The low traffic density network is simulated by taking five sources and five destinations. The parameter values of AODV are compared with those of DSR both for plain terrain and with obstacles. Figure 1 shows the variation of the Packet Delivery Ratio (PDR) with respect to pause time. From the output data, it is concluded that when the pause time increases (i.e. mobility decreases) the PDR also increases. It is observed from the plot that in plain terrain the performance of DSR is 1.1% better than AODV in the higher mobility situation. This may be because, at the higher mobility condition, more links break for both protocols, but in the case of DSR, alternate routes are available, so data delivery to the destination is better. The PDR of both protocols is equal in the low mobility condition. In the presence of obstacles, the PDR of both protocols decreases by about 11.88% in comparison to plain terrain because of link breaks due to the obstacles in the terrain. As mobility changes, a similar trend is observed for both protocols in the presence of obstacles. It is observed from Fig. 2 that the average delay of the
Fig. 1 PDR for five sources
Fig. 2 Average delay for five sources
network decreases with a rise in the pause time. The figure shows that the average delay of DSR in plain terrain is higher than that of AODV in the high mobility scenario. This may be due to the use of stale routes for sending data packets to the sink node. As mobility decreases, link breaks also decrease, and hence the average delay also decreases. In the presence of obstacles, the average delay increases by about 12.74% in comparison to plain terrain for both protocols because of link breaks. As mobility changes, a similar trend is observed for both protocols in the presence of obstacles. Figure 3 shows the variation of the Average Energy Consumption (AEC) with respect to mobility. It is clear from the figure that the AEC of the network nodes decreases as mobility decreases in plain terrain. This is due to the decrease in link breaks: when link breaks decrease, new route searches also decrease, so the AEC also decreases. The figure shows that the AEC of AODV is
Fig. 3 AEC for five sources
higher in comparison to DSR because no alternative route is available. In the presence of obstacles, link breaks increase, so the new route search process increases. Here, in the presence of obstacles, the AEC increases by about 68.69% compared to plain terrain. As mobility changes, a similar trend is observed for both protocols in the presence of obstacles.
Fig. 4 NRO for five sources
It is noticed from Fig. 4 that the Normalised Routing Overhead (NRO) of the network reduces with an increase in the pause time. At the high mobility condition, the NRO of AODV is higher than that of DSR. This is because DSR initiates fewer route search processes as compared to AODV, since in DSR alternate routes are available to route data to the destination. As mobility decreases, the NRO also decreases for both protocols due to the decrease in link breaks. In the presence of obstacles, the link breaks further increase, so the NRO of both protocols increases by about 104% in comparison
to plain terrain. As mobility changes, a similar trend is observed for both protocols in the presence of obstacles.
4.2 Medium Traffic Density
Analysis of the variations of the performance parameters in the medium traffic condition is given below. The medium traffic density network is simulated by taking 15 sources and 15 destinations. The parameter values of AODV are compared with those of DSR both for plain terrain and with obstacles. Figure 5 shows the variation of the Packet Delivery Ratio (PDR) with respect to pause time. From the output data, it is concluded that when the pause time increases (i.e. mobility decreases) the PDR also increases. It is observed from the plot that in plain terrain the performance of AODV is 1.59% better than DSR at low pause time. This is because, at low pause time, more links break for DSR at the medium traffic condition due to network congestion; AODV searches for new routes faster than DSR, so its PDR is better. As mobility decreases, link breaks due to mobility also decrease, and hence the PDR of both protocols increases. In the presence of obstacles, link breaks further increase at the high mobility condition, and the PDR of both protocols decreases by about 14.22% in comparison to plain terrain. This is because of link breaks due to the obstacles in the terrain. As mobility changes, a similar trend is observed for both protocols in the presence of obstacles.
Fig. 5 PDR for 15 sources
It is noticed from Fig. 6 that the average delay of the network reduces with an increase in the pause time at medium traffic. The figure shows that the average delay of DSR in plain terrain is higher than that of AODV in the high mobility scenario. This may be due to the use of stale routes for sending data to the sink node. As mobility decreases, link breaks also decrease, and hence the average delay also decreases. In the presence of
Fig. 6 Average delay for 15 sources
It is noticed from Fig. 6 that the average delay of the network reduces with increase in the pause time at medium traffic. The figure shows that the average delay of DSR in plain terrain is higher than that of AODV in the high-mobility scenario, which may be due to the use of stale routes for sending data to the sink node. As mobility decreases, link breaks also decrease, and hence the average delay also decreases. In presence of obstacles, the average delay of both protocols increases by about 13.28% compared to plain terrain because of the increase in link breaks. As mobility changes, a similar trend is observed for both protocols in presence of obstacles.
Fig. 7 AEC for 15 sources
Figure 7 shows the variation of the Average Energy Consumption (AEC) with respect to mobility at medium traffic. Compared to low traffic, the AEC of both protocols is higher in the medium traffic condition, which may be due to the rise in link breaks as the network traffic increases. It is clear from the figure that the AEC of the network nodes decreases as mobility decreases in plain terrain. This is due to the decrease in link breaks: when link breaks decrease, fewer new route searches are required, so AEC also decreases. The figure shows that the AEC of AODV is higher than that of DSR, because AODV, having no alternative routes available, initiates more new route searches than DSR. In presence of obstacles, link breaks increase, so more new route searches are initiated; here the AEC increases by about 56.76% compared to plain terrain. As mobility changes, a similar trend is observed for both protocols in presence of obstacles.
Fig. 8 NRO for 15 sources
It is noticed from Fig. 8 that the Normalised Routing Overhead (NRO) of the network reduces with increase in the pause time. Compared to low traffic, the NROs of both protocols are higher in the medium traffic condition, which may be due to the increase in link breaks and route searches as the network traffic increases. Under high mobility, the NRO of AODV is higher than that of DSR; DSR initiates fewer route searches than AODV because alternate routes are available to route data to the destination. As mobility decreases, the NRO of both protocols also decreases owing to the drop in link breaks. In presence of obstacles, link breaks increase further, so the NRO of both protocols increases by about 99.1% compared to plain terrain. As mobility changes, a similar trend is observed for both protocols in presence of obstacles.
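The pairwise percentages quoted above (for instance, "AODV is 1.59% better than DSR" or "NRO increases about 99.1% under obstacles") are read here as relative differences averaged over the simulated pause-time points; this is only a plausible interpretation, and the short sketch below merely illustrates how such figures could be derived from per-pause-time results. All numbers in it are placeholders, not measured data.

# Hypothetical per-pause-time results (one value per simulated pause time);
# the lists below are placeholders, not the paper's measured data.
def mean_relative_change(baseline, variant):
    """Mean percentage change of `variant` over `baseline`, point by point."""
    changes = [100.0 * (v - b) / b for b, v in zip(baseline, variant)]
    return sum(changes) / len(changes)

pdr_dsr_plain  = [78.0, 84.0, 89.0, 93.0, 95.0]   # placeholder PDR values (%)
pdr_aodv_plain = [80.0, 85.5, 90.0, 93.5, 95.5]

nro_aodv_plain    = [2.1, 1.7, 1.3, 1.0, 0.8]      # placeholder NRO values
nro_aodv_obstacle = [4.2, 3.4, 2.6, 2.0, 1.6]

print("AODV vs DSR (PDR):", mean_relative_change(pdr_dsr_plain, pdr_aodv_plain))
print("Obstacle impact on AODV NRO:", mean_relative_change(nro_aodv_plain, nro_aodv_obstacle))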
4.3 High Traffic Density
The variations of the performance parameters in the high traffic condition are analysed below. The high traffic density network is simulated by taking 25 sources and 25 destinations, and the parameter values of AODV are compared with those of DSR both for plain terrain and in presence of obstacles.
Fig. 9 PDR for 25 sources
Figure 9 shows the variation of Packet Delivery Ratio (PDR) with respect to pause time. Compared to medium traffic, the PDRs of both protocols are lower in the high traffic condition, which may be due to the increase in link breaks at high network traffic. From the output data, it is concluded that as the pause time increases (i.e. mobility decreases) the PDR also increases. It is observed from the plot that in plain terrain the performance of AODV is 3.7% better than that of DSR in the high-mobility scenario. This is because, at low pause time, more links break for DSR under the high traffic condition due to network congestion; AODV searches for new routes faster than DSR, so its PDR is better. As mobility decreases, link breaks due to mobility also decrease, and hence the PDR of both protocols increases. In presence of obstacles, link breaks increase further under high mobility, and the PDR of both protocols decreases by about 14.77% compared to plain terrain because of link breaks caused by the obstacles in the terrain. As mobility changes, a similar trend is observed for both protocols in presence of obstacles.
Fig. 10 Average delay for 25 sources
It is noticed from Fig. 10 that the average delay of the network reduces with increase in pause time at high traffic. Compared to medium traffic, the average delay of both protocols is higher in the high traffic condition, which may be due to the increase in link breaks at high network traffic. The figure shows that the average delay of DSR in plain terrain is higher than that of AODV in the high-mobility condition, which may be due to the use of stale routes to send data packets to the destination. As mobility decreases, link breaks also decrease, and hence the average delay also decreases. In presence of obstacles, the average delay of both protocols increases by about 13.66% compared to plain terrain because of the increase in link breaks. As mobility changes, a similar trend is observed for both protocols in presence of obstacles.
Fig. 11 AEC for 25 sources
Figure 11 shows the variation of the Average Energy Consumption (AEC) with respect to mobility at high traffic. Compared to medium traffic, the AEC of both protocols is higher in the high traffic condition, which may be due to the rise in link breaks as the network traffic increases. It is clear from the figure that the AEC of the network nodes decreases as mobility decreases in plain terrain. This is due to the decrease in link breaks: when link breaks decrease, fewer new route searches are required, so AEC also decreases. The figure shows that the AEC of AODV is higher than that of DSR, because AODV, having no alternative routes available, initiates more new route searches than DSR. In presence of obstacles, link breaks increase, so more new route searches are initiated; hence the AEC increases by about 44.5% compared to plain terrain. As mobility changes, a similar trend is observed for both protocols in presence of obstacles.
It is noticed from Fig. 12 that the Normalised Routing Overhead (NRO) of the network reduces with increase in pause time. Compared to medium traffic, the NROs of both protocols are higher in the high traffic condition, which may be due to the increase in link breaks and route searches as the network traffic increases. Under high mobility, the NRO of AODV is higher than that of DSR; DSR initiates fewer route searches than AODV because alternate routes are available to route data to the destination. As mobility decreases, the NRO of both protocols also decreases owing to the drop in link breaks. In presence of obstacles, link breaks increase further, so the NRO of both protocols increases by about 53.08% compared to plain terrain. As mobility changes, a similar trend is observed for both protocols in presence of obstacles.
Fig. 12 NRO for 25 sources
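For quick reference, the obstacle-induced changes reported above can be gathered in one place. The snippet below simply restates the percentages quoted for the three traffic densities (the low-traffic PDR change is not quoted in this part of the text, so it is left as None); it adds no new results.

# Reported increase (average delay, AEC, NRO) or decrease (PDR) under obstacles,
# relative to plain terrain, as quoted in this section (values in percent).
obstacle_impact = {
    "low traffic (5 sources)":     {"PDR": None,  "delay": 12.74, "AEC": 68.69, "NRO": 104.0},
    "medium traffic (15 sources)": {"PDR": 14.22, "delay": 13.28, "AEC": 56.76, "NRO": 99.1},
    "high traffic (25 sources)":   {"PDR": 14.77, "delay": 13.66, "AEC": 44.5,  "NRO": 53.08},
}

for scenario, metrics in obstacle_impact.items():
    print(scenario, metrics)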
5 Conclusion and Future Scope of Work
The performance comparison of the two reactive protocols shows that in plain terrain the PDR and average delay of AODV are considerably better than those of DSR in the medium and high traffic scenarios. The NRO of DSR, however, is lower than that of AODV, and consequently its AEC is also lower. In presence of obstacles, the performance of both protocols degrades owing to the additional link breaks the obstacles cause. In the medium and high traffic conditions, AODV remains superior to DSR in terms of PDR and average delay in presence of obstacles. Further research is required to increase the PDR and to reduce the AEC and NRO of AODV in presence of obstacles.
Author Index
A Abhichandan, Deepesh, 91 Agarwal, Tanushree, 351 Aghor, Dhanashri, 127 Agrawal, Raunak, 69 Aluvalu, Rajanikanth, 309 Amir Khusru Akhtar, Md., 207 Anand, Divsehaj Singh, 281 Asabe, Sujata H., 91
B Balabantaray, Bunil Kumar, 521 Bansal, Himani, 215 Bansal, Raksha, 161 Bhanja, Urmila, 585 Bhat, Nishant, 1 Bhor, Siddhi, 161 Bhosale, Gayatri, 365 Bilgaiyan, Saurabh, 351, 373, 383
C Chakravarty, Krishna, 255 Chakravarty, Sujata, 545 Chandak, Ayush, 69 Chandak, Naman, 69 Chatterjee, Priyadarshini, 491 Chendake, Pooja, 161 Chennam, Krishna Keerthi, 309
D Das, Bandita, 521 Das, Madhusmita, 513
Dass, Sanchit, 13 Deshmukh, Rutvik, 117 Deshpande, Vivek, 21 Dewangan, Prachee, 545 Dube, Mahesh R., 501 Dubey, Bhartendu, 215 Dumpala, Sanjana, 13
G Gaikwad, Tanvi, 365 Gandewar, Sadanand, 297 Gautam, Pratima, 197 Gawai, Abhijit, 1 Gawas, Abhijeet, 45 Ghadekar, Premanand, 13, 33, 103, 161 Gladence, L. Mary, 135 Gopalan, Sundararaman, 575 Goyanka, Parikha, 215 Gupta, Aryan Kumar, 281 Gupta, Ayush, 445 Gupta, P. K., 533
H Hansda, Raimoni, 521 Hanuman, Vijay, 457
J Jadhav, Ranjana, 281 Jagadeeswari, T., 491 Jain, Gourav, 91 Jevrani, Anuj, 13 Jha, Ayush, 351
Joshi, Chaitanya, 45 Joshi, Deepali J., 117, 143, 297 Joshi, Shaunak, 33, 103 Joshi, Vinit, 91
K Kadam, Shailesh, 117 Kale, Ishaan, 297 Kankurti, Tasmiya, 117 Karwande, Atharva, 403 Kaur, Rajsonal, 373 Khandelwal, Aneesh, 45 Khare, Shreyas, 281 Kiran Dash, Kajal, 415 Kohli, Narendra, 331 Kokate, Yogini, 33, 103 Kolhe, Tejas, 403 Koppula, Vijaya Kumar, 343 Korate, Omkar, 297 Korpal, Pragya, 161 Kude, Harshada, 33, 103 Kulawade, Sachin, 83 Kulkarni, Ajinkya, 143 Kulkarni, Anirudha, 59 Kulkarni, Ishwari, 143 Kulkarni, Milind, 241 Kulkarni, Nachiket K., 21 Kulkarni, Pooja, 403 Kulkarni, Pranesh, 403 Kulkarni, Purva, 267 Kumar, Ashwani, 207, 435 Kumar Mohanta, Bhabendu, 415 Kumar, Randhir, 225
L Lade, Sangita, 241, 267, 365 Lakshmi, Nagubandi Naga, 189 Lapshetwar, Vedant, 69
M Mahat, Maheep, 319 Manjare, Soham, 83 Mantri, Shraddha, 59 Marathe, Pradyumna, 403 Marchang, Ningrinla, 225 Mishra, Awanish Kumar, 331 Modak, Avani, 445 Mohanty, Archit, 383 Mohanty, Suneeta, 469 Mole, Prajakta, 127 Mondal, Jayanta, 513
N Nanda, Surendra Kumar, 469 Narla, Swapna, 343 Nartam, Purva, 267 Navghare, Nilesh D., 135 Nayak, Biswojit, 415 Nikita, 153
P Padalkar, Akshada, 117 Paikaray, Bijay Ku., 501, 545 Panda, Banoj Kumar, 585 Pande, Riya, 143 Pandey, Himanshu, 1 Patil, Aniket, 241, 267 Patil, Aseem, 479 Patil, Shivkumar, 297 Patil, Siddharth, 143 Pattnaik, Prasant Kumar, 469, 585 Patwari, Divya, 297 Pawar, Apurva, 161 Pawar, Bhargav, 1 Pawar, Shital, 127 Phadke, Sayali, 59 Phadtare, Shweta, 127 Pinnamaneni, Krishna Vamsi, 457 Prashanth, S. K., 391 Priyanka, V., 425
R Rajwal, Tanay, 21 Raman, D., 391, 435 Reddy, G. Vijendar, 435 Roy, Prateek, 45
S Sahoo, Giridhari, 171 Sahoo, Sony Snigdha, 171, 181, 565 Sahu, Devraj, 373 Saini, Nikhil, 143 Saraf, Prasad, 267 Sharma, Dheeraj, 281 Sharma, Tanuj, 83 Shinde, Nikhil, 13 Shukla, Vishal, 521 Singh, Adarsh Kumar, 373 Singh, Jagannath, 255 Singh, J. N., 153 Singh, Muskan, 153 Singh, Shweta, 153
Singh, Tripty, 457 Sinha, Arvind Kumar, 207 Sonavane, Aishwarya, 365 Sridharan, Aadityan, 575 Suri, Ajay, 153 SuryaNarayana, G., 343 Suryawanshi, Ashish, 91 Swain, Debabala, 181, 521, 545, 555, 565 Swain, Debabrata, 1, 45, 59, 69, 83, 501 Swain, Monalisa, 555
T Tandon, Righa, 533 Tapadiya, Prachi, 281 Thakur, Amit, 21 Tilokchandani, Mohit, 13 Tripathi, Rakesh, 225
U Ujjainia, Shikha, 197 Uma Maheswari, V., 309, 425
V Vadtile, Pranali, 127 Varshney, Shreyansh, 215 Vartak, Tanay, 117 Vaswani, Ashish, 69 Veenadhari, S., 197 Velpuru, Muni Sekhar, 491 Venkatesh, M., 351 Verma, Monika, 59 Vijendar Reddy, G., 189 Vijeta, 83 Vuyyala, Sathish, 189, 435
W Wyawahare, Medha, 403
Y Yadav, Dileep Kumar, 153
Z Zad, Vishwesh, 45 Zainab, Shaik Arshia, 189