Algorithms for Intelligent Systems Series Editors: Jagdish Chand Bansal · Kusum Deep · Atulya K. Nagar
A. Brahmananda Reddy B. V. Kiranmayee Raghava Rao Mukkamala K. Srujan Raju Editors
Proceedings of Second International Conference on Advances in Computer Engineering and Communication Systems ICACECS 2021
Algorithms for Intelligent Systems Series Editors Jagdish Chand Bansal, Department of Mathematics, South Asian University, New Delhi, Delhi, India Kusum Deep, Department of Mathematics, Indian Institute of Technology Roorkee, Roorkee, Uttarakhand, India Atulya K. Nagar, School of Mathematics, Computer Science and Engineering, Liverpool Hope University, Liverpool, UK
This book series publishes research on the analysis and development of algorithms for intelligent systems with their applications to various real world problems. It covers research related to autonomous agents, multi-agent systems, behavioral modeling, reinforcement learning, game theory, mechanism design, machine learning, meta-heuristic search, optimization, planning and scheduling, artificial neural networks, evolutionary computation, swarm intelligence and other algorithms for intelligent systems. The book series includes recent advancements, modification and applications of the artificial neural networks, evolutionary computation, swarm intelligence, artificial immune systems, fuzzy system, autonomous and multi agent systems, machine learning and other intelligent systems related areas. The material will be beneficial for the graduate students, post-graduate students as well as the researchers who want a broader view of advances in algorithms for intelligent systems. The contents will also be useful to the researchers from other fields who have no knowledge of the power of intelligent systems, e.g. the researchers in the field of bioinformatics, biochemists, mechanical and chemical engineers, economists, musicians and medical practitioners. The series publishes monographs, edited volumes, advanced textbooks and selected proceedings. All books published in the series are submitted for consideration in Web of Science.
More information about this series at https://link.springer.com/bookseries/16171
A. Brahmananda Reddy · B. V. Kiranmayee · Raghava Rao Mukkamala · K. Srujan Raju Editors
Proceedings of Second International Conference on Advances in Computer Engineering and Communication Systems ICACECS 2021
Editors A. Brahmananda Reddy Department of Computer Science and Engineering Vallurupalli Nageswara Rao Vignana Jyothi Institute of Engineering and Technology Hyderabad, Telangana, India Raghava Rao Mukkamala Centre for Business Data Analytics Copenhagen Business School Frederiksberg, Denmark
B. V. Kiranmayee Department of Computer Science and Engineering Vallurupalli Nageswara Rao Vignana Jyothi Institute of Engineering and Technology Hyderabad, Telangana, India K. Srujan Raju Department of Computer Science and Engineering CMR Technical Campus Hyderabad, Telangana, India
ISSN 2524-7565 ISSN 2524-7573 (electronic) Algorithms for Intelligent Systems ISBN 978-981-16-7388-7 ISBN 978-981-16-7389-4 (eBook) https://doi.org/10.1007/978-981-16-7389-4 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
Team ICACECS-2021
Chief Patrons Dr. D. N. Rao, President, Vignana Jyothi Society Sri. K. Harichandra Prasad, General Secretary, Vignana Jyothi Society
Patrons Dr. C. D. Naidu, Principal, VNR VJIET Dr. B. Chennakesava Rao, Director-Advancement, VNR VJIET Dr. A. Subhananda Rao, Director-Research and Development, VNR VJIET Dr. B. V. Kiranmayee, HOD-CSE, VNR VJIET, Hyderabad, India
Program Chairs Dr. A. Brahmananda Reddy, VNR VJIET, Hyderabad, India Dr. B. V. Kiranmayee, VNR VJIET, Hyderabad, India
Program Co-chair Dr. P. Subhash, VNR VJIET, Hyderabad, India
General Chairs Dr. Raghava Rao Mukkamala, CBDA, CBS, Denmark Gautam Mahapatra, President, Computer Society of India (CSI), India Dr. C. Kiran Mai, Professor in Department of CSE, VNR VJIET, Hyderabad, India
Honorary Chairs Dr. A. Govardhan, Professor in CSE and Rector JNTUH, Hyderabad, India Dr. Suresh Chandra Satapathy, Professor, KIIT University, Bhubaneshwar, India
Editorial Board Dr. A. Brahmananda Reddy, VNR VJIET, Hyderabad, India Dr. B. V. Kiranmayee, VNR VJIET, Hyderabad, India Dr. Raghava Rao Mukkamala, Director, CBDA, CBS, Denmark Dr. K. Srujan Raju, CMRTC, Hyderabad, India
International Advisory Committee Dr. Dayang Rohaya Awang Rambli, University Teknologi, Malaysia Dr. P. N. Suganthan, NTU, Singapore Dr. Aynur Unal, Director, Member of the Executive Team, UK Dr. Pawan Lingras, Saint Mary’s University, Canada Dr. Margarita N. Favorskaya, Siberian State Aerospace University of Science and Technology, Russian Federation Dr. Raghava Rao Mukkamala, CBDA, CBS, Denmark Dr. Vishnu Pendyala, San Jose State University, USA Dr. Radhakrishnan Palanikumar, King Khalid University, Abha, Kingdom of Saudi Arabia Dr. Glaret Shirley Sinnappan, Tunku Abdul Rahman University College, Kuala Lumpur, Malaysia Dr. Prasad Mavuduri, University of Emerging Technologies, USA Dr. Anitesh Barua, Department of Information Management, UT, USA Md. Qayyum, King Khalid University, Abha, Kingdom of Saudi Arabia Wael Salah, Minia University, Minya, Egypt Dr. Ahamad J. Rusumdar, KIT, Germany
Dr. V. R. Chirumamilla, EUT, The Netherlands Dr. Rakhee, The University of the West Indies, Mona, Jamaica
National Advisory Committee Dr. P. Premchand, Professor, Osmania University, Hyderabad, India Dr. Suresh Chandra Satapathy, KIIT University, Bhubaneshwar, India Shri. Rajiv Ratan Chetwani, Director at ISRO Headquarters, India Dr. Rajeev Srivastava, IIT BHU, Varanasi, India Dr. C. Krishna Mohan, IIT, Hyderabad, India Dr. Raj Kamal, Former Vice Chancellor, DAVV, Indore, India Dr. B. Eswara Reddy, Director, SDC, JNTUA, Anantapuramu, India Dr. R. B. V. Subramanyam, NITW, Warangal, India Dr. S. Bapi Raju, IIITH, Hyderabad, India Dr. D. V. L. N. Somayajulu, NITW, Warangal, India Dr. Vadlamani Ravi, IDRBT, Hyderabad, India Dr. O. B. V. Ramanaiah, JNTUH, Hyderabad, India Dr. Peddoju Sateesh Kumar, IIT Roorkee, India Dr. Supreethi K. P., JNTUH, Hyderabad, India Dr. K. Subrahmanyam, KL University, Vijayawada, India Dr. I. L. Narasimha Rao, Cyber Peace Foundation, New Delhi, India Dr. T. V. Rajini Kanth, Dean, SNIST, Hyderabad, India Dr. Pardha Saradhi, Training and Placements, VNR VJIET, Hyderabad, India Dr. Sanjay Kumar Singh, IIT BHU, Varanasi, India Dr. S. V. Rao, IITG, Guwahati, India Dr. V. Kamakshi Prasad, JNTUH, Hyderabad, India Dr. B. Vishnu Vardhan, JNTUHCEM, Manthani, India Dr. A. Sureshbabu, JNTUA, Anantapuramu, India Dr. S. Vasundra, JNTUA, Anantapuramu, India Dr. C. Shoba Bindhu, JNTUA, Anantapuramu, India Dr. D. Vasumathi, JNTUH, Hyderabad, India Dr. V. A. Narayana, Principal, CMRCET, Hyderabad, India Dr. M. Seetha, GNITS, Hyderabad, India Dr. K. Vijaya Kumar, CMRCET, Hyderabad, India Dr. P. Vijaya Pal Reddy, MEC, Hyderabad, India Dr. Sidharth Dabhade, NIELIT, Aurangabad, India Dr. M. Swamy Das, CBIT, Hyderabad, India Dr. Y. Padmasai, VNR VJIET, Hyderabad, India Dr. G. Ramesh Chandra, VNR VJIET, Hyderabad, India Dr. G. Suresh Reddy, VNR VJIET, Hyderabad, India Dr. N. Sandhya, VNR VJIET, Hyderabad, India Dr. P. Neelakantan, VNR VJIET, Hyderabad, India Dr. M. Raja Sekar, VNR VJIET, Hyderabad, India
Dr. K. Anuradha, VNR VJIET, Hyderabad, India Dr. Poonam Upadhyay, VNR VJIET, Hyderabad, India Dr. R. Manjula Sri, VNR VJIET, Hyderabad, India Dr. A. Mallika, VNR VJIET, Hyderabad, India Dr. G. Srinivasa Gupta, VNR VJIET, Hyderabad, India Dr. Srinivasa Rao T., VNR VJIET, Hyderabad, India Dr. T. Jayashree, VNR VJIET, Hyderabad, India Dr. N. Mangathayaru, VNR VJIET, Hyderabad, India Anil Sukheja, Scientist “E” ISRO, Ahmedabad, India Dr. K. G. Mohan, Presidency University, Bangalore, India
Technical Committee Dr. Pilli Emmanuel Subhakar, MNIT, Jaipur, India Dr. A. P. Siva Kumar, JNTUCEA, Andhra Pradesh, India Dr. Peddoju Sateesh Kumar, IIT Roorkee, India Dr. Ch. Sudhakar, NIT Warangal, India Dr. Ilaiah Kavati, NIT Warangal, India Dr. U. Srinivasulu Reddy, NIT Tiruchirappalli, India Dr. M. Abdul Hameed, UCE, Osmania University, Hyderabad, India Dr. Sujatha Banothu, UCE, Osmania University, Hyderabad, India Dr. Rakesh Matam, IIIT Guwahati, India Dr. Venkanna U., IIIT Naya Raipur, India Dr. Subodh Srivastava, NIT Patna, India Dr K. Suresh Babu, SIT, JNTUH, Hyderabad, India Dr. Venkata Rami Reddy, SIT, JNTUH, Hyderabad, India Dr. Padmaja Joshi, C-DAC Mumbai, India Dr. Vijaya Latha, GRIET, Hyderabad, India Dr. B. Satyanarayana, Principal, CMRIT, Hyderabad, India Dr. Adiraju Prashanth Rao, Anurag College of Engineering, Hyderabad, India Dr. A. Narayana Rao, NBKRIST, Andhra Pradesh, India Dr. K. Purnachand, BVRIT Narsapur, Hyderabad, India Dr. Palamukala Ramesh, SIET Puttur, Andhra Pradesh, India Dr. K. Srinivasa Reddy, BVRITW, Hyderabad, India Dr. N. Rajasekhar, GRIET, Hyderabad, India Mr. Korra Lakshman Naik, Scientist-D, NIELIT, Aurangabad, India Dr. Halawath Balaji, SNIST, Hyderabad, India Dr. Konda Srinivas, CMRTC, Hyderabad, India Dr. M. B. Srinivas, BITS Pilani, Hyderabad, India Dr. Ravichander Janapati, SR University, Warangal, India Dr. T. Kishore Kumar, NIT Warangal, India Dr. S. Nagini, VNR VJIET, Hyderabad, India Ms. V. Baby, VNR VJIET, Hyderabad, India
Mr. M. Gangappa, VNR VJIET, Hyderabad, India Dr. P. V. Siva Kumar, VNR VJIET, Hyderabad, India Dr. T. Sunil Kumar, VNR VJIET, Hyderabad, India Dr. Y. Sagar, VNR VJIET, Hyderabad, India Dr. Deepak Sukheja, VNR VJIET, Hyderabad, India Dr. Thippeswamy G., BMS Institute of Technology and Management, Bangalore Dr. K. Srinivas, VNR VJIET, Hyderabad, India Dr. A. Kousar Nikhath, VNR VJIET, Hyderabad, India Dr. Manjunath T. N., BMS Institute of Technology and Management, Bangalore Dr. Chalumuru Suresh, VNR VJIET, Hyderabad, India Dr. D. N. Vasundhara, VNR VJIET, Hyderabad, India Mr. G. S. Ramesh, VNR VJIET, Hyderabad, India Dr. K. Rajeev, GRIET, Hyderabad, India Mr. P. Venkateswara Rao, VNR VJIET, Hyderabad, India Mr. G. Naga Raju, VNR VJIET, Hyderabad, India Dr. V. Ramesh, Presidency University, Bangalore, India Mr. K. Venkateswara Rao, CMRCET, Hyderabad, India Dr. Ravikumar G. K., Adichunchanagiri University, Mandya, Karnataka Dr. P. Dileep Kumar Reddy, SV College of Engineering, Tirupati
Program Committee Dr. B. V. Kiranmayee, VNR VJIET, Hyderabad Dr. C. Kiran Mai, VNR VJIET, Hyderabad Dr. A. Brahmananda Reddy, VNR VJIET, Hyderabad Dr. P. Subhash, VNR VJIET, Hyderabad Dr. G. Ramesh Chandra, VNR VJIET, Hyderabad Dr. N. Sandhya, VNR VJIET, Hyderabad Dr. P. Neelakantan, VNR VJIET, Hyderabad Dr. M. Rajasekhar, VNR VJIET, Hyderabad Mrs. V. Baby, CSE, VNR VJIET Dr. S. Nagini, VNR VJIET, Hyderabad Dr. P. V. Siva Kumar, VNR VJIET, Hyderabad Dr. T. Sunil Kumar, VNR VJIET, Hyderabad Mr. M. Gangappa, CSE, VNR VJIET, Hyderabad Dr. Y. Sagar, VNR VJIET, Hyderabad Dr. Deepak Sukheja, VNR VJIET, Hyderabad Dr. A. Kousar Nikhath, VNR VJIET, Hyderabad Dr. K. Srinivas, VNR VJIET, Hyderabad Dr. Ch. Suresh, VNR VJIET, Hyderabad
Organizing Committee Dr. B. V. Kiranmayee, HOD-CSE, VNR VJIET Dr. C. Kiran Mai, CSE, VNR VJIET Dr. A. Brahmananda Reddy, CSE, VNR VJIET Dr. P. Subhash, CSE, VNR VJIET Mrs. V. Baby, CSE, VNR VJIET Dr. S. Nagini, CSE, VNR VJIET Dr. P. V. Siva Kumar, CSE, VNR VJIET Dr. T. Sunil Kumar, CSE, VNR VJIET Mr. M. Gangappa, CSE, VNR VJIET, Hyderabad Dr. Y. Sagar, CSE, VNR VJIET Dr. Deepak Sukheja, CSE, VNR VJIET Dr. K. Srinivas, CSE, VNR VJIET Dr. A. Kousar Nikhath, CSE, VNR VJIET Dr. A. Harshavardhan, CSE, VNR VJIET Dr. Chalumuru Suresh, CSE, VNR VJIET Mrs. D. N. Vasundara, CSE, VNR VJIET Mr. G. S. Ramesh, CSE, VNR VJIET Mrs. A. Madhavi, CSE, VNR VJIET Mrs. N. V. Sailaja, CSE, VNR VJIET Mrs. R. Vasavi, CSE, VNR VJIET Mrs. Vijaya Saraswathi, CSE, VNR VJIET Mrs. P. Radhika, CSE, VNR VJIET Mrs. Sravani Nalluri, CSE, VNR VJIET Mr. T Gnana Prakash, CSE, VNR VJIET Mr. G. Nagaraju, CSE, VNR VJIET Mr. R. Kranthi Kumar, CSE, VNR VJIET Mr. P. Venkateswara Rao, CSE, VNR VJIET Mrs. Y. Bhanusree, CSE, VNR VJIET Mr. M. Ravikanth, CSE, VNR VJIET Mrs. S. Jhahnavi, CSE, VNR VJIET Mr. N. Sandeep Chaitanya, CSE, VNR VJIET Mrs. Tejaswi Potluri, CSE, VNR VJIET Mr. P. Bharath Kumar Chowdary, CSE, VNR VJIET Mr. P. Ramakrishna Chowdary, CSE, VNR VJIET Mrs. Priya Bhatnagar, CSE, VNR VJIET Mr. V. Hareesh, CSE, VNR VJIET Mr. K. Bheemalingappa, CSE, VNR VJIET Mrs. L. Indira, CSE, VNR VJIET Mrs. S. Swapna, CSE, VNR VJIET Mrs. Kriti Ohri, CSE, VNR VJIET Mrs. K. Jhansi Laksmi Bhi, CSE, VNR VJIET Dr. K. Venkata Ramana, CSE, VNR VJIET
Mr. Ch. Sri Sumanth, CSE, VNR VJIET Mrs. P. Jyothi, CSE, VNR VJIET Mr. Nayamtulla Khan, CSE, VNR VJIET Mrs. K. Bhagya Rekha, CSE, VNR VJIET Mrs. M. Prathyusha, CSE, VNR VJIET Mrs. G. Laxmi Deepthi, CSE, VNR VJIET Mrs. Avadhani Bindu, CSE, VNR VJIET Mrs. S. Nyemeesha, CSE, VNR VJIET
Publication and Proceedings Committee Dr. A. Brahmananda Reddy, CSE, VNR VJIET Dr. P. Subhash, CSE, VNR VJIET Mrs. V. Baby, CSE, VNR VJIET Dr. K. Srinivas, CSE, VNR VJIET Mrs. D. N. Vasundhara, CSE, VNR VJIET Dr. A. Kousar Nikhath, CSE, VNR VJIET Mrs. L. Indira, CSE, VNR VJIET Mrs. K. Jhansi Laksmi Bhi, CSE, VNR VJIET Mrs. P. Jyothi, CSE, VNR VJIET Mrs. Avadhani Bindu, CSE, VNR VJIET Mrs. S. Nyemeesha, CSE, VNR VJIET
Website Management Committee Mr. G. S. Ramesh, CSE, VNR VJIET Mr. R. Kranthi Kumar, CSE, VNR VJIET
Review Committee Dr. Mohammed Shafiul Alam Khan, University of Dhaka, Dhaka, Bangladesh Dr. Radhakrishnan Palanikumar, King Khalid University, Saudi Arabia Dr. Kanwalinderjit Gagneja, Florida Polytechnic University, USA Dr. Ravi Vadapalli, HPCC, Texas Tech University, USA Dr. Jagdish Shivhare, Distinguished Scientist and Member-IRBA, USA Dr. Junyuan Zeng, The University of Texas at Dallas, USA Dr. Neeraj Mittal, The University of Texas at Dallas, USA Dr. Shailendra Shukla, Motilal Nehru National Institute of Technology, Allahabad, India
Dr. Sumit Mishra, IIIT Guwahati, Assam, India Dr. Vivek Tiwari, IIIT Raipur, India Dr. Muralidhar Kulkarni, NIT Surathkal, Mangalore, Karnataka, India Dr. Raghavan S., NIT Tiruchirappalli, India Dr. Kaustuv Nag, IIIT Guwahati, Assam, India Dr. Angshuman Jana, IIIT Guwahati, Assam, India Dr. G. Suresh Reddy, VNR VJIET, Hyderabad, India Dr. Dr. Mohammed Misbahuddin, C-DAC, Bangalore, India Dr. Subhodh Srivastava, NIT Patna, India Dr. D. Srinivasa Rao, VNR VJIET, Hyderabad, India Dr. V. RadhaKrishna, VNR VJIET, Hyderabad, India Dr. Keshetti Sreekala, MGIT, Hyderabad, India Dr. G. Madhu, VNR VJIET, Hyderabad, India Dr. K. Kavitha, CMRIT, Hyderabad, India Dr. B. V. Seshu Kumari, VNR VJIET, Hyderabad, India Dr. D. Kalyani, VNR VJIET, Hyderabad, India Dr. Neelima, JITS, Karimnagar, India Dr. A. Srnivasa Rao, VNR VJIET, Hyderabad, India Dr. K. Prasanna Lakshmi, GRIET, Hyderabad, India Dr. Amjan Shek, BVRIT, Hyderabad, India Dr. P. Vijyapal Reddy, Matrusri Engineering College, Hyderabad, India Dr. K. Srinivasa Reddy, BVRITW Hyderabad, India Dr. Halavath Balaji, SNIST, Hyderabad, India Dr. N. Rajasekhar, GRIET, Hyderabad, India Dr. Shambasiva Rao, BJR Government Degree College, Hyderabad Dr. Srinivas Konda, CMRTC, Hyderabad, India Dr. K. F. Bharati, JNTUA, Anantapuramu, Andhra Pradesh, India Dr. N. Rama Subramanian, NIT Tiruchirappalli, India Dr. L. Anjaneyulu, NIT Warangal, India Dr. P. Sreehari Rao, NIT Warangal, India Dr. Ravindhar Reddy B., Annamacharya Institute of Technology and Sciences, India Dr. M. Chinna Rao, Srinivasa Institute of Engineering and Technology Dr. S. Govinda Rao, GRIET, Hyderabad, India Dr. Purna Chand K., BVRIT Narsapur, Hyderabad, India Dr. Sreekala K., MGIT, Hyderabad, India Dr. M. A. Jabbar, VEC, Hyderabad, India Dr. Kamakshaiah, Geethanjali CET, Hyderabad Dr. A. Kousar Nikhath, VNR VJIET, Hyderabad, India Dr. B. Krishna, Vaagdevi College of Engineering, Warangal Dr. Korra Lakshman, NIELIT, Aurangabad, India Dr. M. Mahalakshmi, CMR CET, Hyderabad, India Dr. Merugu Suresh, CMRCET, Hyderabad, India Mr. G. Nagaraju, VNR VJIET, Hyderabad, India Dr. V. Nagaveni, Acharya Institute of Technology, Bangalore, India Dr. Y. Jeevan Nagendra Kumar, GRIET, Hyderabad, India
Dr. A. Narayana Rao, NBKRIST, Nellore, India Dr. Sagar Yeruva, VNR VJIET, Hyderabad, India Dr. Neelima Vontela, JITS, Karimnagar, India Dr. Krishna Prasad Ponnekanti, SVCE, Tirupati, India Dr. K. Prasanna Lakshmi, GRIET, Hyderabad, India Dr. Adiraju Prashanth Rao, Anurag University, Hyderabad, India Dr. V. Prashanthi, GRIET, Hyderabad, India Dr. Mohd. Qayyum, King Khalid University, Abha, Saudi Arabia Dr. Y. Vijaya Latha, GRIET, Hyderabad, India Dr. Rohit Raja, SREYAS IET, Hyderabad, India Dr. E. Raju, Vaagdevi Engineering College, Warangal, India Dr. D. Raman, Vardhaman College of Engineering, Hyderabad, India Dr. M. Shanmukhi, MGIT, Hyderabad, India Dr. M. Sharadha Varalakshmi, St. Peters EC, Hyderabad, India Dr. K. Srinivas, VNR VJIET, Hyderabad, India Dr. Y. Suresh, BITM, Ballari, India Dr. M. Suresh Kumar, Vaagdevi CE, Warangal, India Dr. K. L. S. Soujanya, CMRCET, Hyderabad, India Dr. V. Venkateshwarlu, Vaagdevi CE, Warangal, India Dr. M. Venugopala Chari, CBIT, Hyderabad, India Dr. D. Raman, Vardhaman CE, Hyderabad, India Dr. V. Ramesh, Christ University, Bangalore, India Dr. V. Akhila, GRIET, Hyderabad, India Dr. Kalla Madhusudhana, CVRCE, Hyderabad, India Dr. P. Lalitha Surya Kumari, KL University, Hyderabad, India Dr. Anita Bai, KL University, Hyderabad, India Dr. K. Prasanna, ITS (Autonomous), Rajampet Dr. P. V. Shiva Kumar, VNR VJIET, Hyderabad Dr. Shaheen, MVJ CE (VTU), Bangalore Dr. G. Malini Devi, GNITS, Hyderabad Dr. Gunupudi Rajesh Kumar, VNR VJIET, Hyderabad Dr. S. Nagini, VNR VJIET, Hyderabad Dr. S. Venu Gopal, Vardhaman CE, Hyderabad, India Mrs. V. Baby, VNR VJIET, Hyderabad Dr. T. Sunil Kumar, VNRV JIET, Hyderabad Mr. P. Venkateswara Rao, VNR VJIET, Hyderabad Dr. Munishekar, Vardhaman College of Engineering Dr. K. Kranthi Kumar, SNIST, Hyderabad Dr M. Sadanandam, Kakatiya University, Warangal Dr. Hanumanthu Bhukya, KITS, Warangal Dr. Chalumuru. Suresh, VNR VJIET, Hyderabad Dr. Deepak Sukheja, VNR VJIET, Hyderabad Dr. Dasaradh Ramaiah, BVRIT, Narsapur Dr. Ch. Mallikarjuna Rao, GRIET, Hyderabad Dr. M. Madhubala, IARE, Hyderabad
Dr. S. S. Aravinth, KL University Dr. S. Madhusudhanan, KL University Dr. T. Nagalakshmi, KL University Dr. D. Venkatesh, GATES Institute of Technology, Gooty, JNTUA Dr. A. L. Sreeinvasulu, GATES Institute of Technology, Gooty, JNTUA Dr. M. Veeresha, Santhiram Engineering College, Nandyal, JNTUA Dr. K. Govardhan Reddy, G. Pulla Reddy Engineering College, Kurnool Dr. G. Ramesh, GRIET, Hyderabad Dr. K. Shyam Sunder Reddy, Vasavi College of Engineering, Hyderabad Dr. Prabhakar Kandukuri, Dhanekula Institute of Engineering and Technology, Vijayawada Dr. Kolluru Venkata Nagendra, Audisankara CE, Nellore Dr. P. Praveen, SR University, Warangal Dadi Ramesh, SR University, Warangal Dr. M. V. Narayana, Guru Nanak Institute, Hyderabad Dr. S. Ramakrishna, Bapatla Engineering College, Bapatla, Guntur, AP Dr. Sudarshan E., SRITW, Warangal Dr. A. Mallikarjun Reddy, Anurag University, Hyderabad Mr. P. Pramod Kumar, SR University, Warangal Dr. R. Suneetha Rani, QIS College, Ongole Mrs. D. N. Vasundara, CSE, VNR VJIET Mr. G. S. Ramesh, CSE, VNR VJIET Dr. M. Sridevi, CVR College, Hyderabad Dr. T. Archana, KU, Warangal, India Dr. R. Vijay Prakash, SR University, Warangal Dr. Kalyanapu Srinivas, KITS, Warangal Dr. V. Shankar, KITS, Warangal Dr. K. Vinay Kumar, KITSW, Warangal Dr. G. Balakrishna, Anurag University, Hyderabad Dr. J. Somasekar, Gopalan College and Management, Bangalore Dr. M Sujatha, JITS, Karimnagar Dr. Aakunuri Manjula, JITS Mr. R. Srinivas, MGIT, Hyderabad Dr. Ranjith Kumar M., KITS, Warangal Mr. M. Sreenivasulu, BVRIT Narsapur Dr. VIjayakumar Polepally, KITSW, Warangal Dr. Rafath Samrin, CMR, Hyderabad Dr. Mohammed Abdul Bari, ISL College of Engineering, Hyderabad Dr. P. Pavan Kumar, ICFAITECH, Hyderabad Dr. Subbarayudu Y., GRIET, Hyderabad Dr. A. Vani, TS Transco, Hyderabad Dr. G. Venu, CVSR, Hyderabad Dr. T. Parameshwar, VJIT, Hyderabad Dr. D. Ramesh, Kakatiya University, Warangal Dr. S. V. Vasantha, MVSR EC, Hyderabad
Dr. Sheshikala Martha, SR University, Warangal Dr. Ramesh C. H., GNITS, Hyderabad Dr. K. Venkata Ramana, VNR VJIET, Hyderabad Dr. Sarangam Kodati, TKREC, Hyderabad Dr. M. Sridevi, Anurag University, Hyderabad Dr. D. V. Lalith Parameswari, GNITS, Hyderabad Dr. Rudra Kalyan Nayak, KL University, Vijayawada Dr. Megha Bushan, DIT University, Dehradun Dr. R. Delshi Howsalya Devi, KVCET, Tamil Nadu, India Dr. Seethramulu, ICFAI University, Hyderabad, India
Preface
Computer Engineering and Communication Systems are entwined more now than at any other time in history. The interplay of Information and Communication Technologies, the rise of Internet of Things (IoT) applications, smart computing, and the inroads that technology has taken into personal lives through wearables and so on have significant roles to play in the connotation of computer engineering with communication systems. The International Conference on Advances in Computer Engineering and Communication Systems (ICACECS-2021) is themed around smart innovations, Industry 4.0 technologies, data analytics, networks, and communication systems, thereby celebrating the emerging technology trends in Computer Engineering and Communication Systems. This conference is organized as five parallel tracks, viz. artificial intelligence and machine learning; Cloud, IoT, and distributed computing; image processing; data analytics and NLP; and deep learning and soft computing. ICACECS-2021 is a unique forum bringing together scholars from different countries to participate, transform the research landscape of the globe, and carve a road map for implementation. It provides a valuable networking opportunity and brings a new era for research scholars, students, professors, and industries, providing insights into recent trends and developments in the field of computer science with a special focus on Mezzanine technologies.
This book comprises the best deliberations of the "International Conference on Advances in Computer Engineering and Communication Systems (ICACECS-2021)", organized online by the Department of Computer Science and Engineering, VNR Vignana Jyothi Institute of Engineering and Technology, during 13–14 August 2021. Bringing into the present the best possibilities of the future is termed "Presencing", and it is a part of the core philosophy at Vignana Jyothi. Thus, ICACECS-2021 also provides space for technologies with significant societal impact, such as green computing, 5G networks and IoT, social network behaviours, smart energy networks, smart grids and renewable energy, agricultural informatics, assistive technologies, and intelligent transportation. Out of 232 paper submissions from all over the globe, 57 papers are being published after scrupulous review. Among them, 11 papers are from various countries internationally and 46 papers are from different states of the
country, including Academia, Industry, and Research and Development organizations like ISRO and INCOIS. This book volume focuses on thoroughly refereed post-conference enlargements and reviews on advanced topics in artificial intelligence, machine learning, data mining and big data computing, Cloud computing, Internet of Things, distributed computing, and smart systems. We assure that we have put in every effort to ensure that the participatory experience in the e-conference will not feel like a compromise but will add value in its own stride and enable more people to participate. We hope that the conference shall serve its purpose for the best exposition and progress of science, engineering, technology, and society.
Hyderabad, India
August 2021
Dr. A. Brahmananda Reddy Dr. B. V. Kiranmayee Dr. Raghava Rao Mukkamala Dr. K. Srujan Raju
Acknowledgements
Team ICACECS-2021 acknowledges the support extended by the Computer Society of India (CSI) for their continuous sustenance in resourcing the technical knowledge base. It furthered and augmented our thoughts into practice. We thank all the authors for their contributions and timely response. A special thanks to our reviewers, who invested their valuable time in conscientious evaluation of the submissions for the best outcomes. We express our profound gratitude for the inspiring and informative presentations of our keynote speakers on the frontline technologies, creating the curiosity to explore more. The co-operation extended by the Session Chairs is immense, elevating the presentation skills among the participants. We would like to thank our chief guests and guests of honour for accepting our invitation and making it to the e-conference, furthering our spirit to achieve more in providing technology solutions to the society—involve and evolve. Thanks to the founding members of Vignana Jyothi for their wisdom and social responsibility, who firmly believe that all technology must serve the urgent need of advancing the society, at micro- and/or macro-levels. Our sincere thanks are extended to all the patrons, chairpersons, members of the Editorial Board, eminent members of the Programme Committee, and Advisory Committee for their guidance and support, and to the enthusiastic people among the Technical Committee for their coordination and help in execution. We would like to extend our unlimited appreciation for the amazing work done by our self-reliant and motivated team of Computer Science and Engineering Department faculty and staff. The amazing dedication and effort of the team enabled us to reach our goal and put up this show with ease. A special acknowledgement to Shri. Suresh Dharmalingam, Project Coordinator, Books Production, Springer, for the prompt communication and support. Finally, we thank the team comprising Shri. Aninda Bose and the editors of these proceedings for the constant direction and sustenance.
Team ICACECS-2021
About the Institute, VNR VJIET
Vallurupalli Nageswara Rao Vignana Jyothi Institute of Engineering and Technology (VNR VJIET) was established by the Vignana Jyothi Society as a not-for-profit organization in the year 1995–96, with a motto to provide value-based education on par with international standards. The philosophy of Vignana Jyothi unravels education as a process of “Presencing” that provides, both individually and collectively, to one’s deepest capacity to sense and experience the knowledge and activities to shape the future. VNR VJIET, an autonomous institute established with the permission of AICTE, recognized by UGC as “College with Potential for Excellence” offers 13 B.Tech. and 13 M.Tech. programmes. All the courses offered by the institute are affiliated to Jawaharlal Nehru Technological University Hyderabad, Hyderabad. The institute is recognized under Section 2(f) and 12(B) of UGC Act, 1956 and is accredited by NAAC with “A++” grade and CGPA 3.73 in Cycle II in 2018. Seven B.Tech. courses are accredited by NBA. AICTE has identified the institute as “AICTE Research Center” under National Doctoral Fellowship (NDF) scheme, and Ph.D. scholars are allocated. MHRD, India has ranked the institute at 127th rank in engineering category and 151–200 rank band in overall category in NIRF 2020. QS I-GAUGE has rated the institution with overall DIAMOND rating and has provided E-LEAD certification. Under MARGADARSHAN and PARAMARSH scheme, the institute mentors 17 institutes in the state of Telangana. VNR VJIET is categorized as “B and A” institution (rank between 06 and 25) in category of “Private or Self-Financed College/Institutes” in ARIIA 2020. VNR VJIET is committed to providing a value-added curriculum that is driven by robust infrastructure, highly qualified and competent staff, and a vision to remain a top-ranked institution that is primed by research and innovation. It is one of the most distinguished and premier institutions of higher education in the state of Telangana.
About the Department—Computer Science and Engineering
The Department of Computer Science and Engineering (CSE), established in the year 1995, evolved towards enhancing computing and its applications to build the intellectual capital of the society. The department is witnessing a period of exciting growth and opportunity propelled by the advancement of technology and its recognition through excellence. The department offers seven UG courses—B.Tech.: 1. Computer Science and Engineering, 2. Computer Science and Business Systems (CSBS) in collaboration with Tata Consultancy Services (TCS), 3. CS-Artificial Intelligence and Machine Learning, 4. CS-Cyber Security, 5. CS-Data Science, 6. CS-Internet of Things, and 7. CS-Artificial Intelligence and Data Science and two PG courses—M.Tech.: 1. Software Engineering (SE) and 2. Computer Science Engineering (CSE). The state-of-the-art laboratories equipped with the latest infrastructure and software strengthen the progress of the department stakeholders. Upskilling is a continuous activity in the department. Committed and competent faculty keep abreast of the upcoming technologies, through continuous participation in seminars and workshops, registering for advanced courses through MOOCs, and acquiring merit certifications. A large number of computer science students get placed every year in national and multinational companies. The department conducts numerous inter/intra-collegiate competitions, providing a platform for enhancing knowledge base, soft skills, and team spirit. The department is NBA accredited and is recognized as a research centre by JNTUH. The CSE department is involved in pioneering research with global excellence and local relevance in cutting-edge technologies like image processing, augmented reality, data analytics, Internet of Things, Mezzanine technologies, and networks, and has several publications to its credit listed in SCOPUS, Web of Science, SCI indexed, etc. The department hosts funded projects from reputed organizations—AICTE, DST, DRDO, UGC, ISSR, MHRD, and ITRA.
Contents
1 Deep Learning for Conversions Between Melodic Frameworks of Indian Classical Music . . . . 1
Rohan Surana, Aakash Varshney, and Vishnu Pendyala
2 Content-Based Image Retrieval System Using Fuzzy Colour and Local Binary Pattern with Apache Lucene . . . . 13
Nurul Fariza Zulkurnain, Mohamad Asyraaf Azhar, and Moksud Alam Mallik
3 Machine Vision-Based Conveyor and Structural Health Monitoring Robot for Industrial Application Using Deep Learning . . . . 21
Khalid, Ahmed Rimaz Faizabadi, and Moksud Alam Mallik
4 Design and Development of Marathi Word Stemmer . . . . 35
P. Vaishali Kadam, B. Kalpana Khandale, and C. Namrata Mahender
5 Survey on Deep Learning System for Intruder Behavior Detection and Classification in Cloud Computing . . . . 49
M. Mohan, V. Tamizhazhagan, and S. Balaji
6 Abstractive Multi-document Summarization Using Deep Learning Approaches . . . . 57
Murkute Poornima, Venkateswara Rao Pulipati, and T. Sunil Kumar
7 Heart Disease Prediction Using Decision Tree and SVM . . . . 69
R. Vijaya Saraswathi, Kovid Gajavelly, A. Kousar Nikath, R. Vasavi, and Rakshith Reddy Anumasula
8 Classification of Skin Diseases Using Ensemble Method . . . . 79
D. N. Vasundara, Swetha Naini, N. Venkata Sailaja, and Sagar Yeruva
9 An Integrated Decision Support System for Storm Surge Early Warning Using SOA . . . . 89
J. Padmanabham, P. L. N. Murty, T. Srinivasa Kumar, and T. V. S. Udaya Bhaskar
10 Object Tracking and Detection Using Convolutional Neural Networks . . . . 97
C. N. Sujatha, P. Sahithi, R. Hamsini, and M. Haripriya
11 Novel Coronavirus Progression Analysis Using Time Series Forecasting . . . . 109
Alagam Padmasree, Talluri Kavya, Kukkadapu Santhoshi, and Konda Srinivasa Reddy
12 A Modern Approach to Seed Quality Check and it's Traceability in Agriculture . . . . 117
N. Sandeep Chaitanya, Rajitha Bhargavi Movva, and Sagar Yeruva
13 Experimental Face Recognition System Using Deep Learning Approaches . . . . 131
Nsikak Imoh, Narasimha Rao Vajjhala, and Sandip Rakshit
14 Customer-Centric E-commerce Implementing Artificial Intelligence for Better Sales and Service . . . . 141
Salu George Thandekkattu and M. Kalaiarasi
15 Tomato Plant Disease Classification Using Deep Learning Architectures: A Review . . . . 153
U. Shruthi, V. Nagaveni, C. S. Arvind, and G. L. Sunil
16 Machine Learning System for Textile Fabric Defect Detection Using GLCM Technique . . . . 171
Shridevi Soma and Hattarki Pooja
17 Alzheimer's Disease Prediction via Optimized Deep Learning Framework . . . . 183
G. Stalin Babu, S. N. Tirumala Rao, and R. Rajeswara Rao
18 Using Neural Networks to Detect Emotions in Documents . . . . 191
Keshav Kumar and Yash Mittra
19 An Innovative Model-Based Approach for Credit Card Fraud Detection Using K-Nearest . . . . 199
Harikrishna Bommala, Rayapati Mabu Basha, B. Rajarao, and K. Sangeetha
20 Segmentation of Brain Tumour in Multi-Contrast Mri Image Using U-Net Model . . . . 207
N. Sandeep Chaitanya, Prathyusha Karadi, Surendra Kurapati, Ankathi Rohith, and Yenreddy Vamshi Krishna
21 Transformer Data Analysis for Predictive Maintenance . . . . 217
Sreshta R. Putchala, Rithik Kotha, Vanitha Guda, and Yellasiri Ramadevi
22 Synthetic Face Image Generation Using Deep Learning . . . . 231
C. Sireesha, P. Sai Venunath, and N. Sri Surya
23 Machine Learning and Neural Network Models for Customer Churn Prediction in Banking and Telecom Sectors . . . . 241
Ketaki Patil, Shivraj Patil, Riya Danve, and Ruchira Patil
24 Evaluation of Social Human Sentiment Analysis Using Machine Learning Algorithms . . . . 255
Anjali Agarwal, Ajanta Das, and Roshni Rupali Das
25 A Background Study on Feature Extraction for 2D and 3D Object Models . . . . 265
Xiaobu Yuan and Shivani Pachika
26 A Feature Extraction and Heatmap Generation Approach Based on 3D Object Models and CNNS . . . . 275
Shivani Pachika and Xiaobu Yuan
27 Evaluation of Tools Used for 3D Reconstruction of 2D Medical Images . . . . 287
Srinikhil Durisetti, Darsani Alapati, Sai Keerthi Vadnala, Keerthana Kotha, G. Ramesh Chandra, and Sathya Govindarajan
28 Leveraging Classification and Detection of Malware: A Robust Machine Learning-Based Framework . . . . 299
Lingaraj Sethi and Prashanta Kumar Patra
29 A Robust Machine Learning Approach Towards Detection of Parkinson's Disease . . . . 307
A. R. Susmitha and Saneev Kumar Das
30 A Novel Model to Diagnose Pneumonia Using Chest X-ray Images . . . . 317
Lavanya Bagadi, Shivani Panda, Praveen Pillalamarri, P. Hemanth, and V. Kiran
31 Support Vector Machine Optimization Using Secant Hyperplane Kernel . . . . 325
Lingam Sunitha and M. Bal Raju
32 Analysis and Prediction of Air Pollutant Using Machine Learning . . . . 335
Chalumuru Suresh, B. V. Kiranmayee, and Balannolla Sneha
33 Real-Time Face Mask Detection Using Machine Learning Algorithm . . . . 347
Bhagavathula Pushyami, C. N. Sujatha, Bonthala Sanjana, and Narra Karthik
34 Predicting the Potentially Hazardous Asteroid to Earth Using Machine Learning . . . . 359
Kaveti Upender, Tammali Sai Krishna, N. Pothanna, and P. V. Siva Kumar
35 Generating Automatic Ground Truth by Integrating Various Saliency Techniques . . . . 371
Ramesh Cheripelli and A. N. K. Prasannanjaneyulu
36 GATEZEE—An Automated Gate Pass Management System . . . . 385
Vaddeboyina Sri Manvith, Shiva Madhunala, and B. V. Kiranmayee
37 Obesity Prediction Based on Daily Lifestyle Habits and Other Factors Using Different Machine Learning Algorithms . . . . 397
Chalumuru Suresh, B. V. Kiranmayee, Milar Jahnavi, Roshan Pampari, Sai Raghu Ambadipudi, and Sai Srinivasa Preetham Hemadri
38 A Brief Analysis of Fault-Tolerant Ripple Carry Adders with a Design for Reliable Approximate Adders . . . . 409
Asma Iqbal and K. Manjunatha Chari
39 Performance Assessment Using Supervised Machine Learning Algorithms of Opinion Mining on Social Media Dataset . . . . 419
M. Susmitha and R. Laxmi Pranitha
40 Enhancing English Proficiency Using NLP . . . . 429
A. Brahmananda Reddy, P. Vaishnavi, M. Jahnavi, G. Sameeksha, and K. Sandhya
41 Best Practices and Strategy for the Migration of Service-Oriented Architecture-Based Applications to Microservices Architecture . . . . 439
Vinay Raj and K. Srinivasa Reddy
42 Retinal Hemodynamics and Diabetes Mellitus Detection Through Deep Learning . . . . 451
Ambika Shetkar, C. Kiran Mai, and C. Yamini
43 Deep Neural Networks Model to Detection Glaucoma in Prima Phase . . . . 461
Akhil Karnam, Himanshi Gidwani, Sachin Chirgaiya, and Deepak Sukheja
44 Time Series Analysis Using LSTM for Elderly Care Application . . . . 471
Chagantipati Akarsh, Sagi Harshad Varma, and P. Venkateswara Rao
45 Brightness Contrast Using Convolution Neural Network . . . . 481
Sagar Yeruva, Anvesh Guduri, Yedla Sai Sreshta, Thatta Sumathi, and Dedeepya Tatineni
46 E-commerce Clothing Review Analysis and Model Building . . . . 491
G. Manikiran, S. Greeshma, P. Vishnu Teja, Y. Sreehari Rao, Tanvir H. Sardar, and Moksud Alam Mallik
47 Face Detection and Comparison Using Deep Learning . . . . 499
R. Vijaya Saraswathi, D. N. Vasundhara, R. Vasavi, G. Laxmi Deepthi, and K. Jaya Jones
48 Long-Term Temporal Land Analysis of Agricultural Land and Shifting Cultivation Detection . . . . 513
Sejal Thakkar, Ved Suthar, Chirag Patel, Shree Sashikant Sharma, and Namra Patel
49 Mitigation of COVID-19 by Means of Face Mask and Social Distancing Detection Using OpenCV and YOLOv2 . . . . 527
G. Sahitya, C. Kaushik, Podduturi Ashish Reddy, and G. Sahith Reddy
50 Abstractive Text Summarization Using T5 Architecture . . . . 535
G. S. Ramesh, Vamsi Manyam, Vijoosh Mandula, Pavan Myana, Sathvika Macha, and Suprith Reddy
51 Heart Failure Prediction Using Classification Methods . . . . 545
Oruganti Shashi Priya, Kanakala Srinivas, and Sagar Yeruva
52 Detection and Classification of Cerebral Hemorrhage Using Neural Networks . . . . 555
P. Bharath Kumar Chowdary, Pathuri Jahnavi, Sudagani Sandhya Rani, Tumati Jahnavi Chowdary, and Kakollu Srija
53 A Novel CNN-Based Classification and Prediction of COVID-19 Disease Using Deep Learning . . . . 565
Talluri Sunil Kumar, Sarangam Kodati, Sagar Yeruva, and Talluri Susan
54 Automated Defect Detection in Consumer-Grade Knives Using Active Planning . . . . 575
Keshav Kumar
55 Fake Account Detection in Social Media Using Big Data Analytics . . . . 587
Shaik Mujeeb and Sangeeta Gupta
Author Index . . . . 597
About the Editors
Dr. A. Brahmananda Reddy is an Associate Professor in the Department of CSE at VNR VJIET, Hyderabad. Dr. Reddy holds a Ph.D.-CSE and M.Tech.-CSE from JNTU Anantapur, and a B.Tech.-CSIT from JNTUH Hyderabad. He has over 14 years of experience in the field of academic research and technological education and has more than 20 research papers published in various reputed national/international conferences and journals, listed in Scopus, Web of Science, IEEE, Inderscience, and Springer proceedings. He is also an editor of the book Machine Learning Technologies and Applications, published in Springer's "Algorithms for Intelligent Systems (AIS)" series. He is a Member of IEEE and a lifetime Member of ISTE and CSI. Dr. Reddy's research interests include Knowledge Engineering and the Semantic Web. He has conducted many seminars/workshops/FDPs and delivered knowledge-sharing sessions in various academic institutions. He has guided many UG and PG projects in Mezzanine technologies. He is in charge of students' industrial visits and Department MoUs, reducing the gap between academia and industry. He was a Jury Member consecutively for the Smart India Hackathon Grand Finale SIH 2018, SIH 2019, and SIH 2020 Software Edition, and an evaluator for Toycathon 2021 organized by AICTE and MHRD. He has chaired sessions for various international conferences and reviewed for various reputed international conferences and journals.
Dr. B. V. Kiranmayee is working as Professor and HOD of CSE at VNR VJIET. She obtained her Ph.D. from JNTU Hyderabad in Data Mining, and her M.Tech. and B.Tech. in CSE. She has 23 years of teaching and research experience and has published over 30 research papers in reputed international journals. She is a senior and life member of IEEE, ISTE, and CSI. Dr. Kiranmayee's research interests are data mining, machine learning, and deep learning. She has received funds from UGC and has taken up consultancy work on the Virtual Tour for Warangal Tourism, developing school educational videos, and TCS online examinations. She is Convener for seminars, workshops, and FDPs that benefited students. She is Chairman of BoS in the department and has served as a BoS Member for other colleges. She established the Centre of Excellence in Data Science and Big Data. Under her leadership, the department signed MoUs with software industries and institutions. She has contributed to NBA accreditation, NAAC A++, QS Diamond
rating, NIRF ranking. She got the Best Paper award for “Eye state Detection and Analysis for Driver’s Fatigue” in ICSCI. She guided UG, PG projects in Mezzanine technologies. Her Mentee students bagged ISTE, CSI awards titled “Best Student Innovator”, “Best CSE Student”, Best Accredited Student Branch and students won awards in coding competitions Smart Indian Hackathon-2018,2019, Code for Good (JPMC), Swish Hackathon, ACM/ICPC, VesAithon. Dr. Raghava Rao Mukkamala is the director for the Centre for Business Data Analytics, an associate professor at the Department of Digitalization, Copenhagen Business School (CBS), Denmark. He is also Program Director for the new Master’s Programme in Data Science. Raghava also holds an adjunct professorship at the Department of Technology, Kristiania University College, Oslo, Norway. His current research focus is on the interdisciplinary approach to big data analytics. Combining formal/mathematical modeling approaches with data/text mining techniques and machine learning methodologies, his current research program seeks to develop new algorithms and techniques for big data analytics, such as social set analytics. Dr. Raghava has published more than 70 peer-reviewed publications, and he has an h-index = 23 and i10-index = 39. In addition, he supervises 3 Ph.D. students and four postdocs in various research projects on blockchain, cybersecurity, data science, and big data analytics. Dr. Raghava holds a Ph.D. degree in Computer Science and an M.Sc. degree in Information Technology, both from IT University of Copenhagen, Denmark, and a Bachelor of Technology degree from JNTU, Hyderabad, India. Before moving to research, Raghava has many years of programming and IT development experience from the Danish IT industry. Dr. K. Srujan Raju is a Dean Student Welfare, HOD CSE at CMRTC. He obtained his Doctorate in Computer Science in Network Security. He has more than 20 years of experience in academics and research. Dr. Raju is presently working on 2 projects funded by Government of India under CSRI and NSTMIS, also filed 7 patents and 1 copyright at Indian Patent Office, edited more than 14 books published by Springer Book Proceedings of AISC series, LAIS series, and others are indexed by Scopus, also authored books in C Programming and Data Structure, Exploring to Internet, Hacking Secrets, contributed chapters in various books, and published more than 30 papers in reputed peer-reviewed journals and conferences. Dr. Raju, invited as Session Chair, Keynote Speaker, Technical Program Committee Member, Track Manager, and Reviewer for many national/international conferences also appointed as Subject Expert by CEPTAM DRDO—Delhi and CDAC. He mentored more than 100 students for incubating cutting edge solutions. He organized many conferences, FDPs, workshops, and symposiums. He has established the Centre of Excellence in IoT, Data Analytics. Dr. Raju is Member of various professional bodies, received Significant Contributor award and Active Young Member award from CSI India, also served as Management Committee Member, State Student Coordinator, and Secretary of CSI Hyderabad Chapter.
Chapter 1
Deep Learning for Conversions Between Melodic Frameworks of Indian Classical Music
Rohan Surana, Aakash Varshney, and Vishnu Pendyala
1 Introduction
Traditionally, mastering Indian Classical music requires years of practice and expertise. Machine learning has been used for diverse applications such as the veracity of big data [1–3] and predicting the duration of a cricket match [4]. Given that deep learning frameworks have successfully modeled expert behaviors, we decided to see if they can be applied to model the expertise of Indian Classical musicians with reasonable accuracy. As a first attempt, we decided to use CycleGAN to convert music from one raga to another. Indian Classical music comes in two distinct styles, Hindustani and Carnatic; the former is mainly associated with the northern part of the country, and the latter is primarily associated with the southern region. Our experiments prove that CycleGAN is reasonably successful in converting short snippets of music from one melodic framework to another across the two distinct styles.
A raga is a collection of notes arranged in a specific manner to produce a melody. The number and the flow of notes differ from one raga to another. Ragas in Indian Classical music can be specific to the time of the day, such as morning, evening, or night ragas. Therefore, converting a music composition from one raga to another is an interesting idea that we wanted to experiment with, much like the way domain transfers are applied to images [5] and Western music [6]. There are many types of ragas in Hindustani music and Carnatic music; this paper has used Sampurna ragas to start with the basics. A Sampurna raga consists of the following seven notes as defined in Indian Classical music: Sa, Re, Ga, Ma, Pa, Dha, Ni, and Sa. A Sampurna raga that follows a systematic pattern of ascending notes, Aroha, and descending notes, Avroha, is known as
the parent raga. The other ragas do not have all seven notes. We constrained our experiments to parent ragas such as Bhairavi and Shanmukhapriya with a fixed set of seven notes for simplicity. Some of the most commonly known examples of parent ragas are Melakarta raga, Kharaharapriya raga, and more. Another interesting type of raga, known as Khamas raga, does not follow the systematic pattern of ascending and descending notes. Instead, the notes are arranged in a zig-zag fashion, which makes the task of our genre classifier a bit tricky. This paper attempts to take the existing concept of image style transfer using CycleGAN and apply it to Indian Classical music. At a high level, domain transfer captures the characteristics of domain A and tries to transform these to domain B, in the absence of paired training samples, where A could be anything ranging from a summer image [5] or jazz [6] to the Bhairavi raga. We adapted a CycleGAN model [5] to aid us in accomplishing this task. The results show that the transfer is distinguishable and maintains the consistency of the cycle, which conveys that we should arrive at the original raga from the transferred raga. Furthermore, to check the robustness of the model, we also include a genre classifier [6] to scrutinize the capability and performance of the model. We believe that this is the first work that shows an acceptable attempt at transferring the genre of Indian Classical music, ragas, using a CNN-based CycleGAN.
2 Related Work
Zhu et al. [5] present the pioneering idea of using CycleGAN for image-to-image translation. The authors present an approach for translating an image from domain A to domain B. It uses a pair of generators and discriminators to transfer images from the source domain to the target domain. The authors presented several use cases like converting horse images to zebras, summer images to winter, and more. A few attempts have been made in the same direction for music genre transfer, such as Symbolic Music Genre Transfer [6] and style transfer between drums and bass subgenres [7]. We adopted a similar architecture to apply CycleGAN to Indian Classical music. There has been some work done in the domain of music generation and genre/style transfer. The Google Brain team has been doing considerable research with their Google Magenta Project [8]. Existing works include Symbolic Music Genre Transfer with CycleGANs [6], which introduced an additional discriminator in the CycleGAN model to balance the transfer and to make the model learn many high-level features. Brunner et al. [9] introduced MIDI-VAE, a neural network model based on a variational auto-encoder, capable of style/domain transfer by changing a music piece's pitches, dynamics, and instruments. Brunner et al. [10] also used CycleGAN, but with slightly more attention to normalization and self-attention. Huang et al. [11] described a model, TimbreTron, using CycleGAN that produced high-quality waveforms using a conditional WaveNet synthesizer. Ek and Malik [12] suggested a model, Neural Translation of Musical Style, that plays music more human-like by adding/injecting
velocities to the music. Dong et al. [13] proposed a three-GAN framework model, MuseGAN. We have used CNN-based GANs initially introduced by MidiNet [14] for generating melodies. GAN, an application of deep learning, is powerful, but a few technical difficulties include unstable training, non-convergence [15], mode collapse [16], and diminished gradient. In this paper, we adopted the CycleGAN model as a start, and we plan to explore more complex models in the future.
2.1 Contribution
To the best of our knowledge, this is the first attempt at converting from one melodic framework to another, a raga in North Indian Hindustani Classical music to a raga in South Indian Carnatic music. It is a bold experiment, given that the music styles of Hindustani and Carnatic are substantially different, but the results are encouraging and may inspire further work in this direction.
3 Dataset and Preprocessing
The idea of music genre transfer is derived from the existing concept of image translation [5] and style transfer [6] using CycleGAN. We wanted to go a stride beyond and introduce a new idea of transferring Indian Classical music from one style to another. Just the thought of enjoying one's favorite raga but in a different style is a whole new feeling in itself. We decided to use two existing datasets derived from the Duniya API [17] and our personal collection to achieve this task. For this research analysis, we use Bhairavi from Hindustani Classical Music and Shanmukhapriya from Carnatic. Bhairavi is raga A and Shanmukhapriya is raga B. The datasets used in our research have the same number of notes, seven, for simplicity and ease for our model to transfer from Bhairavi to Shanmukhapriya, and similarly from Shanmukhapriya to Bhairavi. In addition, the datasets chosen are from two different branches of Indian Classical music. Bhairavi is a type of Sampurna Hindustani Classical music raga. In contrast, Shanmukhapriya is a Sampurna Carnatic music raga. We limited our experiments to Sampurna ragas, as a variable number of notes in the ragas may introduce more unknowns in our experiments, which we wanted to avoid. Despite having some differences in style, Carnatic and Hindustani music follow the same concept of ragas. We wanted to touch both corners of Indian classical music and test the capability of the model. For the data preprocessing, we follow a multistep process remodeled from Symbolic Music Genre Transfer [6]. The mp3 files extracted are first converted to MIDI format using the Python library audio-to-midi to retain the original content of the music as much as possible, that is, tempo, time signature, and note information. Since
raga transfer works on note pitches, it is crucial to follow rigorous preprocessing steps to ensure our model captures the features effectively. The converted MIDI files are then merged into a single-track MIDI file. This merging helps keep the samples from the same raga together and retains all the properties of the source raga. We use two Python packages, pretty midi [18] and pypianoroll [19], for the conversion. The Python package pretty midi extracts data from the MIDI file in a format that is easy to amend: MIDI files are first converted to the pretty midi format and then translated into multitrack piano rolls. Pypianoroll helps transform the MIDI files into multitrack piano rolls, a matrix representation of music in a time-pitch format. This is the NumPy array that is given as the input to our model. The presence of each pitch is noted at each timestep, and the roll is converted to a binary matrix. To minimize the biases due to uneven numbers of raga samples in the dataset we initially collected, we downscaled our dataset to 20 sets each for Bhairavi and Shanmukhapriya. In addition, we also removed the tabla tracks from the raga samples because they made the data more cluttered. The final NumPy array that serves as an input to the discriminators has the shape (batch_size, 64, 84, 1). We trained the model for 25–30 epochs with the batch size set to 16. Although our model does not perform raga transfer perfectly, we would like to introduce the idea of converting melodic frameworks of Indian Classical music through this paper.
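To make the preprocessing pipeline described above concrete, the sketch below shows one way such a conversion could be written with pretty_midi and NumPy. It is an illustrative outline, not the exact script we used; the sampling rate fs, the 84-pitch range, and the file names are assumptions introduced here for the example.

```python
import numpy as np
import pretty_midi

def midi_to_binary_roll(path, fs=16, low=24, high=108, seg_len=64):
    """Convert a merged single-track MIDI file into binary piano-roll
    segments of shape (64, 84, 1). fs, pitch range and segment length
    are illustrative assumptions, not the values from the paper."""
    pm = pretty_midi.PrettyMIDI(path)            # parse the MIDI file
    roll = pm.get_piano_roll(fs=fs)              # (128 pitches, T timesteps)
    roll = roll[low:high, :]                     # keep 84 pitches
    roll = (roll > 0).astype(np.float32)         # pitch presence -> binary matrix
    roll = roll.T                                # (T, 84): time-major
    n_segments = roll.shape[0] // seg_len
    segments = roll[: n_segments * seg_len].reshape(n_segments, seg_len, 84, 1)
    return segments                              # fed to the generators/discriminators

# Example (file names are placeholders):
# bhairavi = np.concatenate([midi_to_binary_roll(f) for f in bhairavi_midi_files])
```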
4 Model Similar to “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks,” [5] our model, as described in Fig. 1, also derives from the recently introduced CycleGAN, which consists of two GANs arranged in cyclic order and trained simultaneously. Each GAN comprises a generator and a discriminator, each of which is a convolutional neural network (CNN). Each discriminator is attached to a generator’s output and is responsible for distinguishing the data generated by the generator from the real data. GANs usually take random noise as input to start with. Since this paper aims to transfer the raga, we input real data, Bhairavi or Shanmukhapriya, to the generator instead of the traditional method of inputting random noise. While the first generator’s job is to transform from Bhairavi to Shanmukhapriya, the second generator is responsible for transforming from Shanmukhapriya to Bhairavi, both working in conjunction with the corresponding discriminator. GeneratorA, GA , takes a sample from the first raga, Bhairavi, as input, and after many iterations, eventually outputs the second raga, Shanmukhapriya, based on the adversarial loss computed by the corresponding adversarial module, DiscriminatorB, DB . We train DB to classify the sample of Shanmukhapriya as a fake or real Shanmukhapriya sample. This generated sample is transferred back to the source raga, Bhairavi, using GeneratorB, GB .
Fig. 1 Model architecture based on CycleGAN. The thick arrows denote the transfer from Bhairavi → Shanmukhapriya, while on the other hand, the regular arrows denote the transfer in the opposite direction, that is, from Shanmukhapriya → Bhairavi. At the same time, the thin lines denote the cycle consistency loss functions
Similarly, the second generator, G_B, takes a Shanmukhapriya sample as input and tries to eventually generate Bhairavi based on the adversarial loss computed by its adversarial module, Discriminator A, D_A. Likewise, we train D_A to classify a sample of Bhairavi as a fake or real Bhairavi sample. This generated sample is transferred back to the source raga, Shanmukhapriya, using G_A. Hence, two discriminators are used to determine the plausibility of the generated ragas. Our goal is to train a model to learn the mappings G_A (Bhairavi → Shanmukhapriya) and G_B (Shanmukhapriya → Bhairavi). The objective function we use to learn the CycleGAN model is the adversarial loss output by the discriminators. This forces G_A to produce musical snippets indistinguishable from Shanmukhapriya and G_B to produce musical snippets indistinguishable from Bhairavi. Minimizing the adversarial loss, that is, optimizing either GAN, amounts to winning against the discriminator model by producing a raga sample similar to the source dataset on which the discriminator is trained. Each input sample is classified by the discriminator as either real or fake and is assigned a probability. We model the parameters of the objective function that maximize its performance using maximum likelihood estimation. The generator follows a similar concept: G_A samples the Bhairavi dataset and tries to generate Shanmukhapriya samples. We are only interested in updating the parameters of the generator, which is done by following the gradient of the objective function. Since we have two GANs, we have two adversarial losses to compute. Equation (1) represents the adversarial loss for the first GAN and has two terms:

L_GAN1(G_A, D_B, A, B) = E_{B∼Shanmukhapriya}[log(D_B(B))] + E_{A∼Bhairavi}[log(1 − D_B(G_A(A)))]   (1)
The first term, E_{B∼Shanmukhapriya}[log(D_B(B))], represents the discriminator's prediction on real Shanmukhapriya samples. It is evaluated as the log of the discriminator's output when the input to the discriminator is drawn from the real data distribution. This value is expected to be a high probability, as the discriminator has not seen the fake data generated by the generator, so a high value represents high confidence that the real sample is actually real. On the other hand, the second term, E_{A∼Bhairavi}[log(1 − D_B(G_A(A)))], represents the discriminator's prediction on the fake data, that is, Shanmukhapriya samples generated from the source Bhairavi samples. The generator tries to maximize the term D_B(G_A(A)) to make the discriminator evaluate the fake sample as a real sample with high confidence, while the discriminator tries to minimize it so that it is confident the fake sample is in fact fake. This acts as a two-player minimax adversarial game [20]. Hence, the discriminator tries to maximize the objective function, and the generator attempts to minimize it. Similarly, the adversarial loss for the second GAN is computed as shown in Eq. (2), which involves the second generator, G_B, and the discriminator D_A.

L_GAN2(G_B, D_A, B, A) = E_{A∼Bhairavi}[log(D_A(A))] + E_{B∼Shanmukhapriya}[log(1 − D_A(G_B(B)))]   (2)
Throughout training there are many possible generator mappings that produce Shanmukhapriya from Bhairavi or Bhairavi from Shanmukhapriya. On its own, the adversarial loss cannot guarantee that Shanmukhapriya samples are accurately generated from Bhairavi source samples. In order to maximize the likelihood of accurate mappings, the CycleGAN model introduces another loss known as the cycle consistency loss [5]. The concept of cycle consistency loss is easy to understand: it makes sure that the original sample can be recovered from the generated output sample after passing the original sample through the generators in cyclic order, that is, the output of the first generator is used as the input to the second generator, and the output of the second generator should, in theory, exactly match the original sample. In our case, if we translate from Bhairavi to Shanmukhapriya and then translate the Shanmukhapriya back to Bhairavi, we should ideally arrive at the original sample. The cycle consistency loss also acts as a regularizer that makes sure the original sample remains intact and the generated sample retains the necessary information of the input sample. The backward cycle consistency loss is calculated analogously. Equation (3) represents the cycle consistency loss, where the term G_B(G_A(A)) denotes the twice-translated Bhairavi samples and A denotes the original Bhairavi samples. For the model to perform with high accuracy, ||G_B(G_A(A)) − A|| should be minimized.

L_cycle(G_A, G_B) = E_{A∼Bhairavi}[||G_B(G_A(A)) − A||_1] + E_{B∼Shanmukhapriya}[||G_A(G_B(B)) − B||_1]   (3)
We define the total loss as the sum of the adversarial losses and the cycle consistency loss, as shown in Eq. (4), where lambda (λ) controls the relative importance of the cycle consistency loss. In Eq. (4), we set λ = 10 [5].

Total Loss = L_GAN1(G_A, D_B, A, B) + L_GAN2(G_B, D_A, B, A) + λ · L_cycle(G_A, G_B)   (4)
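As a concrete illustration of Eqs. (1)–(4), the snippet below sketches how the combined objective could be assembled for one batch in PyTorch. It is a minimal sketch under the assumptions that G_A, G_B, D_A, D_B are callable network modules and that the discriminators output probabilities in (0, 1); λ = 10 follows the text, while everything else (names, epsilon) is introduced here for the example.

```python
import torch
import torch.nn.functional as F

LAMBDA = 10.0  # weight of the cycle consistency term, Eq. (4)

def cyclegan_losses(G_A, G_B, D_A, D_B, real_A, real_B, eps=1e-7):
    """Adversarial + cycle consistency losses for one batch.
    real_A: Bhairavi piano rolls, real_B: Shanmukhapriya piano rolls."""
    fake_B = G_A(real_A)          # Bhairavi -> Shanmukhapriya
    fake_A = G_B(real_B)          # Shanmukhapriya -> Bhairavi

    # Eq. (1): discriminator D_B scores real vs. generated Shanmukhapriya
    loss_gan1 = torch.log(D_B(real_B) + eps).mean() + \
                torch.log(1 - D_B(fake_B) + eps).mean()
    # Eq. (2): discriminator D_A scores real vs. generated Bhairavi
    loss_gan2 = torch.log(D_A(real_A) + eps).mean() + \
                torch.log(1 - D_A(fake_A) + eps).mean()

    # Eq. (3): cycle consistency (L1 norm) in both directions
    loss_cycle = F.l1_loss(G_B(fake_B), real_A) + F.l1_loss(G_A(fake_A), real_B)

    # Eq. (4): the discriminators try to maximize the adversarial terms,
    # while the generators minimize them together with the cycle term
    total = loss_gan1 + loss_gan2 + LAMBDA * loss_cycle
    return total, loss_gan1, loss_gan2, loss_cycle
```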
5 Experiments Since CycleGAN training is highly unstable and computation-intensive, high-power GPUs are required to train the model. Due to limited resources, we could not perform all planned experiments with different hyper-parameter tuning. However, we have performed a handful of experiments and tabulated our findings. First, we add Gaussian noise to the input layer of the discriminator to check the accuracy of our model, similar to [6, 21]. Then, we train our model for 30 epochs with different values of Gaussian noise, σ_D = 0, σ_D = 1, σ_D = 5, added to the input layer of the discriminator of the CycleGAN model. Table 1 shows the various losses (cycle, generator, discriminator) for the different values of Gaussian noise, σ_D. For σ_D = 1, the cycle consistency loss converges to 0, which indicates that the model converges to a good optimum. Table 2 shows the raga transfer performance of our CycleGAN model. As shown in the table, the model trained with σ_D = 1 performs the best. Table 2 also indicates that the model performs well even after adding Gaussian noise to the input. The values in Table 2 represent probabilities; for instance, the cycle row indicates the model's ability to recover the original raga, Bhairavi, from the transferred raga, Shanmukhapriya. Since the classifier is binary, a low probability for the transferred raga is equivalent to a high probability of the target raga in the transfer row, which again confirms the model's success.
Table 1 Cycle consistency loss, generator loss, and discriminator loss with different values of Gaussian noise added to the input layer of the discriminator of our model
Gaussian noise (σ_D)       0      1      5
Cycle consistency loss     0.09   0.00   0.66
Generator loss             0.77   0.53   1.18
Discriminator loss         0.42   0.49   0.48
Table 2 Probability values of raga transfer calculated using a genre classifier

Gaussian noise (σ_D)     0        1        5
A (origin)               0.8645   0.8549   0.8746
A => B (transfer)        0.4014   0.3029   0.2340
A => B => A (cycle)      0.8665   0.7511   0.4571
B (origin)               0.4823   0.3035   0.1627
B => A (transfer)        0.40072  0.30078  0.6385
B => A => B (cycle)      0.3992   0.3047   0.8729

The probability values indicate the overall success of the CycleGAN model trained with different values of Gaussian noise added to the input layer of the discriminator. A represents Bhairavi, and B represents Shanmukhapriya.
6 Results Evaluating the results from the MIDI format is challenging because only the track information, such as notes, pitches, and velocities, is retained in the MIDI file, not the rendered audio. Therefore, we used a well-defined metric, a genre classifier [6], to check whether our model learns and transfers the different raga genres successfully. The success of the classifier is directly proportional to the success of the raga transfer. Figure 2 shows the learning curve. The plot represents the cycle consistency loss across epochs for different noise values, σ_D, added to the discriminator's input layer.
Fig. 2 Plot of cycle consistency loss versus epoch for training
Fig. 3 Pitch versus count
For σ_D = 0 and σ_D = 1, the model shows a stable learning curve, while for σ_D = 5 the loss remains high. The same can also be verified from Table 1. We can evaluate the performance of the raga transfer from Fig. 3. In the Bhairavi sample, Fig. 3a, the count decreases with the pitch, while in the Shanmukhapriya sample, Fig. 3b, it varies; from Fig. 3d it can also be observed that the pitch varies in the transferred sample, similar to the original Shanmukhapriya sample, Fig. 3b. Thus, the CycleGAN model achieved the goal of raga transfer to some extent. We also used a sequential heat map to summarize the findings of the generated output. As shown in Fig. 4, the y-axis represents the note frequency of the tracks and the x-axis represents the frequency count, as seen in the colour legend in Fig. 4. The scale markings on the axes are linear. The plot in Fig. 4a represents a sample from the first source raga, Bhairavi, and the plot in Fig. 4b represents a sample from the second source raga, Shanmukhapriya. Figure 4c represents a sample of Bhairavi generated at the end of the cycle. The frequency distribution looks quite similar in Fig. 4a and c; the frequency count over the underlying note frequency range, 0–50, is very similar in both plots. This evidence also confirms the robustness of the model. Based on this visual demonstration, we can conclude that the model performs reasonably accurately in recovering the original underlying raga, the Bhairavi sample, from the generated Shanmukhapriya sample.
Fig. 4 Frequency versus frequency count for the music samples
Furthermore, from the plot in Fig. 5, it can be observed that the transferred sample, Fig. 5d, starts to learn the features of the Shanmukhapriya raga. For example, it begins to learn the C5, C6, and C7 octaves, similar to the Shanmukhapriya sample, Fig. 5b, while retaining some features from the Bhairavi sample, Fig. 5a. The cycle sample, Fig. 5c, further shows that the model was able to recover, with a slight loss, the original Bhairavi sample from the transferred Shanmukhapriya sample, which again demonstrates the robustness of the model.
Fig. 5 Octave versus count for the music samples involved in the experiment
7 Conclusion Music is deeply rooted in Indian culture, dating back to the Vedic period. It has found its place in many spheres of human activity, from social to spiritual life. Melodic frameworks, or ragas, play such an important role in Indian Classical music that specific ragas are associated with specific times of the day. Changing the style of a raga is a significant task, even for human beings. In this paper, we presented our experiments in converting Indian Classical music from one melodic framework to another using deep learning. To the best of our knowledge, this is the first attempt at such a conversion of ragas using a deep learning framework, CycleGAN. Our work is an initial step in the domain of Indian raga conversion, and the results are encouraging. More complex models that enhance the functionality and improve the results remain open to exploration. In the future, we also plan to develop more comprehensive metrics to test the performance. We hope that the work presented in this paper will open up a plethora of opportunities to apply deep learning to enhance the auditory experience of the connoisseurs of Indian Classical music.
References 1. V. Pendyala, Veracity of Big Data. Machine Learning and Other Approaches to Verifying Truthfulness (Springer, 2018). https://doi.org/10.1007/978-1-4842-3633-8 2. V.S. Pendyala, S. Figueira, Towards a truthful world wide web from a humanitarian perspective, in 2015 IEEE Global Humanitarian Technology Conference (GHTC) (IEEE, 2015), pp. 137– 143. https://doi.org/10.1109/GHTC.2015.7343966 3. V.S. Pendyala, Evolving a Truthful Humanitarian World Wide Web (2018)
4. S. Tyagi, R. Kumari, S.C. Makkena, S.S. Mishra, V.S. Pendyala, Enhanced predictive modeling of cricket game duration using multiple machine learning algorithms, in 2020 International Conference on Data Science and Engineering (ICDSE) (IEEE, 2020), pp. 1–9. https://doi.org/10.1109/ICDSE50459.2020.9310081 5. A.A. Efros, P. Isola, T. Park, J. Zhu, Unpaired image-to-image translation using cycle-consistent adversarial networks, in 2017 IEEE International Conference on Computer Vision (ICCV) (2017), pp. 2242–2252. https://doi.org/10.1109/ICCV.2017.244 6. G. Brunner, Y. Wang, R. Wattenhofer, S. Zhao, Symbolic music genre transfer with CycleGAN, in 2018 IEEE 30th International Conference on Tools with Artificial Intelligence (ICTAI) (2018), pp. 786–793. https://doi.org/10.1109/ICTAI.2018.00123 7. J. Dambre, T. De Bie, L. Vande Veire, A CycleGAN for style transfer between drum and bass subgenres, in ML4MD at ICML2019, Machine Learning for Music Discovery Workshop at 36th International Conference on Machine Learning (2019) 8. Magenta. https://magenta.tensorflow.org/ 9. G. Brunner, A. Konrad, Y. Wang, R. Wattenhofer, MIDI-VAE: modeling dynamics and instrumentation of music with applications to style transfer, in Proceedings of the 19th International Society for Music Information Retrieval Conference, ISMIR 2018, Paris, France, 23–27 Sept 2018 10. G. Brunner, M. Moayeri, O. Richter, R. Wattenhofer, C. Zhang, Neural symbolic music genre transfer insights, in Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2019. Communications in Computer and Information Science, vol. 1168, ed. by P. Cellier, K. Driessens (Springer, Cham, 2019). https://doi.org/10.1007/978-3-030-43887-6_36 11. C. Anil, X. Bao, R.B. Grosse, S. Huang, Q. Li, S. Oore, TimbreTron: a WaveNet(CycleGAN(CQT(Audio))) pipeline for musical timbre transfer. arXiv preprint arXiv:1811.09620 (2018) 12. C.H. Ek, I. Malik, Neural translation of musical style. arXiv preprint arXiv:1708.03535 (2017) 13. H.W. Dong, W.Y. Hsiao, L.C. Yang, Y.H. Yang, MuseGAN: multi-track sequential generative adversarial networks for symbolic music generation and accompaniment, in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32, no. 1 (2018) 14. S.Y. Chou, L.C. Yang, Y.H. Yang, MidiNet: a convolutional generative adversarial network for symbolic-domain music generation. arXiv preprint arXiv:1703.10847 (2017) 15. J. Abernethy, J. Hays, Z. Kira, N. Kodali, On convergence and stability of GANs. arXiv preprint arXiv:1705.07215 (2017) 16. M.U. Gutmann, C. Russell, A. Srivastava, C. Sutton, L. Valkov, VEEGAN: reducing mode collapse in GANs using implicit variational learning. arXiv preprint arXiv:1705.07761 (2017) 17. S. Gulati, J. Serrà, K.K. Ganguli, S. Şentürk, X. Serra, Time-delayed melody surfaces for raga recognition, in Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR), New York, USA (2016), pp. 751–757 18. D.P.W. Ellis, C. Raffel, Intuitive analysis, creation and manipulation of MIDI data with pretty midi, in 15th International Conference on Music Information Retrieval Late Breaking and Demo Papers (2014) 19. H. Dong, W. Hsiao, Y. Yang, Pypianoroll: open source Python package for handling multitrack Pianorolls, in Late-Breaking Demos of the 19th International Society for Music Information Retrieval Conference (ISMIR) (2018) 20. I.J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial networks.
arXiv preprint arXiv:1406.2661 (2014) 21. J. Caballero, F. Huszar, C.K. Sønderby, W. Shi, L. Theis, Amortised map inference for image super-resolution. arXiv preprint arXiv:1610.04490 (2016)
Chapter 2
Content-Based Image Retrieval System Using Fuzzy Colour and Local Binary Pattern with Apache Lucene Nurul Fariza Zulkurnain, Mohamad Asyraaf Azhar, and Moksud Alam Mallik
1 Introduction The technique of searching for and obtaining related images in image databases is known as image retrieval [1]. It is usually divided into two categories: text-based image retrieval (TBIR) and content-based image retrieval (CBIR) [2]. A new category, semantic-based image retrieval (SBIR), has been added by a few experts [1]. SBIR is used to break down the barriers between low-level features and the extraction of high-level objects. Image retrieval in TBIR is based on textual annotations and does not examine the original image; the data can be in the form of manual comments or contextual clues [3]. Content-based image retrieval (CBIR), on the other hand, allows the user to supply a query image rather than text during the searching process. There are four main procedures to build a complete CBIR system, which are (1) define the image descriptor, (2) index the feature vectors, (3) define the similarity metric, and (4) the searching process. Defining the image descriptor (i.e. feature extraction) is the core step, as it determines the accuracy and relevance of the retrieval results. There are two main types of feature extraction methods, which are local features and global features [4]. A significant difference between the two is that local features extract all the interesting points (IP) in the image, while global features examine the entire image based on its colour, texture, and shape [4]. The two most popular local feature extraction methods are SIFT and SURF, which are available in the OpenCV library. SIFT uses local extrema, based on the difference-of-Gaussian (DoG) operator at variable scale [5], to investigate the position and size of key points. SURF uses the same notion of image analysis in Gaussian scale space [5]. Compared to SIFT, the SURF detector is based on the determinant of the Hessian
matrix and uses integral images to improve feature detection speed. The main disadvantage of employing local features is that they require a lot of computing power and time because they extract so much data from the images. They are also more suitable for object detection, identification, or recognition. In comparison to local features, global features demand fewer computational resources and the method is simpler [1]. The most straightforward global colour feature extraction method is the RGB colour histogram [6]. However, an RGB colour histogram suffers from three main problems: (1) RGB colour information fails to mimic how humans perceive colour, since human eyes do not see an image in terms of the intensity of red, green, and blue; (2) the quantization technique, which reduces the feature vector size, can lead to different colours being categorized into the same bin; and (3) the extracted colour information is too general, as many images can have an equal RGB colour distribution. HSV colour space overcomes the first limitation of the RGB colour histogram, as HSV separates luma from chroma and is closer to human vision [7]. The second problem can be solved by using a fuzzy colour histogram [8], and the third problem can be solved by using a region-based colour histogram method (i.e. bins of the colour histogram) [9]. Texture information is the next element that can be extracted from images, and it depends on second-order statistics [10]. The image texture properties include coarseness, contrast, directionality, line-likeness, regularity, and roughness [10]. The wavelet transform texture feature extracts the directional information of the image, which consists of horizontal, vertical, and diagonal components [11]; however, its orientation sensitivity is inferior. The Gabor filter texture feature is adjustable in scale and orientation, which makes texture analysis more useful [10]. The local binary pattern (LBP) is another approach that deals with orientation sensitivity [12]; by comparing each pixel of the image with its surrounding neighbourhood pixels, LBP builds a local representation of texture features [13]. Given the many methods available for a CBIR system, this paper aims to find one of the best possible global feature extraction methods. The main objective is a general-purpose image search; thus, the technique needs to be lightweight and have fast indexing and searching times. A method of fuzzy colour and local binary pattern (FCLBP) is proposed. To speed up the indexing and searching time, the open-source Apache Lucene library is used. Apache Lucene is an open-source library written in Java that utilizes an inverted index data structure for fast searching [8, 15]. The paper is structured as follows: Sect. 2 provides the background on feature extraction; indexing is discussed in Sect. 3; Sect. 4 describes the similarity measurement; experimental results are presented in Sect. 5; and Sect. 6 concludes.
2 Feature Extraction Image descriptor is an image feature extraction technique used to extract all the required visual information from the images. As previously mentioned, this is the major part that determines the accuracy and effectiveness of the CBIR system. The proposed
method uses a fuzzy colour histogram combined with a local binary pattern texture histogram, as illustrated by the sketch below.
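As a quick illustration of the LBP half of the descriptor, the fragment below computes a basic 8-neighbour LBP code per pixel and a histogram over an image. It is a conceptual sketch written in Python/NumPy for brevity (the system itself is implemented in Java), not the exact descriptor configuration used in the experiments.

```python
import numpy as np

def lbp_image(gray):
    """Basic 3x3 local binary pattern: each pixel is compared with its
    eight neighbours and the comparison bits form an 8-bit code."""
    h, w = gray.shape
    codes = np.zeros((h - 2, w - 2), dtype=np.uint8)
    # offsets of the eight neighbours, clockwise from the top-left
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            center = gray[i, j]
            code = 0
            for bit, (di, dj) in enumerate(offsets):
                if gray[i + di, j + dj] >= center:
                    code |= 1 << bit
            codes[i - 1, j - 1] = code
    return codes

def lbp_histogram(gray):
    """Normalized 256-bin LBP histogram used as the texture feature."""
    hist, _ = np.histogram(lbp_image(gray), bins=256, range=(0, 256))
    return hist / max(hist.sum(), 1)
```

The fuzzy colour half of the descriptor is described next.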
2.1 Fuzzy Colour Histogram Fuzzy colour is an effective method to extract colour information from images. The fuzzy colour system takes HSV colour information as its input. Thus, the RGB colour images need to be converted into HSV colour space by using the following equations:

H = cos^-1( (1/2)[(R − G) + (R − B)] / sqrt((R − G)^2 + (R − B)(G − B)) )   (1)

S = 1 − 3 · min(R, G, B) / (R + G + B)   (2)

V = (R + G + B) / 3   (3)
Here H represents the hue component, which has a range from 0° to 360°. S represents the saturation component, and V represents the value or brightness component; both have a range from 0 to 1. The images also need to be divided into several blocks, usually 1600 blocks. The height and the width of one block can be computed using the formula:

stepX = width / sqrt(total number of blocks);  stepY = height / sqrt(total number of blocks)   (4)
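A direct transcription of Eqs. (1)–(4) is shown below as a small Python sketch for clarity (the actual system performs this step with the Java WritableRaster and Color libraries). R, G, B are assumed to be channel values normalized to [0, 1], the hue mirroring for B > G follows the standard convention, and the block count of 1600 follows the text.

```python
import math

def rgb_to_hsv(R, G, B):
    """HSV components per Eqs. (1)-(3); R, G, B assumed to be in [0, 1]."""
    num = 0.5 * ((R - G) + (R - B))
    den = math.sqrt((R - G) ** 2 + (R - B) * (G - B)) or 1e-9  # avoid /0 on grey pixels
    H = math.degrees(math.acos(max(-1.0, min(1.0, num / den))))
    if B > G:                      # acos covers only 0-180 degrees; mirror the lower half
        H = 360.0 - H
    S = 1.0 - 3.0 * min(R, G, B) / max(R + G + B, 1e-9)
    V = (R + G + B) / 3.0
    return H, S, V

def block_size(width, height, blocks=1600):
    """Block dimensions per Eq. (4)."""
    n = math.sqrt(blocks)
    return width / n, height / n
```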
The fuzzy system consists of two main steps, which are Fuzzy 10 bins and Fuzzy 24 bins [16]. The first system takes the HSV input and categorizes the input H, S, and V values based on three membership functions, as shown in Figs. 1, 2, and 3, respectively [16].
Fig. 1 Hue membership function
Fig. 2 Saturation membership function
Fig. 3 Value membership function
The hue membership function divides the hue colour wheel into eight regions that represent eight colours—(0) red to orange, (1) orange, (2) yellow, (3) green, (4) cyan, (5) blue, (6) magenta, and (7) magenta to red. Channel S is divided into two fuzzy areas, which determine whether the colour is clear enough to be ranked based on hue membership; the first area represents pure grey, in which case the hue component is neglected and does not significantly affect the output colour. Channel V is divided into three areas. The first area represents the black input colour, and the last area represents the white input colour. The centre region lies in between, so the colour follows the hue membership function. The Fuzzy 10-bins system outputs ten colours, which are (0) black, (1) grey, (2) white, (3) red, (4) orange, (5) yellow, (6) green, (7) cyan, (8) blue, and (9) magenta.
Fig. 4 Indexing process
3 Indexing Java has an amazing library, which makes indexing and searching much faster, called Apache Lucene. Apache Lucene is an open-source library written in Java for high performance and scalable information retrieval. It can help in making the index and search functionality of an application faster. In Lucene, the data needs to be converted into a textual format (i.e. UTF-8). Lucene document library imported from (org.apache.lucene.document.Document) is used to design a document builder that can convert the extracted feature vector into UTF-8 format. Lucene creates an inverted index for each document stored to help in the searching process. Instead of examining the whole document, Lucene uses the index to find the query information. Thus, the complexity of searching is reduced from O(n) to O(1). A simple analogy of the index concept of Lucene would be the index at the end of a book, which lets the user find relevant pages that discuss certain topics quickly. Then, the Lucene Analyser analyses the document and computes the index that needs to be specified. This research uses Lucene Whitespace Analyser as each value in the feature vector is separated using Whitespace. The index has two implementations, which are read and write. For read access, IndexReader (org.apache.lucene.index.IndexReader) class is imported, while for writing access, IndexWriter (org.apache.lucene.index.IndexWriter) class is imported. Figure 4 shows the indexing processes.
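To illustrate why an inverted index avoids scanning every document, here is a toy Python version of the idea. It is purely conceptual; the system itself relies on Apache Lucene's Java implementation rather than anything like this.

```python
from collections import defaultdict

# Build: map each token of the textual feature vector to the documents containing it
inverted_index = defaultdict(set)
documents = {1: "12 0 7 3 9", 2: "4 0 7 1 2", 3: "12 5 7 3 8"}  # doc id -> feature text
for doc_id, text in documents.items():
    for token in text.split():          # whitespace tokenisation, like the Whitespace Analyser
        inverted_index[token].add(doc_id)

# Query: candidate documents are looked up per token instead of scanning all documents
query_tokens = "12 3".split()
candidates = set.intersection(*(inverted_index[t] for t in query_tokens))
print(candidates)   # {1, 3}
```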
4 Similarity Measurement Four distance metrics have been proposed in [8]. In this research, the L1 (Manhattan) distance is used. The distance function is computed by using the formula:

Manhattan or L1 distance: d(X, Y) = Σ_{i=0}^{N−1} |x(i) − y(i)|   (5)
Here x is the feature vector of the query image, y is the feature vector of an image in the database, and d(X, Y) is the distance between the query vector x and the database vector y. The index i ranges from 0 to N − 1, where N represents the vector size. During searching, the FCLBP feature vector of the query image is first extracted. The next step is to specify the number of documents that need to be retrieved. The Lucene Analyser needs to be the same as the one used in the indexing process (i.e. the Whitespace Analyser). Then, Lucene IndexReader is used to retrieve the relevant feature vectors from the database. The database images are sorted according to their distance to the query image in ascending order, so that images with a shorter distance are ranked first. Figure 5 shows the complete searching process using Lucene.
Fig. 5 Searching process
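For clarity, the L1 ranking step of Eq. (5) can be written as the short sketch below, shown in Python for brevity; in the actual system this scoring runs inside the Lucene searching pipeline, and the database/feature names used here are placeholders.

```python
def l1_distance(x, y):
    """Manhattan distance between two feature vectors, Eq. (5)."""
    return sum(abs(xi - yi) for xi, yi in zip(x, y))

def rank(query_vec, database):
    """Return (image_id, distance) pairs with the closest images first."""
    scores = [(img_id, l1_distance(query_vec, vec)) for img_id, vec in database.items()]
    return sorted(scores, key=lambda item: item[1])

# database = {"img_001": fclbp_vector_1, ...}; query_vec = FCLBP vector of the query image
```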
5 Results and Discussion In this section, results from the methodology are briefly discussed. This section also compares the proposed FCLBP with other methods and established software. Convert RGB to HSV colour space In Fig. 6, the RGB colour information for each pixel is extracted using the Java WritableRaster library. The Java Color library is then used to convert the RGB data into HSV. The result is shown in Fig. 7.
Fig. 6 Image
Fig. 7 HSV channels’ graph
Fig. 8 Output Fuzzy 10 bins and Fuzzy 24 bins colour
Fig. 9 One of the colours that contributes to bin 4 in Fuzzy 10 (HSV: 18°, 40%, 75%)
Figure 7 also shows the distribution of hue, saturation, and brightness in the image. The blue colour represents hue, the green colour represents saturation, and the red colour represents brightness. The two peaks in the hue channel graph come from the bottom of the image (i.e. the building and ground) and the rooftop. The high peak of the saturation channel at the end mostly comes from the building. Lastly, the high spike at the end of the value channel (i.e. around bin 225) comes from the cloud. Output Fuzzy Colour The fuzzy colour output is shown in Fig. 8. The Fuzzy 10 output shows that most of the blocks have been categorized into bin 4, which corresponds to the orange colour. However, orange is hardly found in the image. After analysing the results through Java programming, it was found that most blocks that contribute to bin 4 come from the bottom of the image. The corresponding colour is shown in Fig. 9. The colour is a variation of orange on the hue wheel, which can be found at the bottom of the image. Fuzzy 24 bins further evaluates the output from Fuzzy 10. Notice that instead of bin 4, the maximum value in Fuzzy 24 is at bins 6, 7, and 8.
6 Conclusion Based on the results, it can be concluded that a colour feature alone, such as in method 1 (i.e. the HSV colour histogram), is not enough to describe images. Using a combination of colour and texture features, such as in method 3 (i.e. FCTH), leads to a better result. However, traditional FCTH suffers from not being invariant to rotation, as it uses the wavelet transform, which classifies the horizontal, vertical, and diagonal components as different textures. Using a local feature, such as in method 2 (IOSB SIFT), could solve the rotation problem; however, it is very expensive to perform, especially in the
indexing process. Besides, local features are more suitable for object detection than for general image search. The proposed method, FCLBP, has been shown to solve the main problem of FCTH. It is invariant to rotation and scaling and can produce a better result compared to the other methods. The indexing and searching times have also been shown to be fast when using the open-source Apache Lucene library, which uses an inverted index data structure.
References 1. M. Alkhawlani, M. Elmogy, Image retrievals: a survey. Int. J. Comput. Inf. Technol. 4(1), 58–66 (2015) 2. T. Karthikeyan, P. Manikandaprabhu, S. Nithya, A survey on text and content based image retrieval system for image mining. Int. J. Eng. Res. 3(3) (2014) 3. G. Mailaivasan, Parthiban, Karthikram, Tag based image retrieval (TBIR) using automatic image annotation. Int. J. Res. Eng. Technol. 03 (2014) 4. A. Douik, M. Abdellaoui, L. Kabbai, Content based image retrieval using local and global features descriptor, in 2016 2nd International Conference on Advanced Technologies for Signal and Image Processing (ATSIP), Monastir, 2016, pp. 151–154 5. S.A.K. Tareen, Z. Saleem, A comparative analysis of SIFT, SURF, KAZE, AKAZE, ORB, and BRISK, in 2018 International Conference on Computing, Mathematics and Engineering Technologies (iCoMET), Sukkur, 2018, pp. 1–10 6. N.S. Sharma, P.S. Rawat, J.S. Singh, Efficient CBIR using color histogram processing. Signal Image Process. 2(1) (2011) 7. D. Soni, K.J. Mathai, An efficient content based image retrieval system based on color space approach using color histogram and color correlogram, in 2015 Fifth International Conference on Communication Systems and Network Technologies, Gwalior, 2015, pp. 488–492 8. M. Lux, O. Marques, Visual Information Retrieval Using Java and LIRE (Morgan & Claypool, 2013) 9. V.H. Vu, Q.N. Huu, H.N.T. Thu, Content based image retrieval with bin of color histogram, in 2012 International Conference on Audio, Language and Image Processing, Shanghai, 2012, pp. 20–25 10. V. Vinayak, S. Jindal, CBIR system using color moment and color auto-correlogram with block truncation coding. Int. J. Comput. Appl. 161(9), 1–7 (2017) 11. R.A. Ansari, K.M. Buddhiraju, Textural classification based on wavelet, curvelet and contourlet features, in 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Beijing, 2016, pp. 2753–2756 12. Q. Kou, D. Cheng, L. Chen, K. Zhao, A multiresolution gray-scale and rotation invariant descriptor for texture classification. IEEE Access 6, 30691–30701 (2018) 13. O.A. Vatamanu, M. Frandes, M. Ionescu, S. Apostol, Content-based image retrieval using local binary pattern, intensity histogram and color coherence vector, in 2013 E-Health and Bioengineering Conference (EHB), Iasi, 2013, pp. 1–6 14. A.E. Hassanien, K. Shaalan, T. Gaber, A.T. Azar, M.F. Tolba, in Proceedings of the International Conference on Advanced Intelligent Systems and Informatics 2016 (Springer, Cham, 2017) 15. W. Zhou, H. Li, Q. Tian, Recent advance in content-based imageretrieval: a literature survey. arXiv preprint arXiv:1706.06064 (2017) 16. V. Ljubovic, H. Supic, Improving performance of image retrieval based on fuzzy colour histograms by using hybrid colour model and genetic algorithm. Comput. Graph. Forum 34(8), 77–87 (2015)
Chapter 3
Machine Vision-Based Conveyor and Structural Health Monitoring Robot for Industrial Application Using Deep Learning Khalid, Ahmed Rimaz Faizabadi, and Moksud Alam Mallik
1 Introduction 1.1 Structural Health Monitoring Structural health monitoring is the process of identifying structural damage and utilizing an engineering characterization strategy for the structures. In SHM, damage of the structure refers to changes in the material of the structure, including its geometric properties, changes in the boundary conditions, and the connectivity of the system, which in turn adversely impact the performance of the structure. The process of SHM deals with close observation of deterioration occurring periodically, which is supported by the data collected from a sensor or array of sensors, extraction of damage-sensitive features from the measurements, and statistical analysis of the extracted features to determine the state of the structure. In the case of long-term SHM, the information retrieved from the sensors is periodically updated to keep track of unavoidable degradation and ageing caused by the operational environment. In the case of natural calamities like blast loading or earthquakes, SHM is applied for fast condition screening to provide real-time verified information regarding the integrity of the structure. It is to be
noted that the inspection of infrastructure is very crucial, as public safety is an important concern of the government. Owing to the rapid development of the data-driven fields of engineering, computer vision, deep learning, and machine learning are increasingly proving to be reliable components for categorizing and recognizing image patterns, with a straightforward application to inspection contexts [1]. To address this problem in the real world, we have developed a robot that identifies cracks on structures and also acts as a conveyor belt inspector, as explained in the next subsection.
1.2 Conveyor Belt Inspector Before the expansion of technology, manual labour was used to transport material and goods in the manufacturing industry. People were responsible for carrying these products from one place to another using mobile transportation devices. This method of transportation was susceptible to several risks, namely deterioration of products, mishandling, and many more. However, with advancing technology, new and better ways of transporting goods were employed. Among the new methods, the conveyor belt system is the most cost-effective method of transporting goods and materials in the manufacturing industries. Physical labour requires a workforce and manpower, which increases the liabilities; conveyor belts can reduce the need for physical work, eliminate wages, and grow the company's profit. Due to their tremendous success, virtually every industry has begun to adopt this system to transport items safely from one place to another. They are better and safer than forklifts, they can be installed almost anywhere, and they are built to move loads of all weights, sizes, and shapes. Many conveyor systems also come with safety measures to reduce the risk factor. The conveyor belt system is thus a modern, practical, and better way of transporting materials and goods from one place to another. However, a problem with the conveyor belt is the accessibility of the belt in closed tunnels, where issues can arise such as items overlapping and causing blockage of the belt movement [2]. This problem is addressed by deploying an autonomous robot that can navigate its way along the belt, find the possible reasons for the belt stopping, identify the cause of the impediment, and notify the employee, as it is difficult for a worker to inspect the belt physically.
2 Literature Review The history of robots goes back thousands of years, and people have long been fascinated by artificial beings. The ancient Hebrews told of a being called the Golem, created from dirt and clay for menial labour, and in the 1400s a mechanical knight was designed by Leonardo da Vinci, which acted as a basis for later robots. In 1927, Westinghouse built the Televox robot. With the advancement of technology, the computer came to be referred to as an electronic brain. The first programmable autonomous robot was Unimate; in 1967, a robot was made by General Motors to move pieces of hot metal. In 1966, Shakey the robot, the pioneering autonomous intelligent robot, could perform generic instructions and tasks, like moving a block placed on a table, while reasoning and looking around its environment. It first identified the block and then figured out a way to place it onto the table. In addition, the robot was capable of navigating around the room [2]. The work done in the field of autonomous vehicles shows that almost 50% of the articles were published between 2018 and 2019, and another 21% in 2017. From this observation, it can be inferred that much of the literature on AVs rose after 2015, and after the introduction and successful testing of AVs, the market can expect further enhancement in 2022 [3]. The following work in the field of autonomous robots inspired our work. One paper works on a robotic behaviour control system based on the dual dynamics design scheme; the idea is that the robotic agent working in different modes gives rise to different behavioural patterns [4]. Frederic Kaplan and Pierre-Yves Oudeyer worked on the training of autonomous robots; the work developed training analogous to animal training in an adapted robotic environment, guiding the robot to behave correctly while performing actions like digging and sitting [5]. Another paper sheds light on the social acceptance of robots in occupational fields as they evolve from single-function to multifunction versatile machines, through a preliminary search of 336 research papers narrowed down to 42 [6]. The paper on deliberation for autonomous robots studies a diversity of open-environment robots performing a variety of tasks and interactions that require explicit deliberation; it aims to endow a robot with more adaptability and reduce the deployment cost by proposing five deliberation functions: planning, acting, monitoring, observing, and learning [7]. Another paper discusses issues regarding the development of real-time support for adaptive cruise control. It employs a two-level real-time data repository model that updates the derived data when necessary, and designs the adaptive cruise control with different modes, employing a unique set of characteristics, including better utilization of capacity compared to other approaches; it utilizes CPU capacity even in demanding times as required by the system [8]. Another paper discusses an automatic merge control system ensuring vehicle safety at road intersections; the problem is optimized with constraints to guarantee safety, and with the help of the MATLAB Optimization toolbox existing issues can be determined. A head-of-the-lane (HoL) approach is also presented
which is computationally less intensive and whose performance is comparable to the optimization approach [9]. In another paper, the complexity resulting from the increase in intelligence and in the number of processes involved is minimized by reducing the computing power required, without affecting the performance and safety of the application. A safety-critical application called automatic merge control (AMC), which enables safe manoeuvring in the region of collision, is proposed using two main algorithms, called head of the lane (HoL) and all feasible sequences (AFS). Because the required technology is still under development, deploying such a system is not yet feasible [10]. A paper on resource management for real-time tasks in mobile robotics discusses task scheduling and allocation in a distributed environment by providing a framework for scheduling and allocating periodic tasks with communication and precedence constraints in a distributed dynamic environment such as a mobile robot system. The algorithm provides a simple but efficient scheduling and allocation scheme which enables developers to design a predictable distributed embedded system even under a variety of resource and temporal constraints [11]. To avoid human errors while driving in traffic, the concept of autonomous intelligent cruise control has evolved. One paper proposes an AICC system for an automatic vehicle, examining its effect on traffic flow and comparing its performance to that of a human driver. The autonomous intelligent cruise control is non-cooperative, which means that it does not interact with other vehicles, and it is still not susceptible to slinky and oscillation effects. By designing the control system appropriately, it achieves a safety distance separation rule that is proportional to vehicle velocity. The system's performance is superior to that of humans, with a faster transient response leading to smoother and faster traffic flow [12]. Another paper discusses the implementation of a line follower using a robot with IR sensors installed underneath. Generally, the path is predefined and can be visible, such as a line over a black surface with high-contrast colours, or invisible, like a magnetic field. The sensor data is transferred by the processor over specific transmission buses, and the processor decides the proper commands and sends them to the driver, making the robot follow the white line. Structural health monitoring has traditionally used visual inspection by trained inspectors and assessments conducted using contact sensors on the structure, which is costly and inefficient because of the number of sensors required. Robots equipped with proper sensors and GPS for autonomous flight are popular these days, but GPS functionality is hindered in places such as under a bridge due to the structure, so this difficulty is overcome by ultrasonic beacons taking over the role of GPS, with a deep CNN for damage detection and a geotagging method for localization of damage [13]. The paper on the potentials of autonomous UAS and automated image analysis for structural health monitoring employs an AscTec Falcon 8 with a full digital camera for photogrammetric surveys. The proposed UAS image analysis and photogrammetry are demonstrated, and improvement is required in the case of complex objects [14]. The work done by Maw-Huei Lee, Brecksville, Ohio, on a conveyor belt monitoring system provides a system which detects a longitudinal rip in the conveyor belt, producing warning signals to stop the operation of the system.
It includes a series of sensors embedded into the belt in a spaced relationship along its length, and alarm circuits in series positioned at regular intervals along the length of the belt. If the belt rips, a part of the loop of wire in one of the circuits breaks the electrical circuit of the sensor and the alarm is triggered [15]. In another article, the problems faced by conveyor belts are addressed by proposing a novel approach based on unmanned aerial vehicles integrated with a thermal imaging camera, in which the results indicate that the signal processing technique is sufficient to identify failures in the rollers automatically. The implemented backend platform enables cloud and field connectivity with the system [16].
3 Organization of the Paper The rest of the paper is organized as follows: Section 1 gave the introduction and Sect. 2 the literature review. Section 4 gives the required functionality. Section 5 discusses the system design. Section 6 gives the technical specifications. Section 7 discusses the detailed implementation of the methodology. Section 8 provides the experimental results. Section 9 contains the conclusion of this work. References are included at the end of the paper.
4 Required Functionality Movement: The robot should be able to manoeuvre surpassing the obstacles if any. Crack measurement (SHM): It should be able to provide evidence of the cracks on the structure. Obstacle measurement (CBI): It should be able to identify obstacles on the belt. Functionality: The robot should be able to detect structure cracks and obstacles easily.
5 System Design To obtain the functionality described above, the Firebird V is used along with a camera module on a Raspberry Pi 4 to capture images for crack detection in the case of structural health monitoring, and an ultrasonic sensor in the case of the conveyor belt inspector. Figure 1 shows the Firebird V with the ultrasonic sensor used for the conveyor belt inspector and structural health monitoring.
Fig. 1 Firebird V with ultrasonic sensor
6 Technical Specification The robot is fully autonomous in a prescribed region, following a navigation algorithm to move from point A to point B of the structure. While doing so, it captures images of the structure, which are utilized for crack detection analysis using a deep learning algorithm running on an edge device. The robot consists of three white line sensors which implement the line follower, five Sharp IR range sensors, and eight analogue IR proximity sensors for object detection. The camera module embedded on the Pi 4 attached to the Firebird V detects cracks and documents the images and their locations, which are later used by a human for identifying the cracks on the structure [2]. Table 1 shows the hardware of the robot in detail. Deep Learning Identifying and recognizing classes in a given dataset falls under the category of classification problems in machine learning. To solve this problem of understanding and categorizing an object in an image, deep learning is extensively used. Among the many deep learning models, such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), deep belief networks (DBNs), and long short-term memory (LSTM), we used a convolutional neural network, as it is very useful for classification and recognition tasks. ResNet18 Due to its popularity for achieving excellent generalization performance on recognition tasks, the residual neural network, also known as ResNet, is used for image classification, i.e. crack detection. We have selected ResNet18 for our task; there are, however, more complex models like ResNet34, ResNet50, and ResNet101. The pivotal reason to choose ResNet18 is that it can be used without GPU utilization on a low-cost edge device like the Raspberry Pi 4. The framework chosen is PyTorch, an open-source library used for applications such as computer vision and natural language processing. ResNet18 is an image classification model trained on the ImageNet dataset. ImageNet is a free image database organized according to the WordNet hierarchy.
Table 1 Technical specifications of Firebird V

S. No.  Parameters        Specifications
1       Microcontroller   Master microcontroller Atmel ATmega 2560 and slave microcontroller Atmel ATmega 8
2       Sensors           Three white line sensors, five Sharp infrared range sensors, eight analogue IR proximity sensors, two position encoders, and an ultrasonic range sensor
3       Indicators        A 32-character LCD screen, one buzzer, and indicator LEDs
4       Control           Autonomous control with a PC as master and a slave microcontroller, in wired or wireless mode
5       Communication     Wireless ZigBee communication (2.4 GHz) and Wi-Fi communication with a Wi-Fi module
6       Dimensions        Diameter of 16 cm, height of 8.5 cm, and weight of 1.1 kg
7       Power             On-board 9.6 V nickel metal hydride (NiMH) battery with intelligent battery charger and on-board battery monitoring; auxiliary power supported
8       Battery life      Up to 2 h, with the motors operating 75% of the time
9       Locomotion        Two DC geared motors with a caster wheel in differential configuration; position encoders of 30 pulses/revolution; wheel diameter of 51 mm; speed of 24 cm/s
ImageNet contains thousands of images and aids computer vision and deep learning research. The model expects normalized three-channel RGB images of shape (3 × H × W). The images are expected to be loaded into the range [0, 1] and then normalized using mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225]. ResNet18 contains 18 layers, with a Top-1 error of 30.24 and a Top-5 error of 10.92. The later versions, ResNet34, ResNet50, ResNet101, and ResNet152, have 34, 50, 101, and 152 layers, respectively, and lower Top-1 and Top-5 errors than the previous versions. Figure 2 shows the working of the deep learning model. Raspberry Pi 4 For portability and for running the deep learning model, we have utilized a Raspberry Pi 4, which has 8 GB of RAM and an Arm v8 processor running at 1.5 GHz. The dataset is stored on a 64 GB micro SD card supported by the Pi 4. The camera module, the Raspberry Pi HQ camera, captures clear pictures at 12-megapixel resolution. In Fig. 3, the Raspberry Pi 4 is mounted on the Firebird V. Dataset A custom dataset of 1500 cracked and uncracked images was collected and further augmented to increase the size of the dataset. It contains labelled images of cracked and uncracked surfaces to train the deep learning model.
Fig. 2 Working of deep learning model
Fig. 3 Raspberry Pi 4
Figure 4 contains snapshots of the uncracked images, and Fig. 5 contains snapshots of the cracked images.
Fig. 4 Uncracked images fed to train model
Fig. 5 Cracked images in dataset
7 Method 7.1 Structural Health Monitoring Figure 6 represents the flow of operation for crack detection. Once power is supplied to the robot, the sensors on the board are initialized and the threshold value is set according to the program. The robot makes its way from one point to another and captures cracks on the structure using the camera module. This data is received at the user end via the ZigBee module, and each image is analysed for crack detection using a pretrained deep learning model on the Raspberry Pi 4. To train the model, ResNet18 is utilized as it has low GPU requirements, and we have developed a custom dataset containing cracked and uncracked images; the model runs within the 8 GB RAM of the Raspberry Pi 4. The master microcontroller, Atmel ATmega 2560, receives data from the IR sensors and decides its course if an obstacle hinders its path; otherwise, the microcontroller receives the data from the white line sensors, which drive the line follower algorithm for navigation; the implementation of the line follower is seen in Fig. 8. Figure 7 shows crack detection. The camera mounted on the robot captures the cracks on the structure by employing the deep learning model. After training with the custom dataset, the model identifies the cracks, and the robot then notifies the user with the obtained result (Fig. 8).
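A minimal sketch of the crack-classification step on the Raspberry Pi is given below, assuming a ResNet18 fine-tuned for the two classes (cracked / uncracked) with the ImageNet normalization mentioned earlier. The weight file name, image size, and class-index convention are placeholders introduced for the example, not the exact values used in our setup.

```python
import torch
from torchvision import models, transforms
from PIL import Image

# Preprocessing: resize, convert to a tensor in [0, 1], normalize with ImageNet statistics
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# ResNet18 with its final layer replaced for the two classes of the custom dataset
model = models.resnet18(pretrained=True)
model.fc = torch.nn.Linear(model.fc.in_features, 2)
model.load_state_dict(torch.load("crack_resnet18.pth", map_location="cpu"))  # placeholder file
model.eval()

def is_cracked(image_path):
    """Classify one captured frame as cracked (True) or uncracked (False)."""
    x = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        logits = model(x)
    return logits.argmax(dim=1).item() == 1   # class index 1 assumed to be 'cracked'
```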
Fig. 6 Flow of system for structural health monitoring
Fig. 7 Crack detection
7.2 Conveyor Belt Inspector Figure 9 represents the flow of operation for obstacle detection. Once power is supplied to the robot, the sensors on the board are initialized and the threshold value is set according to the program. The robot makes its way from the start of the conveyor belt to the other end and, using the camera module, captures any form of obstacle on the belt that stops the conveyor belt from moving. Once an obstacle is identified by the ultrasonic sensor, the buzzer is activated and the coordinates are noted down and displayed on the LCD screen; the same information is sent to the admin using ZigBee. The master microcontroller, Atmel ATmega 2560, acts as the master controller under autonomous PC control, receiving data from the sensors. The IR sensors decide its course if an obstacle hinders its path; otherwise, the microcontroller receives the data from the white line sensors, which drive the line follower algorithm for navigation; the implementation of the line follower is shown in Fig. 8. Ultrasonic sensor functioning The working of an ultrasonic sensor is similar to that of a bat. It emits sound waves, and when the sound waves collide with an object, the reflected waves are received back by the ultrasonic sensor. The sound waves incident on the object are reflected diffusely, with the energy spread over a wide solid angle of up to 180 degrees, and some fraction of the incident energy is received by the transducer in the form of echoes. The reflection of the sound waves depends on the distance between the sensor and the object: if the object is located at a larger distance, the sound waves take more time to return, and if the object is in short range, the sound waves reach the sensor faster. If the distance between the object and the sensor is very long, the returning sound waves are weak and the object fails to be recognized [15]. The ultrasonic HC-SR04 is a common sensor used for contactless distance measurement in the range 2–400 cm. The hardware of the ultrasonic sensor consists of four pins: (1) TRIG—trigger input, (2) ECHO—echo output, (3) VCC—5 V, and (4) GND—ground. A HIGH signal of at least 10 µs on the trigger input creates a window for the module to transmit 8 ultrasonic bursts of 40 kHz. When there is an obstacle in front of the module, the sound wave is reflected.
Fig. 8 Line follower algorithm
The reflected signal causes the ECHO output of the ultrasonic module to go HIGH for the duration of the time taken to send and receive the ultrasonic signal. The pulse width ranges between 150 µs and 25 ms, with the exact value depending on the distance between the sensor and the obstacle. If there is no obstacle, the ECHO output takes about 38 ms to time out [3].
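The text does not spell out the conversion from the measured ECHO pulse width to a distance; as a rough illustration, independent of which controller performs the timing, the distance follows from the speed of sound and half the round-trip time. The 38 ms timeout corresponds to the no-obstacle case mentioned above.

```python
# Illustrative conversion of an HC-SR04 ECHO pulse width into a distance.
# Assumes sound travels at roughly 343 m/s (34300 cm/s) at room temperature.
SPEED_OF_SOUND_CM_PER_S = 34300
NO_OBSTACLE_TIMEOUT_S = 0.038        # ~38 ms ECHO pulse when nothing is detected

def pulse_width_to_distance_cm(pulse_width_s):
    """Half the round-trip time multiplied by the speed of sound."""
    if pulse_width_s >= NO_OBSTACLE_TIMEOUT_S:
        return None                  # treat as out of range / no obstacle
    return pulse_width_s * SPEED_OF_SOUND_CM_PER_S / 2

# A 580 microsecond echo corresponds to roughly 10 cm:
print(pulse_width_to_distance_cm(580e-6))   # ~9.95
```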
Fig. 9 Flow of system for conveyor belt inspector
8 Results

The findings of the structural health monitoring experiment are given in Table 2, with the error percentage kept below 3%. The results of the conveyor belt inspector are shown in Table 3, which represents the accuracy of the robot in identifying obstacles (Figs. 10 and 11).

Table 2 Findings of structural health monitoring

S. No. | Actual crack (mm) | Identified crack (mm) | Error (%)
1 | 0.8 | 0.84 | 2.8
2 | 1.21 | 1.226 | 1.3
3 | 1.31 | 1.326 | 1.2
4 | 1.51 | 1.52 | 0.6
5 | 1.9 | 1.921 | 1.1
6 | 4.5 | 4.561 | 1.3

Table 3 Findings of conveyor belt inspector

S. No. | Actual distance (cm) | Measured distance (cm) | Percentage error (%)
1 | 2.9 | 2.92 | 0.6
2 | 3.84 | 3.86 | 0.52
3 | 6.44 | 6.49 | 0.77
4 | 7.82 | 2.9 | 0.89
5 | 9.74 | 9.98 | 2.46
6 | 11.21 | 11.19 | 0.17
7 | 13.21 | 13.22 | 0.07
8 | 15.7 | 15.8 | 0.63
Fig. 10 Line graph representing results of observations made for structural health monitoring
Fig. 11 Line graph representing results of observations made for conveyor health inspector
9 Conclusion and Future Scope

This research addresses the gap in existing semi-autonomous work, where the robot captures images or video of the whole environment and sends them back to a human for further analysis. Our work automates this task by analysing cracks without human intervention and recording them with low-cost hardware. We used an edge device, a Raspberry Pi 4, to detect cracks on the structure in a real-time environment using ResNet18, and deployed the deep learning model on the Pi 4 mounted on top of the Firebird V. The Firebird V uses cruise control and an autonomous navigation algorithm to move the robot from point A to point B. The robot successfully identified cracks without human intervention, and it also performed conveyor belt inspection by navigating along a conveyor belt and identifying obstacles on the belt. This paper presents a novel approach to intelligent sensor-based inspection, tried and tested for measuring surface cracks for SHM and obstacles for CBI with an accuracy of up to 95%. Further development of the employed algorithms remains as future scope, including making the robot amphibious for structures on water for SHM and improving efficiency for CBI.
References 1. V. Giurgiutiu, Structural health monitoring with piezoelectric wafer active sensors 2. Z. Yiyang, The design of glass crack detection system based on image pre-processing technology, in Proceedings of Information Technology and Artificial Intelligence Conference, 2014, pp. 39–42 3. M. Alawadhi, J. Almazrouie, Review and analysis of the importance of autonomous vehicles liability: a systematic literature review 4. H. Jaeger, T. Christaller, Dual dynamics: designing behavior systems for autonomous robots 5. F. Kaplan, P.-Y. Oudeyer, Training of autonomous robots 6. N. Savela, T. Turja, A. Oksanen, Social acceptance of robots in different occupational fields: a systematic literature review 7. F. Ingrand, M. Ghallab, Deliberation for autonomous robots: a survey 8. G.R. Goud, N. Sharma, K. Ramamritham, S. Malewar, Efficient real-time support for automotive applications: a case study 9. G. Raravi, V. Shingde, K. Ramamritham, J. Bharadia, Merge algorithms for intelligent vehicles 10. V. Shingde, G. Raravi, A. Gudhe, P. Goyal, K. Ramamritham, Merge-by-wire: algorithms and system support 11. H. Li, K. Ramamritham, P. Shenoy, R.A. Grupen, J.D. Sweeney, Resource management for real-time tasks in mobile robotics 12. P.A. Ioannou, C.C. Chien, Autonomous intelligent cruise control 13. D. Kang, Y.-J. Cha, Autonomous UAVs for structural health monitoring using deep learning and an ultrasonic beacon system with geo-tagging 14. J. Kersten, V. Rodehorst, N. Hallermann, P. Debus, G. Morgenthal, Potentials of autonomous UAS and automated image analysis for structural health monitoring 15. M.-H. Lee, Conveyor belt monitoring system, Brecksville, Ohio 16. R. Carvalho, R. Nascimento, T. D’Angelo, A UAV-based framework for semi-automated thermographic inspection of belt conveyors in the mining industry. Sensors 20(8), 2243 (2020)
Chapter 4
Design and Development of Marathi Word Stemmer P. Vaishali Kadam, B. Kalpana Khandale, and C. Namrata Mahender
1 Introduction Today’s world is the world of the Internet; very large or unlimited information is available for the users. But mostly due to lack of time, one cannot see all the information and only wants to access the required limited data or information in such applications; text processing tools are helpful to access such required data accurately. Also due to the unavailability of NLP tools for these languages, access to data is limited. Almost all text processing tools are based on natural language processing. It is the subfield of computer science and artificial intelligence. NLP is more accurately the conversion of human communication language to Machine learning language. Today, all the AI-related applications are intended to train a machine and making it capable to understand human language as equivalent, more natural to a human understanding, and thought to be processed as well to its perception equivalence. As the data is very bulky, many times the same word can appear with its different form within the text document. These are the extensions used with a word to its basic root form called the stem. In natural language processing (NLP) applications like text summarization, parsing, chunking, word sense disambiguation, and partof-speech tagging, text preprocessing performs an important role because for some Indian languages like Hindi, Marathi, and Konkani use Deonagri scripts, and words appeared repeatedly with different variants which have extensions of some suffixes. When we remove such affixes, we got its basic root word which generates a specific same or closer meaning for all its variants. For designing an NLP application, it is
P. Vaishali Kadam (B) · B. Kalpana Khandale · C. Namrata Mahender Department of Computer Science & I.T., Dr. Babasaheb Ambedkar Marathwada University, Aurangabad, Maharashtra, India C. Namrata Mahender e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. B. Reddy et al. (eds.), Proceedings of Second International Conference on Advances in Computer Engineering and Communication Systems, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-7389-4_4
essential to have the data in a standard form. To maintain standardization and uniqueness in the language format, a stemmer plays a vital role.
1.1 Need of Stemmer in NLP Application

Stemming is still at a development stage for various Indian languages. Most of the existing work is either rule-based or empirical. A rule-based system requires proper knowledge of the language together with a set of handwritten rules from an expert. Naïve researchers face many difficulties in writing proper linguistic rules based on grammar because of the rich morphology of Indian languages, and it is also hard for a naïve user to understand the language or its word variants. The task is especially challenging for Indian languages that use the Devanagari script. To avoid complex structures and provide a standard for text processing, the stemmer plays an important role: it makes words available in their original base forms. Stemming is thus a challenging problem in NLP and a basic preprocessing requirement of any NLP application.
1.2 Need of Stemmer for the Marathi Language

For other Indic languages such as Hindi, various stemmers are available, and they give good preprocessing performance. Marathi is a morphologically rich language whose structure and syntax make understanding it and generating its meaning complex. It is a free-word-order language: irrespective of a standard format, the order of sentence formation changes according to the speaker's or writer's requirement. For text preprocessing, the data is therefore often not available in a unique pattern, and this lack of a standard format makes it a challenge for any text processing application to maintain a specific format for the text. In the case of Marathi, very little work has been done on stemming, yet a stemmer with good accuracy and performance metrics is important for designing any NLP application. This need has inspired new researchers and industry toward developing efficient systems for information retrieval, text summarization, text classification, opinion mining, data mining, and clustering for regional languages. The inflectional nature of a language causes several challenges for automated processing of natural language data [1]. For Marathi, stemming is a challenging research problem: the language has many complex structures and various morphological variations, and words appear in different structural forms to express different language notations. A single base word can appear as many different variants.
For writing such variants of words, the Marathi language follows its own grammar rules with different structural forms to generate the meaning. In the language, every word is composed of a base or root with added suffixes or extension characters. In some cases it is difficult to remove a suffix, because removing it forms a new word with a changed meaning.
1.3 Stemming

Stemming is an important part of computational linguistic analysis and the study of language morphology, and it also helps in understanding natural language and natural language processing in machine learning. Stemming can improve the overall performance of information retrieval systems and various NLP applications. It is the process of automatically removing the grammatical suffixes attached to variants of a base word in text. In inflection, the process of word formation, a base or root word is modified to express different grammatical meanings and categories of the language, such as tense (past, present, future), voice (active, passive), person (first person, second person, etc.), number, gender, mood, and definiteness. Stemming is broadly classified into two types: supervised and unsupervised. Supervised systems have a predefined rule set for the language corpus or text, whereas unsupervised systems automatically remove suffixes by learning the language from different contexts. Stemming uses a number of approaches to reduce a word with its suffixes to its base, applied whenever an inflected form is met. Based on the approach used, stemming is further categorized into different subtypes; Table 1 summarizes these techniques.
1.4 Rule-Based Stemmer

Rule-based stemming requires proper knowledge of the language and its grammar rules. Researchers face great difficulty writing correct linguistic rules due to the rich morphology of Indian languages, which is the main issue for inflectional languages; for this reason, stemming remains a challenging problem in NLP. All linguistic knowledge is coded in the form of rules, and the stemming system works in two steps.

Step 1—A dictionary is used to assign each word a list of suffixes.
Step 2—A large set of handwritten grammar rules is used to remove suffixes and search for an appropriate stem word.

The rule-based stemming system extracts suffix-stripping rules using the different word formation structures of Marathi.
Table 1 Stemming techniques

Technique | Feature | Advantages | Lacuna | Example
Rule-based or linguistic-based | It uses language rules or grammatical and morphological analysis aspects of a language | Implementation is easy, it requires less memory storage, domain dependent | Manually designed language rules are the requirement, need of language expert, time-consuming, less accuracy | Lightweight stemming, linguistic removal with corpus, longest match
Statistical-based | It uses statistical aspects of language for finding root word | It has the ability to work with exceptional cases, and is fast, useful for different languages | More storage requirements, accuracy depends on dictionary size | N-gram methods, co-occurrence analysis, clustering
AI and neural network-based | It uses genetic algorithm and AI techniques with fuzzy logic | Applicable for a wide variety of languages | Requires large corpus to train the system, takes more processing time, requires a significant amount of memory | Neural network, genetic algorithm
Hybrid | It uses simple POS-tagging with statistical and linguistic approach together | Combination of more than one technique overcomes the drawbacks of each other | If dictionary look-up is combined, it requires extra overhead for storage | Combined techniques, brute force algorithm
The format below gives an idea of the word structure.
(a) Simple word format—word = (root word + plain suffix)
(b) Join word format—word = (root word + join suffix)
(c) Complex word format—words are formed as one of:
    (1) word = (root word + complex suffix)
    (2) word = (root word + plain suffix + join suffix)
    (3) word = (root word + plain suffix + complex suffix)
    (4) word = (root word + plain suffix + join word + complex suffix)
There are three different types of suffixes used in the composition of word structures.
Table 2 Simple/plain suffixes in Marathi
(a) Simple suffixes—a single character comes as the suffix with the root word; the simple suffix consists of any of the language notations or symbols specified in Table 2.
(b) Joined suffixes—two characters joined together come as the suffix with the word.
(c) Complex suffixes—two or more characters joined together come as the suffix with the word.
2 Literature Survey

Stemming is difficult for the Marathi language. Various stemmer tools are available for English and other foreign languages, and much work has been done on them. In the early days, stemming was studied only for English: the Lovins stemmer is the oldest English stemmer, followed by the Porter, Dawson, and Paice/Husk stemmers. Some non-English stemmers were introduced during 1990–2000, but studies on stemming for Indic languages started only after 2000. Table 3 reviews the various stemmers available for Indic languages.
3 Database

For the database, we created our own corpus of Marathi text in the Devanagari script. We used 30 moral stories that are available online and collected 5000 words from them in the form of a dictionary. For the proposed work, we trained our system with this total of 5000 words. Table 4 gives an idea of a sample of the stories.
4 Proposed System Figure 1 shows the architectural view of the rule-based stemming system for Marathi. There are eight types of words found in Marathi, namely noun, verb, adjective, adverb,
Table 3 Review on stemming for Indian languages

Sr. No. | Author | Language | Technique and advantages | Lacuna | Performance
1 | Md. Redowan Mahmud et al. 2014 [2] | Bengali | Rule-based stemmer. Algorithm eliminates inflections stepwise to form the desired root in the dictionary | Limited for words containing conjugated letters; only verb and noun inflections | For verbs, accuracy is 83%, and for nouns 88%
2 | Lakshmi et al. 2014 [3] | Language independent | Stemming algorithms are reviewed for Indian and non-Indian languages | Specifies a need to develop a language-independent stemmer for all languages | Review study concentrates on important issues related to the stemming process
3 | Upendra Mishra et al. 2012 [4] | Hindi | Reduces the problem of over-stemming and under-stemming for a lightweight Hindi stemmer; use of brute force algorithm with suffix removal | Lack of human interference in the final result, a limited dataset | Accuracy of 91.59%
4 | Dalwadi Bijal et al. 2014 [5] | Hindi | Overview of stemming algorithms for Indian and non-Indian languages; it is noticed stemming increases the retrieval results for rule-based and statistical methods | Requires constant updates to dictionaries due to language evolution | Study concludes stemming reduces the size of index files as the number of words to be indexed is reduced to common forms
5 | Ananthakrishnan Ramanathan et al. 2019 [6] | Hindi | Lightweight stemmer for Hindi, use of suffix removal | The derivational morphology of Hindi is not considered | Efficient stemmer for information retrieval systems; under-stemming and over-stemming errors calculated to be 4.68 and 13.84
6 | Shahidul Salim et al. 2019 [7] | Bengali | Rule-based stemming | Time and space complexity | Efficient algorithm; accuracy is close to 100%
7 | Sandipan Sarkar et al. 2012 [8] | Bengali | A rule-based stemmer | Domain dependent | The accuracy of the tool reached up to 96%
8 | Debasis Ganguly et al. 2012 [9] | Bengali and Hindi | Rule-based stemmer, removing classifiers and case markers in Bengali | Manually crafted rules for removing the plural suffixes for Hindi and Bengali | Rule-based stemmers for Hindi and Bengali yielded the best performance gains in IR
9 | Tapashee Tabassum Urmi et al. 2016 [10] | Bengali | A contextual similarity-based approach using the N-gram language model | It requires more information about Bangla word formation patterns | It has given an accuracy of 40.18%
10 | Jikitsha Seth et al. 2014 [11] | Gujarati | Rule-based stemmer along with statistical approach | Errors observed due to over-stemming and under-stemming | The accuracy of the stemmer is 92.41%
11 | Anup Kumar Barman et al. 2016 [12] | Assamese | Look-up and rule-based suffix stripping method for the Assamese language using WordNet | Limited stemming rules and dictionary size | The accuracy of the system is 85%
12 | Juhi Ameta et al. 2012 [13] | Gujarati | A rule-based stemmer for Gujarati | 86% of the total errors were due to over-stemming and 14% due to under-stemming | The accuracy of the stemmer is 91.5%
13 | Dinesh Kumar et al. 2011 [14] | Punjabi | Brute force technique with suffix stripping approach | It consumes a lot of time | The accuracy of the stemmer is 81.27%
14 | Vishal Gupta et al. 2011 [15] | Punjabi | A rule-based approach for Punjabi noun and proper name stemming | Errors are found due to rule violations, dictionary errors, or syntax mistakes | The efficiency of the noun and proper name stemmer is 87.37%
15 | Prabhjot Kaur et al. 2019 [16] | Punjabi | Punjabi verb stemmer (PVS) with a rule-based approach | Accuracy depends on the list of root words and the rules created to remove suffixes | The overall accuracy of the stemmer is 95.21%
16 | N. Pise et al. 2018 [17] | Marathi | A rule-based approach using a Marathi corpus, stop word list, and suffix stripping rules | Over- and under-stemming observed | 86% accuracy
17 | Pooja Pandey et al. 2016 [18] | Marathi | Rule-based approach with WordNet for stemming Marathi words using name entity and stem exception datasets | Exact accuracy not calculated | The problem of over- and under-stemming is reduced proportionally using root verification
18 | Majgaonkar et al. 2010 [19] | Marathi | Unsupervised and rule-based approach | Over-stemming and under-stemming exist and cause errors | 80.7% for rule-based, 82.5% for unsupervised
etc., and in the language they appear in different forms, with variations in structure, to express a specific language instance or notation. Marathi is free ordered in its morphological structure; the general sentence order for Marathi is Subject + Object + Verb (SOV), which makes processing more difficult. All the rules used for stemming are based on Marathi transformational grammar, which has helped us to generate the stem of each word correctly.
Table 4 Sample of the stories

Story sample | Total words in a story
S1 | 391
S2 | 110
S3 | 079
S4 | 243
S5 | 244
S6 | 210
S7 | 144
S8 | 191
S9 | 387
S10 | 235
Fig. 1 Architectural view of proposed system
4.1 Input Marathi Text For input, we have concentrated on the moral stories of the Marathi language text.
4.2 Preprocessing In preprocessing, we tokenize the sentences and the words individually from the input text.
4.2.1
Sentence Tokenization
Sentence tokenization is the basic process of splitting text corpora. The discourse is divided into a number of individual sentences separated by a comma. The following example shows sentence tokenization. Input sentence— Tokenized sentence—
4.2.2
Word Tokenization
After sentence tokenization, we have to tokenize the sentence or split the words individually from the sentence. Sentences are split into the number of individual words or tokens from input text. This process is called word tokenization. For example, Input sentence
Tokenized words
4.3 Rule-Based Stemming After tokenization, we have rules for stemming. For each tokenized word, the system generates a stem after analysis of its suffixes with the help of a predefined dataset and rules from the word dictionary.
4.3.1
Rules for Stemming
There are eight different types of relational words found in Marathi, each with its respective subtypes. Every sentence is a statement in the language, and in every sentence the verb, or action word, is the principal word; actions are performed by the subject. These words express the relationship between nouns and verbs and with the other words appearing in a sentence. Stemming
Table 5 Vibhakti Pratyay’s (character suffixes) in Marathi
Sr. No.
Vibhakti
Singular word Suffix word form
Plural word Suffix Word form
िव मुल
1 3 5
मुले
स ला ते
मुलास मुलाला
स ला ना ते
मुलास मुलाना
तृतीया
ने ए शी
मुलाने मुलाशी
नी शी ई ही
मुलानी मुलाशी
चतुथी
स ला ते
मुलास मुलाला
स ला ना ते
मुलास मुलाना
पंचमी
ऊन
मुलाऊन
ऊन
चा ची चे
मुलाचा मुलाची मुलाचे
चे
ची
मुलाचे मुलाचे
तई आ संबोधन
मुलात
त ई आ
मुलात
मुला
न
मुलान
requires syntactic and semantic knowledge of the language to extract root or stem words by removing suffixes from a given word. Many times, words do not occur in their original base form; they come with different structural variations carrying suffixes. In Marathi this is based on the Vibhakti Pratyayas (case suffixes) [20, 21] (Table 5). Knowledge-based rules have been designed for stemming each word in the text. Table 6 shows the different rules for obtaining the stem word: some rules are applied for suffix removal, and some are applied after the removal of the suffix to obtain a meaningful base or stem word.
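A minimal sketch of how a few of the Table 6 rules could be applied in code is given below; it is not the authors' implementation. Only the suffixes स, ला, ते, ना, and कडून from Table 6 are covered, and the root dictionary is a toy example built from words that appear in the table.

```python
# Illustrative subset of the rule-based Marathi stemmer (not the full rule set).
# Longer suffixes are tried first so that e.g. 'कडून' is removed before 'न'.
SUFFIXES = ["कडून", "ला", "ते", "ना", "स"]            # subset of Table 6
ROOT_DICTIONARY = {"मामा", "काका", "सीता", "घर", "जंगल", "शहर", "साहेब"}  # toy dictionary

def stem(word):
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix):
            candidate = word[: -len(suffix)]
            if candidate in ROOT_DICTIONARY:
                return candidate
            # Post-removal normalization (cf. Table 6): drop a leftover
            # joining vowel sign if that yields a known root.
            if candidate.endswith("ा") and candidate[:-1] in ROOT_DICTIONARY:
                return candidate[:-1]
            return candidate
    return word

for w in ["मामास", "काकाला", "शहराकडून", "साहेबाना"]:
    print(w, "->", stem(w))
# मामास -> मामा, काकाला -> काका, शहराकडून -> शहर, साहेबाना -> साहेब
```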
4.4 Output Marathi Stem Word Finally, individual words are stemmed out by removing the suffixes as a result. After stemming of the word with the help of the grammar rules, the output is shown as in Table 7.
5 Result For the proposed work, we trained our system with the handwritten rules set and the word dictionary to develop the stemmer for the Marathi language text. From the
Table 6 Sample rules for stemming the Marathi text
Sr. Rule No.
Example
Description
Root (base word + suf- word fix)
1
If word ends with suffix (स ) मामास काकास Then remove suffix (स) from that word
2
If word ends with suffix (ला)
मामाला सीताला
Then remove suffix (ला ) from word If word ends with suffix (ते) Then remove suffix (ते) from word If word ends with suffix ना Then remove suffix (ना) from word
सांगते दाखवते
5
If word ends with suffix Then remove suffix from word
घरा
6
If word ends with suffix
आपलयाकडून शहराकडून
3
4
कडून
7
8
मामा स काका स
मामा काका
मामा ला सीता ला
मामा सीता
सांग दाखव
साहेबाना
साहेब ना
जंगलाम
Then remove suffix कडून from word If word after removal of suffix झाडावर word ends with डा डी डे then convert it to ड. If word after removal of suffix भरताची word ends with ता ती, तु, ते then convert it to त
घर जंगल
घर जंगल
आप शहर
झाडा वर
झाड
भरता
भरत
ची
Table 7 Rule-based stemming output: sample example

Input sentence: सीता रमेशची मोठी मुलगी आहे ती नािशकला पंचवटीत राहते
Sentence tokenization: सीता रमेशची मोठी मुलगी आहे / ती नािशकला पंचवटीत राहते
Word tokenization: सीता, रमेशची, मोठी, मुलगी, आहे, ती, नािशकला, पंचवटीत, राहते
Stemming: सीता, रमेश, मोठी, मुलगी, आहे, ती, नािशकला, पंचवटीत, राह
Suffix removed: ची, ला, त, ते
Table 8 Result obtained for rule-based stemming system

Name of system | Total words | Correct stemmed words | Incorrect stemmed words | Accuracy of the stemmer (%)
Our stemmer | 5000 | 3067 | 1933 | 61.34
stories and the word dictionary, we collected a total of 5000 words. Of these, 3067 words are correctly stemmed by our rule-based stemmer, while 1933 words could not be stemmed due to complex linguistic patterns of the words. The accuracy is calculated with the formula

Accuracy = (Number of correct stem words / (Number of correct stem words + Number of incorrect stem words)) × 100
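Substituting the counts reported in Table 8: Accuracy = 3067 / (3067 + 1933) × 100 = 3067 / 5000 × 100 = 61.34%.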
The accuracy of the stemmer is 61.34% (Table 8).
6 Conclusion

Stemming plays an important role in the development of every NLP application. Marathi is a free-word-order language. Different kinds of stemmers are available for various other Devanagari script languages, such as Hindi, Gujarati, Punjabi, and Konkani, but for Marathi very little work has been done on stemming; it is still at an initial stage. To achieve better accuracy in stemming Marathi natural language text, we have designed a rule-based approach based on Marathi grammar rules. Our stemmer gives an accuracy of 61.34%. In the future, we will concentrate on enhancing the database and developing stemming rule sets for further word types and patterns, because richer data and rules allow words to be stemmed more efficiently and with higher accuracy.

Acknowledgements The authors would like to acknowledge and thank the CSRI DST Major Project sanctioned No. SR/CSRI/71/2015 (G), the Computational and Psycholinguistic Research Lab Facility supporting this work, and the Department of Computer Science and Information Technology, Dr. Babasaheb Ambedkar Marathwada University, Aurangabad, Maharashtra, India. They are also thankful to the Chhatrapati Shahu Maharaj Research Training and Human Development Institute (SARTHI), Pune, for providing financial assistance for this Ph.D. research work. I would like to express my sincere thanks to my research guide Dr. C. Namrata Mahender (Asst. Professor), Department of Computer Science and IT, Dr. B. A. M. U., Aurangabad, for providing research facilities and constant technical and moral support.
References 1. H.B. Patil, B.V. Pawar, A.S. Patil, A comprehensive analysis of stemmers available for Indic languages. Int. J. Nat. Lang. Comput. (IJNLC) 5 (2016) 2. Md. Redowan Mahmud, M. Afrin, Md. Abdur Razzaque, E. Miller, J. Iwashige, A rule based Bengali stemmer, in International Conference on Advances in Computing, Communications and Informatics (ICACCI) (IEEE, 2014). 978-1-4799-3080-7/14 3. I.R. Vijaya Lakshmi, S. Britto Ramesh Kumar, Literature review: stemming algorithms for Indian and non-Indian languages. Int. J. Adv. Res. Comput. Sci. Technol. (IJARCST) 2(3) (2014) 4. U. Mishra, C. Prakash, MAULIK: an effective stemmer for Hindi language. Int. J. Comput. Sci. Eng. (IJCSE) (2012) 5. https://www.researchgate.net/publication/233955548 6. D. Bijal, S. Sanket, Overview of stemming algorithms for Indian and non-Indian languages. (IJCSIT) Int. J. Comput. Sci. Inf. Technol. 5(2), 1144–1146 (2014) 7. A. Ramanathan, D.D. Rao, A Lightweight Stemmer for Hindi (KBCS CDAC Mumbai, 2019). http://www.kbcs.in/downloads/papers/StmmerHindi.pdf 8. Md. Shahidul Salim (Shakib), T. Ahmed, K.M. Azharul Hasan, A formal method for designing a Bangla Stemmer using rule based approach. Conference paper (2019). https://www.resear chgate.net/publication/337703586 9. S. Sarkar, S. Bandyopadhyay, Morpheme extraction task using Mulaadhaar—a rule-based stemmer for Bengali (2012). isical.ac.in/~fire/data/working-notes_2012/met/MET_JU.pdf 10. D. Ganguly, J. Leveling, G.J.F. Jones, Rule-based stemmers for Bengali and Hindi, in FIRE 2012 Workshop, Kolkata, India (2012) 11. T.T. Urmi, J.J. Jammy, S. Ismail, A corpus based unsupervised Bangla word stemming using N-Gram language model (2016). https://www.researchgate.net/publication/312430235 12. J. Sheth, B. Patel, A stemmer for morphological level analysis of Gujarati language, in International Conference on Issues and Challenges in Intelligent Computing Techniques (ICICT), Ghaziabad, India (2014) 13. A.K. Barman, J. Sarmah, S.K. Sarma, Development of Assamese rule based stemmer using WordNet, in Proceedings of the 10th WordNet Conference, Wroclaw, Poland (2019) 14. J. Ameta, N. Joshi, I. Mathur, A lightweight stemmer for Gujarati (2012). ResearchGate publication/232416333 15. D. Kumar, P. Rana, Stemming of Punjabi words by using brute force technique. Int. J. Eng. Sci. Technol. (IJEST) 3 (2011). https://www.researchgate.net/publication/50406900 16. V. Gupta, G.S. Lehal, Punjabi language stemmer for nouns and proper names, in Proceedings of the 2nd Workshop on South and Southeast Asian Natural Language Processing (WSSANLP), IJCNLP, Chiang Mai, Thailand (2011), pp. 35–39 17. P. Kaur, P.K. Buttar, A rule-based stemmer for Punjabi verbs. Int. Res. J. Eng. Technol. (IRJET) 06(05) (2019) 18. N. Pise, V. Gupta, Rule based stemmer for Marathi language. Int. J. Comput. Sci. Eng. 6(5) (2018). E-ISSN: 2347-2693 19. P. Pandey, Int. J. Adv. Res. Comput. Commun. Eng. 5(10) (2016) 20. M.M. Majgaonker, T.J. Siddiqui, Discovering suffixes: a case study for Marathi language. Int. J. Comput. Sci. Eng. (IJCSE) 02, 2716–2720 (2010) 21. M.R. Valambe, Sugam Marathi Vyakaran va Lekhan, 54th edn. (Nitin Publication, 2019)
Chapter 5
Survey on Deep Learning System for Intruder Behavior Detection and Classification in Cloud Computing M. Mohan, V. Tamizhazhagan, and S. Balaji
1 Introduction

In the present era of technology, almost every application we use and every task we do relies on the Internet. The Internet is the medium through which data is transmitted, so Internet security has become of utmost importance. There have been several security upgrades over the initial Internet model, such as HTTPS, but these were not enough as inventions kept developing and dependency on the Internet kept increasing. Attackers began using new techniques to infiltrate client or user systems, capture their data, and exploit it for their own needs, leaving clients to panic and deal with the consequences. To counter this, many proposals for security enhancement have been made, helping clients avoid the fear of illegal attackers. As dependency on the Internet keeps increasing, the number of intruders trying to profit from its vulnerabilities is also increasing. The present-day world is filled with connected devices holding information that can be easily cracked and taken by an attacker; the devices that have become data points for attackers are those connected to the Internet to use features like cloud storage, where the information resides. Thus, there is now an even greater chance of being exploited. To meet these challenges, researchers have started to work on security models that help providers protect clients' information from being raided. In this paper, we discuss the cloud and its storage services and the work of researchers who have tried to enhance its security. In the end, M. Mohan (B) · V. Tamizhazhagan Department of Information Technology, Annamalai University, Chidambaram, Tamil Nadu, India S. Balaji Department of Computer Science and Engineering, Panimalar Engineering College, Chennai, Tamil Nadu, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. B. Reddy et al. (eds.), Proceedings of Second International Conference on Advances in Computer Engineering and Communication Systems, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-7389-4_5
we converge onto a point where we summarize the existing work and decide on the work required in future, based on the techniques intruders possess to break through these systems. This helps us to better understand the ongoing attacks and the measures or actions required. A cloud is a collection of data (personal or private) stored on a server that can be accessed from any place on earth. This functionality brings vulnerability in securing the data and in giving the client confidence of safety, and intruders exploit it through several types of attacks. One attack possible on the cloud is cloud malware injection, which results in manipulating or stealing data or eavesdropping; the common injections are SQL injection and cross-site scripting attacks. DDoS is another major attack possible on the cloud. To counter these attacks, many detection systems have been proposed using current cutting-edge technologies such as machine learning and deep learning. Before this advanced use of technology for protecting cloud users from intruders, there were initial basic protection schemes that used encrypted disks to enable strong binding of remote users to their virtual machines, presented as secure management of virtual machines with strong user binding in semi-trusted clouds. To deepen the understanding of these defenses, we describe the architecture and algorithms used to develop the proposed systems that counter intruders. These descriptions help in visualizing each model's uses and limitations and in converging on a comparison; the result is a survey of such proposals that identifies the best and worst outcomes of each of them.
1.1 Overview of Previous Proposed Works

The paper [1] introduced the concept of strong user binding in semi-trusted clouds to build a strong relationship between the provider and the client. The service model whose vulnerabilities are exposed here is Infrastructure-as-a-Service (IaaS), and the work concentrates on protecting clients from trusting a faulty provider and having their data stolen. The paper states that in an IaaS cloud, remote users access their allotted virtual machines (VMs) through a management server, which is operated by cloud operators. In semi-trusted clouds, not all cloud operators can be trusted, because operators can execute arbitrary management commands on users' VMs and redirect users' commands to malevolent or hostile VMs; this latter attack is called a "VM redirection attack." The authors state that the root cause of these attacks is the weak binding between remote users and their VMs; in other words, it is hard to enforce that only a user's own commands are executed on that user's VMs. To resolve this issue, the paper proposes UVBond for a strong binding of users to their VMs. UVBond boots the client VM by decrypting its
Fig. 1 VM redirection attack
encrypted disk inside the trusted hypervisor. Later, it allocates a VM descriptor to securely identify that VM. To cover the semantic gap between high-level management commands and low-level hypercalls, UVBond utilizes hypercall automata, which is sequence acceptor of hypercalls given by commands. They implemented UVBond in Xen and created a new hypercall automaton for various management commands. Using UVBond, they validated that a VM descriptor, and hypercall automata will halt or block insider attacks and also that the overhead was not large enough in remote VM management. It also supports paravirtual disk drivers, which are important for efficient accessing of disk space, but it makes hard to handle this only in the hypervisor. For the secure implementation, UVBond duplicates I/O rings and grant pages in Xen. The implementation of UVBond is done in Xen 4.4.0 as per reference paper [2] (Fig. 1). To differentiate multiple management commands which were run in the same time, UVBond recognizes each process in the hypervisor and applies one hypercall automaton to each one the process. They also have designed hypercall automata for various types of management commands. This proposed UVBond also aid secure VM resumption and migration. According to the author-performed experiments, it was confirmed that cloud operators cannot execute management commands VM of the user or redirect user’s commands to a malicious VM. To add to the above set, it was also displayed that the overhead of using hypercall automata was negligible as mentioned above and the degradation of performance of the disk was maximum up to 9.5%. The author tried to contribute the following concepts when compared to previous systems or proposals: • The author spotted that untrusted cloud operators can execute management commands to arbitrary VMs due to the weak binding of users and their VMs. • This paper tried to resolve the weak binding issue by connecting users with their VMs through encrypted disks of the VMs resulting in strong binding of the user and respective VMs. • They utilized hypercall automata to fill the semantic gap between high-level management commands and low-level hypercalls. • They also tried to implement UVBond for paravirtual disk drivers in Xen, precisely Xen 4.4.0, and created various hypercall automata.
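The actual Xen hypercall sequences used by UVBond are not reproduced in this survey; the sketch below only illustrates the idea of a hypercall automaton as a sequence acceptor. State names and hypercall labels are invented for the example.

```python
# Minimal sketch of a hypercall automaton as a sequence acceptor.
# States, transitions, and hypercall names are illustrative, not Xen's.
class HypercallAutomaton:
    def __init__(self, transitions, start, accepting):
        self.transitions = transitions      # {(state, hypercall): next_state}
        self.start = start
        self.accepting = accepting

    def accepts(self, hypercall_sequence):
        """Return True only if every hypercall follows an allowed transition
        and the sequence ends in an accepting state."""
        state = self.start
        for hc in hypercall_sequence:
            key = (state, hc)
            if key not in self.transitions:
                return False                # unexpected hypercall -> block the command
            state = self.transitions[key]
        return state in self.accepting

# Hypothetical automaton for a 'pause VM' management command.
pause_automaton = HypercallAutomaton(
    transitions={("s0", "domctl_getinfo"): "s1", ("s1", "domctl_pause"): "s2"},
    start="s0",
    accepting={"s2"},
)

print(pause_automaton.accepts(["domctl_getinfo", "domctl_pause"]))   # True
print(pause_automaton.accepts(["domctl_getinfo", "domctl_destroy"])) # False -> rejected
```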
• The author also meant to confirm that cloud operators cannot execute management commands to arbitrary VMs with acceptable overhead through this paper. We can say that this paper is an extended version of a conference paper by the same author [3]. The major clarification provided in this paper compared to the previous paper is how the author attempted to create hypercall automata for various management commands through an experiment. They also attempted to show six complete hypercall automata and explained the relationship between hypercall sequences and command behavior in detail. They conducted several experiments to further investigate the execution overhead of management commands in UVBond. For making the hypercall automata serialization easier, they have designed an automaton convertor. The sections in this paper apart from the basic parts are implementation details and experimental results with UVBond. We discuss these two sections to better understand the success rate compared to the previous works and scope of contribution of this paper to the world of cloud computing security. A few graphs and flow diagrams from the paper will be utilized here as a reference for proper grasping of key concepts.
2 Implementation

As mentioned above, UVBond is implemented in Xen 4.4.0 [2]. In this Xen system, the management server runs in a privileged VM called Dom0, and cloud operators manage users' VMs from this privileged VM. The AES and RSA algorithms used for encryption and decryption were ported onto wolfSSL [4] so that they can be used inside the hypervisor, and a management client for UVBond was built using OpenSSL. UVBond can also be realized in other Type-I hypervisors, but it would be hard to implement in Type-II hypervisors such as KVM [5]: in KVM, for example, management commands are executed on top of the host operating system that provides the hypervisor functions, so they would need to access the KVM device exposed by the host OS instead of issuing hypercalls. In addition, the entire host OS would have to be trusted, and its TCB is far larger than the hypervisor's (Fig. 2). To perform encryption and decryption with algorithms such as AES and RSA, a key generation step produces a session key, and an RSA public key is obtained from a trusted key server or as a digital certificate from the management server. To execute a management command, the client encrypts the pair of the stated descriptor and the hypercall automaton corresponding to that command using the session key; this allows the encrypted pair to be transferred to the server. After the management server finishes executing the command, it obtains the final result of the transitions, encrypted with the session key by the hypervisor, and returns it to the client. A table consolidates all of these items under the headings key, created by, stored in, and purpose.
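The wolfSSL-based code of UVBond is not shown here; as a rough illustration of the workflow just described, the snippet below encrypts a hypothetical (descriptor, automaton) pair under a per-session AES key using Python's cryptography package. The payload serialization format is an assumption.

```python
# Illustration only: encrypting a (descriptor, automaton) pair with an AES
# session key, as in the workflow described above. The actual UVBond
# implementation uses wolfSSL inside Xen, not this library.
import json
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

session_key = AESGCM.generate_key(bit_length=256)    # agreed per session
aesgcm = AESGCM(session_key)

payload = json.dumps({
    "vm_descriptor": "0x5f3a...",                     # hypothetical descriptor
    "automaton": {"start": "s0",
                  "transitions": {"s0:domctl_pause": "s1"},
                  "accepting": ["s1"]},
}).encode()

nonce = os.urandom(12)                                # 96-bit nonce for AES-GCM
ciphertext = aesgcm.encrypt(nonce, payload, None)     # sent to the management server

# Only the hypervisor holding the session key can recover the pair.
recovered = json.loads(aesgcm.decrypt(nonce, ciphertext, None))
```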
Fig. 2 Xen’s split device model
3 Experimental Results

They examined whether the system boots a VM from its virtual disk only when the corresponding disk encryption key is used. When the process completed without errors (i.e., the correct key was provided to the hypervisor), the VM booted normally; in the error case, the VM could not read the boot loader because of improper decryption. This shows that a malicious cloud operator could not boot users' VMs. After a proper boot, they also examined the ability of UVBond to detect the execution of illegal commands by using VM descriptors and hypercall automata. They used six management commands and executed them in both legal and illegal manners; in either case, the illegal execution of management commands was detected correctly. For basic understanding, two performance graphs (Fig. 3) of hypercalls and management commands are included. Figure 3 displays the execution time of commands in UVBond and vanilla Xen. For both commands the overhead was about the same, roughly 0.6 ms. This overhead was mainly due to the registration of a hypercall automaton, including
Fig. 3 CNN-MSVM cloud security system
encryption and encoding. The registered hypercall automaton is checked when a hypercall is issued, but the execution time of a hypercall is almost unaffected. To provide a solution to the difficulties faced by both the user and the provider, the author proposed a security framework for cloud applications using machine learning (ML) approaches. The main objective of this proposal is to bring the next generation of cloud security onto a common base using one of the ML algorithms, in this case convolutional neural networks. CNN was chosen because it can assist the user and enhance the security provided by the cloud environment with automatic and responsive approaches. Rather than making ML work only on the single task of detecting and identifying sensitive data patterns, ML can provide a solution that uses holistic algorithms for the secure transmission of personal data throughout the cloud system. The paper proposes a system built on an ML algorithm whose experimental results are verified, with performance evaluation done by comparison with the previous or existing model. For a better understanding of the proposed system, a reference figure [6] from the original paper is included. The contributions of this proposed system to enhancing cloud security are:
1. Construction of a CNN model for analysis of network traffic data.
2. The prototype CNN model is differentiated from training and is also used beyond training on the dataset.
3. Experimentation and performance evaluation are carried out, and comparison with the existing system yielded better results.
The proposed intrusion detection model is tested on the well-known UNSW-NB15 dataset [7]. A comparison of their method with previous methodologies and other machine learning systems for intrusion detection, together with accuracy, can be seen in Table 1. The system is implemented to perform in real time; it is therefore realized as an ensemble of ELMs running in parallel. Each ELM is given a subset of features, and training as well as testing are done in parallel using a map-reduce style of implementation [8]. Finally, the results are aggregated by a softmax layer.
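The cited system's ensemble code is not available in this survey; the numpy sketch below only illustrates the final aggregation step, in which per-class scores from several ELMs trained on different feature subsets are averaged and passed through a softmax layer. Scores and class labels are invented for the example.

```python
# Sketch of softmax aggregation over an ensemble of per-ELM class scores.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))   # numerically stable
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical raw scores from three ELMs (rows) over four traffic classes (cols).
elm_scores = np.array([
    [2.1, 0.3, -0.5, 0.9],    # ELM trained on feature subset 1
    [1.7, 0.8, -0.2, 0.4],    # ELM trained on feature subset 2
    [2.4, -0.1, 0.0, 1.1],    # ELM trained on feature subset 3
])

# Average the per-ELM scores, then apply softmax to obtain class probabilities.
probabilities = softmax(elm_scores.mean(axis=0))
classes = ["normal", "DoS", "exploit", "reconnaissance"]   # illustrative labels
print(classes[int(probabilities.argmax())], probabilities.round(3))
```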
4 Development of Cloud Security

By now it should be clear that several inventions and proposals have taken part in cloud services, benefiting not only cloud users but also providers in ensuring safety and security against intruders. The development of new ideas never pauses and always points toward future scope, because attacks keep improving as well: every time a new security system is deployed, intruders launch a new approach to bypass it, resulting in constant improvement of cloud defenses. Until there is no chance of attack, the defenses need to remain under constant scrutiny to unveil any possible attack.
Table 1 Comparison chart

Title | Advantages | Disadvantages | Uses
Secure VM management with strong user binding in semi-trusted clouds | 1. Proposed UVBond for providing strong binding of users to their VMs 2. Encryption and decryption techniques are used | 1. Could not provide a full-time solution for detection of intruders 2. Classification of intruders is not available | This proposal can be used for trusted binding between user and provider. It improves the trust of the user in the cloud provider
A focus on future cloud: machine learning-based cloud security | 1. Introduction of a CNN model for analyzing the network traffic data 2. Provided better results in intrusion detection compared to previous models | 1. Requires a large dataset 2. Requires intensive training | A new machine learning technology is used, which improved the standards of the proposal as intrusion detection becomes possible in this system
A multi-layer intrusion detection system with ExtraTrees feature selection, extreme learning machine ensemble, and softmax aggregation | 1. Utilizes attack-based feature selection 2. 1 versus ALL SVM 3. ELM is used for multi-class classification | 1. Requires extensive working and training 2. Classification can be done better | A high level of intrusion detection with the capability to withstand pressure from multiple attacks is possible. Each attack is defined individually and separately
Hypervisor-based cloud intrusion detection through online multivariate statistical change tracking aggregation | 1. It counters traditional IDS 2. Utilizes HyIDS, deployed at hypervisor level 3. Can monitor and analyze network communications between VMs | 1. Has a future scope and requires improvement 2. Refinement is required for better results | A promising proposed technique with huge scope, but one that requires extensive work and refinement
Now that the various levels and types of security proposals for the cloud have been discussed, let us characterize each with its pros, cons, and uses, which gives a better idea of which concept and system to use for a given setting. The reason for discussing usage is that every cloud provider delivers its service in a unique way and faces diverse types of attacks from intruders; according to the level of difficulty and the security requirements, the provider can adopt a technology and adapt it to the intended operation and desired results. Table 1 collects this information: the name of each proposal, its pros, cons, and uses. It can act as a timeline of defense levels for a cloud system and as a reference or survey for the provider and the user in understanding cloud intrusion detection and classification.
5 Conclusion

We have witnessed the evolution of security levels and the rapid change in the technologies used as attacks change. This brings us to a point where we can identify the most suitable approach for each attack in terms of optimal results. The provider can now decide on the security features of the cloud service according to consumer requirements, and the user can feel safe and confident about the cloud service procured, leading to mutual benefit and a sustainable cloud service environment. To conclude, we summarize the knowledge gained through this survey. First, we captured the essence of each key proposed system. Then we broke each proposal down to take in every key element used and the advantages of the system. These salient points led to a tabular chart characterizing each system and its uses, which fulfils the purpose of the survey: deciding which system suits best for the trouble faced by the user or provider. Finally, we note that designing intruder detection and other defenses for cloud services is a progressive effort that adapts to the knowledge possessed by intruders and works to keep clients safe and secure about their information.
References 1. K. Inokuchi, K. Kourai, Secure VM management with strong user binding in semi-trusted cloud. J. Cloud Comput. Adv. Syst. Appl. (2020). https://doi.org/10.1186/s13677-020-0152-9 2. P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, A. Warfield, Xen and the art of virtualization, in Proceedings of Symposium on Operating Systems Principles (2003), pp. 164–177. https://doi.org/10.1145/1165389.945462 3. K. Inokuchi, K. Kourai, UVBond: strong user binding to VMs for secure remote management in semi-trusted clouds, in Proceedings of IEEE/ACM International Conference on Utility and Cloud Computing (2018), pp. 213–222. https://doi.org/10.1109/UCC.2018.00030 4. WolfSSL Inc. wolfSSL Embedded SL/TLSLibrary. https://www.wolfssl.com/. Accessed 27 Apr 2019 5. M. Li, W. Zang, K. Bai, M. Yu, P. Liu, MyCloud: supporting user-configured privacy protection in cloud computing, in Proceedings of the 29th Annual Computer Security Applications Conference (2013), pp. 59–68. https://doi.org/10.1145/2523649.2523680 6. S. Singh, Y. Jeong, J.H. Park, A survey on cloud computing security: issues, threats, and solutions. J. Netw. Comput. Appl. 75, 200–222 (2016) 7. D. Zissis, D. Lekkas, Addressing cloud computing security issues. Futur. Gener. Comput. Syst. 28(3), 583–592 (2012) 8. N. Moustafa, J. Slay, UNSW-Nb15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set), in 2015 Military Communications and Information Systems Conference (MilCIS)
Chapter 6
Abstractive Multi-document Summarization Using Deep Learning Approaches Murkute Poornima, Venkateswara Rao Pulipati , and T. Sunil Kumar
1 Introduction

Text summarization is the process of pruning a long text into a smaller summary that contains the most important information present in the original text. Text summarization methods are of two kinds: extractive and abstractive. The extractive method extracts important phrases from the original documents and produces a summary by grouping them without changing the original text; it can be built on part-of-speech (POS) tagging. Models used for extractive summarization include the TextRank algorithm [9], word vector embedding, a combination of fuzzy logic and [9] a restricted Boltzmann machine, the Markov chain model, and [12] a multimodel recurrent neural network. The TextRank model is used only for extractive summarization. The main goal of document summarization is to decrease redundancy and generate a good-quality summary; some existing summarization models retain redundant data and assign the highest ranking without checking for redundancy, and sentence compression is used to reduce such redundant data. The process of extractive summarization and an example are shown in Fig. 1. The abstractive summarization approach creates the summary the way a human would, generating novel words that are not present in the source text. This approach tries to understand the source text and examines the data using linguistic methods. M. Poornima (B) · V. R. Pulipati · T. Sunil Kumar VNR Vignana Jyothi Institute of Engineering and Technology, Hyderabad, India V. R. Pulipati e-mail: [email protected] T. Sunil Kumar e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. B. Reddy et al. (eds.), Proceedings of Second International Conference on Advances in Computer Engineering and Communication Systems, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-7389-4_6
Fig. 1 Example of extractive summarization
Fig. 2 Example of abstractive summarization
The process of abstractive summarization and an example are shown in Fig. 2. Different deep learning models are applied for abstractive text summarization, including the Seq-to-Seq model and the Bi-LSTM model.
1.1 Sequence-to-Sequence Model

The Seq-to-Seq model [5] addresses problems involving sequential data. The aim of the proposed work is a text summarizer that takes a long sequence of words from the text as input and outputs a summary, which is also a sequence. The model can be implemented as a many-to-many Seq-to-Seq model rather than a basic Seq-to-Seq model. An encoder–decoder model is typically used to address Seq-to-Seq problems [11] with input and output sequences of varying length.
1.2 Encoder–Decoder Model

The encoder–decoder architecture consists of two components, an encoder and a decoder. Before text is passed into the encoder, it must be converted to vectors. The encoder reads the input sequence one timestep at a time and encapsulates the information; at the end of the sequence it produces an output called the context vector (CV). The intermediate outputs generated by the encoder are discarded, and only the context vector is passed to the decoder. In the decoder,
Fig. 3 Encoder–decoder architecture
the output of one state is given as input to the next state. The decoder takes the CV as input and generates the output depending on the provided CV (Fig. 3).
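The paper's exact network configuration is not given at this point; the following Keras sketch shows the general encoder–decoder wiring described above, with vocabulary size, embedding size, and hidden size chosen arbitrarily for illustration.

```python
# Minimal encoder-decoder (Seq-to-Seq) sketch in Keras; all sizes are arbitrary.
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense
from tensorflow.keras.models import Model

vocab_size, embed_dim, latent_dim = 10000, 128, 256   # illustrative values

# Encoder: reads the source article and keeps only its final states (the CV).
encoder_inputs = Input(shape=(None,))
enc_emb = Embedding(vocab_size, embed_dim)(encoder_inputs)
_, state_h, state_c = LSTM(latent_dim, return_state=True)(enc_emb)

# Decoder: generates the summary one token at a time, initialized with the CV.
decoder_inputs = Input(shape=(None,))
dec_emb = Embedding(vocab_size, embed_dim)(decoder_inputs)
dec_outputs, _, _ = LSTM(latent_dim, return_sequences=True,
                         return_state=True)(dec_emb, initial_state=[state_h, state_c])
outputs = Dense(vocab_size, activation="softmax")(dec_outputs)

model = Model([encoder_inputs, decoder_inputs], outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```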
1.3 LSTM (Long Short-Term Memory) Model

The long short-term memory model is a kind of recurrent neural network used for classification and prediction on time series data, and it is widely used in sequence-to-sequence processing. It has a chain-structured architecture and consists of four gates: the forget gate, input gate, memory gate, and output gate (Fig. 4).
Fig. 4 LSTM architecture
Forget gate—decides which information is not required and should be discarded from the cell state.
Input gate—decides which new information is stored in the cell state.
Memory gate—controls the influence of remembered information on the new information.
Output gate—regulates the amount of new information forwarded to the next LSTM layer.
The forget gate applies the sigmoid function to the current input and the previous hidden state; the sigmoid value determines whether the previous state information is discarded or remembered: if it is 1, the content is remembered, and if it is 0, the information is forgotten. The input gate multiplies two functions, sigmoid and tanh: the sigmoid regulates the information and decides which values to update, while tanh creates the vector of new candidate information. In the output gate, the sigmoid function considers the input vector, the prior hidden state, the current cell content, and a bias.
1.4 Attention Mechanism
The attention mechanism is used to determine how much attention must be paid to each word in the input sequence in order to generate the word at every timestep. Consider the following scenario.
Input sequence: 'which ice-cream do you like?'
Output sequence: 'I love chocolate ice-cream'
The output sequence's first word, 'I,' is associated with the input sequence's fourth word, 'you'. Similarly, the output sequence's word 'love' is connected to the input sequence's word 'like'. Some drawbacks of the plain encoder–decoder model can be overcome by using the attention mechanism.
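A hedged sketch of how such attention weights could be computed at one decoder timestep is shown below; additive (Bahdanau-style) scoring is assumed here, since the paper does not spell out the exact scoring function, and the weight matrices are illustrative:

import numpy as np

def attention(decoder_state, encoder_outputs, W_s, W_h, v):
    """Return attention weights and the context vector for one decoder step.
    encoder_outputs: (T_x, hidden), decoder_state: (hidden,)."""
    # Alignment scores a_ij = v^T tanh(W_s s + W_h h_i) for every encoder position
    scores = np.tanh(encoder_outputs @ W_h.T + decoder_state @ W_s.T) @ v
    weights = np.exp(scores - scores.max())
    weights = weights / weights.sum()          # softmax over the input positions
    context = weights @ encoder_outputs        # weighted sum of encoder outputs
    return weights, context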
1.5 Pointer-Generator Network and Coverage Mechanism
The pointer-generator network (PGN) allows the model both to copy words from the source text and to generate words from the vocabulary. It reduces two problems: one word being replaced with another, and incorrect factual information. The coverage mechanism keeps track of the words covered so far and penalizes the network for repeating the same word again.
2 Background of the Work
Su et al. [1] describe an abstractive text summarization approach with two main components: a text segmentation module and a two-stage transformer-based summarization module. To classify the input text into segments, the text segmentation module uses a Bi-LSTM and pre-trained bidirectional encoder representations from transformers (BERT). Ghodratnama et al. [2] propose a unique intelligent approach, ExDoS, which combines supervised and unsupervised algorithms in a single framework; combining both algorithms in a single framework is a very difficult task. Bhagchandani et al. [3] used three approaches for abstractive text summarization: clustering, word graphs and RNNs. Clustering divides the multiple documents into clusters based on context and importance analysis, the word graph reduces the text, and a bidirectional LSTM (a form of RNN) is used for sentence compression. Fuad et al. [4] used complementary models for two different tasks, sentence clustering and neural sentence fusion: sentence clustering uses word embeddings and a neural network architecture, sentences are encoded with a bidirectional gated recurrent unit (GRU), and neural sentence fusion uses the transformer model. Mohammad Masum et al. [5] used a bidirectional RNN with LSTM and attention layers; a bidirectional LSTM layer is present on the encoder side and the attention layers on the decoder side. To produce the best abstractive text summarization, steps such as counting the glossary, counting missing words and computing word embeddings are used, and the main aim of the work is to improve efficiency and minimize the training loss. Barros et al. [6] present NATSUM, which creates a chronologically ordered narrative summary of the documents. An enriched timeline extraction module (ETEM) and an abstractive summarization module (ASM) are used to accomplish this: the technology first generates a cross-document timeline, and then one sentence is generated for each event using natural language generation techniques, considering the arguments involved in the event. Choi et al. [7] used the ACL dataset and implemented two strategies for abstractive summarization, a document content memory (DCM) and an LSTM. The first is an LSTM-based language model for recognizing sequential patterns in summaries, and the second is a neural attention model with DCM used for sequential pattern recognition; when training the language model, the DCM is used to appropriately represent the contents of the source document and take the keywords of the original document into account. Rezaei et al. [8] used two models, an autoencoder and a deep belief network (DBN); the autoencoder gives better results than the deep belief network. The dataset used is DUC 2007, which consists of 45 subject domains with 25 documents per domain, and the generated summary size is 250 words. Suleiman and Awajan [9] review the restricted Boltzmann machine (RBM), variational autoencoder (VAE), convolutional neural network (CNN) and recurrent neural network (RNN) for extractive summarization; the SummaRuNNer technique, based on a GRU RNN, was used to produce the results on the DUC 2002 dataset. See et al. [10] proposed a novel architecture that augments the traditional Seq-to-Seq model in two orthogonal ways: the original document words are copied via pointing by the
PGN, and the coverage mechanism keeps track of what has already been summarized, resulting in less duplication of words. The CNN/Daily Mail dataset is used for text summarization.
3 Proposed Methodology
Bi-LSTM Encoder–Decoder with attention mechanism
The encoder computes a fixed-length vector from the entire input sequence of words, and the decoder predicts the output sequence based on that complete encoding. This works only for short sentences; for longer sentences, which are difficult for the encoder to compress into a fixed-length vector, the attention mechanism [9] is used. The encoder is built by stacking three LSTM layers. The entire input sequence is read by the LSTM encoder, one word being provided at each timestep, so that by processing the information at every timestep the whole input sequence is captured. The decoder network also contains an LSTM; it analyzes the whole target sequence word by word and predicts the sequence offset by one timestep, i.e., it is trained to anticipate the next word given the previous word. After the training phase, the model is tested on new source sequences whose targets are unknown. To decode a test sequence, an inference architecture has to be configured: encode the full sequence of words and pass the encoder's final state to the decoder as input; run the decoder for one timestep with these internal states so that the output is the probability of the next word; after the next word is produced, update the internal states with the current timestep and pass the selected word as input to the decoder at the next timestep; repeat this process until an end token is generated or the target sequence reaches its maximum length (Fig. 5).
The context vector (CV) ht is calculated by multiplying the encoder outputs hi by the attention weights ai and taking the summation; the CV is given as input to the decoder. Each weight aij is calculated as a single attention distribution (AD) score divided by the sum of all AD scores. To generate the AD, a softmax function is applied over the previous decoder state si−1 and the encoder outputs.
Pointer-Generator Network and coverage mechanism
The PGN [10] contains two parts: one points at words and the other generates words. Pointing is used to copy words from the source text, while generation produces words from a fixed vocabulary. In the PGN, the final distribution is generated as a weighted combination of the attention distribution and the vocabulary distribution. The CV is a fixed-length vector representation. The vocabulary distribution is produced by concatenating the CV ht with the decoder state st and feeding the result through two further layers; the probability distribution pv is calculated by applying a softmax function to the CV ht and decoder state st, and the final distribution is then predicted from these quantities. Here v, v', b, b' are learnable parameters that are set during the training procedure. pv is the probability distribution of the words
Fig. 5 Encoder–decoder with attention mechanism
present in the vocabulary, pv(w). The generation probability pgen is computed from the CV ht, the decoder state st and the decoder input xt, where wh, ws, wx are learnable parameters and σ is the sigmoid function. To generate the final distribution p(w), pgen and pv(w) are combined. Handling of out-of-vocabulary (OOV) words is the main advantage of the PGN: if w is an OOV word, then pv(w) is 0, and if w is not present in the source document, then its attention weight at is 0. The coverage mechanism (CM) uses the AD to keep track of the words covered so far and penalizes the network for repeating the same word again. The CM maintains a coverage vector: the coverage vector ct is the summation of the attention distributions of all previous decoder timesteps, and it identifies the degree of coverage that each word has received from the AD (Fig. 6). The proposed model is built to produce an abstractive summary of multiple documents, for which it utilizes the Bi-LSTM, CM and PGN models [10]. Multiple documents are given as input, and preprocessing is performed on the sequences to avoid messy and uncleaned data that could give disastrous results; the text in the documents is converted into vectors with the help of the encoder–decoder mechanism and then passed to a Bi-LSTM [3] for document classification. The Seq-to-Seq attention model is used in two orthogonal ways (the PGN, via pointing, copies words from the input document, and the CM keeps track of what
Fig. 6 Pointer Generator Network (PGN) model
Fig. 7 Workflow
has been summarized) to deal with OOV words and avoid redundant words in the summary. Thus, it finally produces the abstractive summary. The workflow is shown in Fig. 7, and the algorithm of the proposed model is given below.
Step 1: Start
Step 2: Calculate the context vector: h_t = Σ_{i=1}^{Tx} α_ij h_i
Step 3: Calculate the attention weights: α_ij = exp(a_ij) / Σ_{k=1}^{Tx} exp(a_ik)
Step 4: Calculate the attention distribution: a_ij = f(s_{i−1}, h_i)
Step 5: Apply the softmax function to s_t and h_t to get p_v: p_v = softmax(v′(V[s_t, h_t] + b) + b′)
Step 6: P(w) = p_v(w)
Step 7: Calculate p_gen using h_t, s_t, x_t: p_gen = σ(w_h^T h_t + w_s^T s_t + w_x^T x_t + b_ptr)   // generation probability
Step 8: Get P(w) using p_gen and p_v(w): p(w) = p_gen · p_v(w) + (1 − p_gen) Σ_{i:w_i=w} a_i^t   // final distribution
Step 9: c_i^t = Σ_{t′=0}^{t−1} a_i^{t′}   // coverage vector
Step 10: covloss_t = Σ_i min(a_i^t, c_i^t)   // coverage loss
Step 11: End
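To make Steps 5–10 concrete, the following NumPy sketch computes the generation probability, mixes the vocabulary and attention distributions into the final distribution, and accumulates the coverage vector and coverage loss; it is an approximation of the published pointer-generator equations with illustrative shapes, not the authors' code:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pgn_step(p_vocab, attn, src_ids, vocab_size,
             h_t, s_t, x_t, w_h, w_s, w_x, b_ptr, coverage):
    """One pointer-generator decoding step (Steps 5-10 of the algorithm)."""
    p_gen = sigmoid(w_h @ h_t + w_s @ s_t + w_x @ x_t + b_ptr)   # Step 7
    final = p_gen * p_vocab                                      # generator part (Step 8)
    copy = np.zeros(vocab_size)
    for pos, word_id in enumerate(src_ids):                      # pointer part: copy source words
        copy[word_id] += attn[pos]
    final = final + (1.0 - p_gen) * copy                         # final distribution (Step 8)
    cov_loss = np.minimum(attn, coverage).sum()                  # coverage loss (Step 10)
    coverage = coverage + attn                                   # updated coverage vector (Step 9)
    return final, coverage, cov_loss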
Dataset Description
The dataset 'Amazon Fine Food Reviews' has been taken from the Kaggle website; it has 5,68,454 records (reviews), 2,56,059 users and 74,258 products. Reviews are scored as follows: worst reviews (1) and bad reviews (2) are called negative reviews, average reviews (3) are called neutral reviews, and good reviews (4) and very good reviews (5) are called positive reviews. Product ID, user ID, profile name, helpfulness, score, time, summary, text, helpfulness numerator and helpfulness denominator are the attributes of the dataset.
4 Experimental Evaluation
Upon experimentation, the results are as shown below. Approximately 500,000 reviews from the dataset are considered. A histogram of the lengths of the input texts and of the summaries is used to choose the sequence lengths to analyze. The entire dataset is split into train and test sets; x and y are the input and output labels. The total input vocabulary (x_voc) count is 31,152, and the output vocabulary (y_voc) count is 33,414. Performance is measured using parameters such as the loss function, validation loss (Val-loss), accuracy and validation accuracy (Val-accu). The batch size of a single epoch (iteration) is 512. As the training loss decreases, the validation loss may start to increase; epochs are run until the training loss stops decreasing or the validation loss increases, and once the validation loss increases the training can stop. If training stops at 25 iterations, this is an early stop. The validation is done using the rmsprop optimizer, the sparse categorical cross-entropy loss function and the accuracy metric (Fig. 8). Table 1 reports the values of the Loss, Accuracy, Val_loss and Val_accu parameters over 60 epochs; single-epoch values are shown in Fig. 9.
Fig. 8 Performance graph
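The training setup described above (rmsprop, sparse categorical cross-entropy, batch size 512, and stopping once the validation loss starts to rise) could be expressed in Keras roughly as follows; `model`, the training/validation arrays and the teacher-forcing target shift are assumptions carried over from the earlier encoder–decoder sketch, not values stated in the paper:

from tensorflow.keras.callbacks import EarlyStopping

model.compile(optimizer="rmsprop",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Stop training automatically once the validation loss stops improving
early_stop = EarlyStopping(monitor="val_loss", mode="min", patience=2, verbose=1)

history = model.fit([x_train, y_train[:, :-1]],
                    y_train[:, 1:],
                    epochs=60,
                    batch_size=512,
                    callbacks=[early_stop],
                    validation_data=([x_val, y_val[:, :-1]], y_val[:, 1:]))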
Table 1 Experimental evaluation

Summary in data set | Generated summary by proposed model | ROUGE-1 Recall | ROUGE-1 Precision | ROUGE-1 F1-Score
The sun was setting on a sunny day in May. John laid in the garden when a quick brown fox jumped over him. Ricky is our pet | The quick brown fox jumped over the lazy dog | 0.44 | 0.43 | 0.44
The coffee tasted great and was at such a good price! I highly recommend this to everyone! | Great coffee | 0.17831 | 0.15445 | 0.16482
This is the worst cheese that I have ever bought! I will never buy it again and I hope you won't either! | omg gross gross | 0.04936 | 0.04257 | 0.04550
love individual oatmeal cups found years ago sam quit selling sound big lots quit selling found target expensive buy individually trilled get entire case time go anywhere need water microwave spoon know quaker flavor packets | Love it | 0.27 | 1 | 0.43
Fig. 9 Summarization of more than one epoch
Table 2 Comparison table

Model | ROUGE-1 | ROUGE-2 | ROUGE-L
Seq2Seq | 34.9 | 22.6 | 32.6
Transformer | 35.7 | 23.4 | 33.1
Pointer-Generator | 38.5 | 23.7 | 34.7
ROUGE is an evaluation metric for abstractive summarization. ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is a recall-based evaluation metric. Recall measures how much of the reference summary is captured within the generated summary (Table 2).
Recall = Number of overlapping words / Total number of words in the reference summary
Precision = Number of overlapping words / Total number of words in the generated summary.
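The recall and precision definitions above can be checked with a small word-overlap computation like the one below; this is a simplified unigram count for illustration, not a full ROUGE implementation:

def rouge1(reference, generated):
    ref, gen = reference.lower().split(), generated.lower().split()
    # Clipped unigram overlap between the generated and reference summaries
    overlap = sum(min(ref.count(w), gen.count(w)) for w in set(gen))
    recall = overlap / len(ref) if ref else 0.0
    precision = overlap / len(gen) if gen else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return recall, precision, f1

print(rouge1("the coffee tasted great and was at such a good price", "great coffee"))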
5 Conclusion and Future Enhancements
A general-purpose abstractive method has been utilized for document summarization. Abstractive text summarization has been implemented using a bidirectional LSTM with an attention mechanism and a pointer-generator network. With the help of the bidirectional LSTM, the content is preprocessed and extracted from the multiple documents. When the validation loss increases, training of the model is stopped automatically. The results are reported per epoch (iteration) with a batch size of 512; as the number of epochs increases, the accuracy also increases. The accuracy is measured with ROUGE metrics. Pre-trained BERT models are suggested as a future enhancement.
References 1. M.-H. Su, C.-H. Wu, H.-T. Cheng, A two-stage transformer based approach for variable-length abstractive summarization. IEEE/ACM Trans. Audio Speech Lang. Process. 28, 2061–2072 (2020) 2. S. Ghodratnama et al., Extractive document summarization based on dynamic feature space mapping. IEEE Access 8, 139084–139095 (2020) 3. G. Bhagchandani, et al., A hybrid solution to abstractive multi-document summarization using supervised and unsupervised learning, in 2019 International Conference on Intelligent Computing and Control Systems (ICCS) (IEEE, 2019) 4. T.A. Fuad et al., Neural sentence fusion for diversity driven abstractive multi-document summarization. Comput. Speech Lang. 58, 216–230 (2019) 5. A.K. Mohammad Masum, S. Abujar, M.A. Islam Talukder, A.K.M.S. Azad Rabby, S.A. Hossain, Abstractive method of text summarization with sequence to sequence RNNs, in 2019 10th International Conference on Computing, Communication and Networking Technologies (ICCCNT) (2019), pp. 1–5. https://doi.org/10.1109/ICCCNT45670.2019.8944620
6. C. Barros et al., NATSUM: narrative abstractive summarization through cross-document timeline generation. Inf. Process. Manag. 56(5), 1775–1793 (2019) 7. Y.S. Choi, D. Kim, J.-H. Lee, Abstractive summarization by neural attention model with document content memory, in Proceedings of the 2018 Conference on Research in Adaptive and Convergent Systems, 2018 8. A. Rezaei, S. Dami, P. Daneshjoo, Multi-document extractive text summarization via deep learning approach, in 2019 5th Conference on Knowledge Based Engineering and Innovation (KBEI) 9. D. Suleiman, A.A. Awajan, Deep learning based extractive text summarization: approaches, datasets and evaluation measures, in 2019 Sixth International Conference on Social Networks Analysis, Management and Security (SNAMS) (IEEE, 2019) 10. A. See, P.J. Liu, C.D. Manning, Get to the point: summarization with pointer-generator networks. arXiv preprint arXiv:1704.04368 (2017) 11. R. Nallapati, et al., Abstractive text summarization using sequence-to-sequence RNNs and beyond. arXiv preprint arXiv:1602.06023 (2016) 12. J. Chen, H. Zhuge, Extractive text-image summarization using multi-modal RNN, in 2018 14th International Conference on Semantics, Knowledge and Grids (SKG), 2018, pp. 245–248. https://doi.org/10.1109/SKG.2018.00033
Chapter 7
Heart Disease Prediction Using Decision Tree and SVM R. Vijaya Saraswathi, Kovid Gajavelly, A. Kousar Nikath, R. Vasavi, and Rakshith Reddy Anumasula
1 Introduction
Diagnosing cardiac disease is a very tedious task because it depends on many attributes, such as glucose levels, blood pressure, cholesterol levels, abnormalities in pulse rate and other attributes. Data processing and analysis have provided medical practitioners with reliable and productive solutions for many applications, such as deep patient profiles, computer-aided software for liver cancer diagnosis, medical image segmentation, and the diagnosis and detection of lung conditions. In recent years, variants of neural networks have shown outstanding data efficiency, and deep learning forecasting algorithms have shown exceptional performance for data prediction. Deep learning algorithms play an important role in the medical domain for the classification of diseases and the discovery of knowledge, for example, brain disease detection, glaucoma detection, diabetes detection and heart disease detection using the collected biomedical data. Heart disease is very complex in nature, and caution should be taken in treating the disease; a lack of caution can cause the heart to degrade or lead to death. The combined view of data mining and medical science is widely used for finding different types of metabolic syndromes, and data mining classification plays an important role in heart disease prediction and data investigation.
R. Vijaya Saraswathi · K. Gajavelly (B) · A. Kousar Nikath · R. Vasavi · R. Reddy Anumasula
Department of Computer Science and Engineering, VNR VJIET, Hyderabad, India
R. Vijaya Saraswathi e-mail: [email protected]
A. Kousar Nikath e-mail: [email protected]
R. Vasavi e-mail: [email protected]
2 Related Work
There is a lot of work directly related to the topics in this paper. In the medical sector, machine learning and artificial intelligence are applied to generate strong, accurate predictions [1, 2]. For heart disease prediction, the probability-based naïve Bayes algorithm shows good accuracy [3]. The tree-based decision tree algorithm classifies ECG data with good accuracy [4, 5], and random forest algorithms perform better on large amounts of heart ECG data [6]. The back-propagation MLP artificial neural network and machine learning are also used, and the findings obtained from such a model are compared with the results of the existing models [7]. Patient data or records gathered from the UCI laboratory have been used with naïve Bayes, support vector machines, decision trees and neural networks to find different patterns; the obtained results are then compared across these algorithms for performance and accuracy checking. In competition with the other current approaches, a proposed hybrid approach achieves an F-score of 86.8% [8]. Classification has also been applied without CNN segmentation: in the early stage, heart cycles with separate starting locations are extracted from the electrocardiography signals, so that the CNN can produce attributes for different locations in the patient's testing phase [9, 10]. A huge amount of the data generated in the health industry is not being used; the new methods discussed here increase the accuracy of heart disease prediction and reduce the cost of computing in a simple and efficient way, and several techniques used in this work are highly precise, effective and accurate [11, 12]. Electronic Health Record data has been analyzed and classified for the prediction of cardiac disease by traditional AI classification techniques such as decision trees, hybrid random forests and support vector machines (SVM) [13, 14]. Chronic heart disease data has been collected and analyzed for prediction using the XGBoost algorithm [15]; the working of XGBoost, including cache access patterns, tree boosting, and compression and sharding, is explained in [16]. Cardiac data has also been analyzed and classified for the prediction of cardiac disease by a multilayer perceptron neural network (MLPNN) with back-propagation (BP) used as the training algorithm [17].
3 Proposed System
The previously existing model, which uses logistic regression as the classifier, is less accurate because the data is not linearly separable. We propose a new system that uses decision trees and a support vector machine to build a classifier. Using a decision tree, we split the data into two or more parts by picking the attribute that yields the lowest entropy and recursively splitting the subparts. Using a support vector machine, we map the low-dimensional, non-linearly separable data to a higher-dimensional space in which it can be separated by a line, plane or hyperplane.
Fig. 1 Machine learning methodology
3.1 Methodology
We followed a general machine learning procedure. As shown in Fig. 1, we first collected trusted ECG data from an online data repository called Kaggle. We then preprocessed the data, selected attributes that influence the target, removed outliers, and split the data into a training set and a test set; the training set is used for training the algorithms, and the models are then evaluated on the test set.
3.1.1
Data Collection and Processing
Data Collection
The dataset collected from [18] is composed of 1025 patient records with 14 attributes: restagc, fbs, thal, ca, cp, sex, age, thalach, chol, trestbps, slope, oldpeak, exang and target. The table below describes the attributes used for the analysis. Figure 2 shows that age does not play a major role in predicting heart disease, because equal numbers of people in the same age groups are present with and without heart disease.
Dataset Description
Table 1 describes the attributes used in the data and their descriptions.
Data Preprocessing and Outlier Detection
For every attribute or feature, we find the z-score of every value of that attribute relative to the column mean and standard deviation. We then take the magnitude or
Fig. 2 Box plot of sex versus target

Table 1 Attribute description

Attribute | Description
Restagc | Resting electrocardiographic results (0, 1, 2)
Fbs | Fasting blood sugar > 120 mg/dl (1 = true, 0 = false)
Thal | Normal = 0; fixed defect = 1; reversible defect = 2
Ca | Number of major vessels colored by fluoroscopy (0 to 3)
Age | Age of patient in completed years
Sex | Male = 1 and female = 0
Cp | Chest pain type from 0 to 3, ranging over the variety of difficulty
Trestbps | Resting blood pressure in mm Hg
Chol | Serum cholesterol in mg/dl
Thalach | Maximum heart rate achieved
Exang | Exercise-induced angina (yes = 1 and no = 0)
Oldpeak | ST depression induced by exercise relative to rest
Slope | The slope of the ST segment
Target | Disease = 1, no disease = 0
Fig. 3 Pearson’s correlation on features versus target
absolute value of the obtained z-score. If this absolute z-score exceeds a certain threshold, the particular row or record is treated as an outlier and is removed.
Data Transformation (Standardization)
This is a scaling technique in which the values are centered around the mean with unit standard deviation:
X′ = (X − μ) / σ   (1)
Attribute Selection (Pearson Correlation)
Pearson correlation measures the relationship between two sets of data. Its value varies from −1 to 1, where +1 denotes a strong positive correlation, −1 denotes a strong negative correlation, and 0 denotes no correlation. If the absolute value of the correlation is less than 0.2, the two variables are considered uncorrelated and the attribute can be omitted during analysis; that is, an attribute whose correlation with the target is close to zero can be eliminated. Figure 3 shows the importance of each feature; it is clear that 'thal' is the most important feature and 'age' is the least important feature.
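A compact sketch of this preprocessing chain (z-score based outlier removal, standardization per Eq. (1), and dropping weakly correlated attributes) is given below; the 0.2 correlation cut-off follows the text, while the z-score threshold of 3, the file name and the column names are assumptions made for illustration:

import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("heart.csv")                      # Kaggle heart disease dataset [18] (assumed file name)
features = df.drop(columns="target")

# Outlier removal: keep rows whose absolute z-score stays below the threshold on every attribute
z = ((features - features.mean()) / features.std()).abs()
df = df[(z < 3).all(axis=1)]

# Standardization: X' = (X - mu) / sigma, Eq. (1)
X = pd.DataFrame(StandardScaler().fit_transform(df.drop(columns="target")),
                 columns=features.columns)
y = df["target"].reset_index(drop=True)

# Pearson correlation based attribute selection: drop features with |r| < 0.2 against the target
corr = X.apply(lambda col: col.corr(y))
X = X[corr[corr.abs() >= 0.2].index]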
3.1.2
Machine Learning Algorithms
The proposed method employs the following machine learning algorithms [19, 20] to train the data:
(A) Naïve Bayes;
(B) Decision trees;
(C) Random forest;
(D) Support vector machine.
Naïve Bayes
It is derived from the mathematical Bayes theorem and is used as a classification technique. This classifier assumes that the presence of an attribute in a class is not linked to the presence of any other attribute.
Decision Trees
The decision tree is a machine learning algorithm used as both a regression technique and a classification technique. It is a tree-structured classifier. As shown in Fig. 4, it constructs a tree of decision nodes where the internal nodes represent the attributes of the data, the edges denote the decision rules, and each leaf node denotes the class label, i.e., the target class label reached when that path is chosen. Prediction of the class with a decision tree proceeds as follows: the algorithm starts at the root node of the tree and, based on the decision rules, selects further nodes until a leaf node is reached; the class label at that leaf node is the predicted class. Information gain is used as the attribute selection measure:

Information Gain = Entropy(S) − [(Weighted Avg) * Entropy(each feature)]   (1)
Entropy(S) = −P(yes) log(P(yes)) − P(no) log(P(no))
Entropy = − Σ_{j=1}^{m} p_ij log2(p_ij)   (2)

Fig. 4 Decision tree model
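As a small illustration of Eq. (2) and the information gain criterion, the entropy of a split and the resulting gain can be computed as follows; the class counts used in the example are hypothetical, not taken from the dataset:

import numpy as np

def entropy(counts):
    # Entropy of a node given the class counts, Eq. (2)
    p = np.array(counts, dtype=float)
    p = p[p > 0] / p.sum()
    return -(p * np.log2(p)).sum()

def information_gain(parent_counts, child_counts):
    # Parent entropy minus the weighted average entropy of the children, Eq. (1)
    n = sum(sum(c) for c in child_counts)
    weighted = sum(sum(c) / n * entropy(c) for c in child_counts)
    return entropy(parent_counts) - weighted

# Hypothetical split of 100 patients on one attribute
print(information_gain([55, 45], [[40, 10], [15, 35]]))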
Fig. 5 Random forest tree model
Random Forest
Random forest is an ML algorithm [21]. It is based on the concept of ensemble learning, a technique of uniting several classifiers to solve a large problem and improve the performance of the model. As shown in Fig. 5, the training data is divided into n subsets, each subset is used to train a decision tree so that n models are designed, and prediction is done by a voting system, i.e., the majority class is considered the final predicted class.
J (w, b, a) = ⇒ Q(a) =
N i=1
1 ai a j di d j xiT x j 2 i=1 j=1 N
ai −
N
N
W —Weight matrix, J—Cost function, X—Input variable; a, b, d—Constants for a dimension;
Fig. 6 Decision boundary using SVM
g(x)—Final SVM model; J(w, b, a)—Cost function for a given weight in an assumed dimension.
The algorithm first works in the original dimension: it finds the optimized weights for the parameters and computes the cost using those weights. It then maps the data to a higher dimension, again finds the optimized weights and computes the cost. This process repeats until a threshold cost is reached; the dimension with the least cost is taken as the final dimension, and the weights of that dimension form the final weight matrix.
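A minimal sketch of training the four classifiers and obtaining the confusion-matrix counts reported in the next section is shown below; the hyperparameters and the split ratio are assumptions rather than values stated in the paper, and X and y are taken from the preprocessing step above:

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import LinearSVC
from sklearn.metrics import confusion_matrix, accuracy_score

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

models = {
    "DT": DecisionTreeClassifier(criterion="entropy"),
    "GNB": GaussianNB(),
    "SVM": LinearSVC(max_iter=10000),
    "RF": RandomForestClassifier(n_estimators=100),
}

for name, clf in models.items():
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)
    tn, fp, fn, tp = confusion_matrix(y_test, pred).ravel()   # counts as in Table 2
    print(name, tp, tn, fp, fn, round(accuracy_score(y_test, pred) * 100, 1))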
4 Results and Discussion
As shown in Fig. 7, the confusion matrix is used as the evaluation metric for the algorithms.
Fig. 7 Confusion matrix model
Table 2 Confusion matrix of each algorithm

Algo | TP | TN | FP | FN | Acc
DT | 335 | 40 | 8 | 392 | 89.6
GNB | 286 | 89 | 44 | 356 | 85.6
SVM | 300 | 75 | 31 | 369 | 87.6
RF | 332 | 43 | 23 | 377 | 88.1
Acc—Accuracy (%); DT—Decision tree; TP—True positives; SVM—Linear support vector machine; RF—Random forest; FP—False positives; GNB—Gaussian Naïve Bayes; TN—True negatives; FN—False negatives.
Table 2 shows the confusion matrix of each algorithm, and the accuracy is computed from the confusion matrix. The decision tree and random forest have high accuracy because the data can be classified well using the entropy values of its features. Our model is built quickly, predicts results faster, and gives the same accuracy compared with other, more complex algorithms.
5 Conclusion
Processing and analysis of heart health data will help in predicting abnormal heart conditions and rescuing human lives as quickly as possible. ML techniques have been used to process relevant data to provide a new approach to heart disease. In the medical world, the prevention of heart disease is complicated and significant; however, once the condition is diagnosed in the early stages and prevention is started as soon as possible, the fatality rate can be managed.
References 1. C. Science, G.M. Faculty, Heart disease prediction using machine learning and data mining technique. IJCSC 0973-7391 7, 1–9 (2009) 2. L. Baccour, Amended fused TOPSIS-VIKOR for classification (ATOVIC) applied to some UCI data sets. Expert Syst. Appl. 99, 115–125 (2018). https://doi.org/10.1016/j.eswa.2018.01.025 3. J. Brownlee, Naive Bayes for Machine Learning (2016). Retrieved 4 Mar 2019 4. H. Sharma, S. Kumar, A survey on decision tree algorithms of classification in data mining. Int. J. Sci. Res. (IJSR). 5 (2016) 5. L. Rokach, O. Maimon, Decision Trees (2005). https://doi.org/10.1007/0-387-25465-X_9 6. G. Biau, Analysis of a random forests model. J. Mach. Learn. Res. 13 (2010)
7. R. Das, I. Turkoglu, A. Sengur, Effective diagnosis of heart disease through neural networks ensembles. Expert Syst. Appl. 36(4), 7675–7680 (2009). https://doi.org/10.1016/j.eswa.2008. 09.013 8. C.-A. Cheng, H.-W. Chiu, An artificial neural network model for the evaluation of carotid artery stenting prognosis using a national-wide database, in Proceedings of the 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), July 2017, pp. 2566–2569 9. J. Nahar, T. Imam, K.S. Tickle, Y.-P.P. Chen, Association rule mining to detect factors which contribute to heart disease in males and females. Expert Syst. Appl. 40(4), 1086–1093 (2013). https://doi.org/10.1016/j.eswa.2012.08.028 10. S. Zaman, R. Toufiq, Codon based back propagation neural network approach to classify hypertension gene sequences, in Proceedings of the International Conference on Electrical, Computer and Communication Engineering (ECCE), Feb 2017, pp. 443–446 11. D.K. Ravish, K.J. Shanthi, N.R. Shenoy, S. Nisargh, Heart function monitoring, prediction and prevention of heart attacks: using artificial neural networks, in Proceedings of International conference on Contemporary Computing and Informatics (IC3I), Nov 2014, pp. 1–6 12. W. Zhang, J. Han, Towards heart sound classification without segmentation using convolutional neural network, in Proceedings of the Computing in Cardiology (CinC), vol. 44, Sept 2017, pp. 1–4 13. C. Sowmiya, P. Sumitra, Analytical study of heart disease diagnosis using classification techniques, in Proceedings of the 2017 IEEE International Conference on Intelligent Techniques in Control, Optimization and Signal Processing (INCOS), Tamilnadu, India, 23–25 March 2017, pp. 1–5 14. S. Mohan, C. Thirumalai, G. Srivastava, Effective heart disease prediction using hybrid machine learning techniques. IEEE Access 7, 81542–81554 (2019) 15. S. Nalluri, R. Vijaya Saraswathi, S. Ramasubbareddy, K. Govinda, E. Swetha, Chronic heart disease prediction using data mining techniques, in Data Engineering and Communication Technology, ed. by K. Raju, R. Senkerik, S. Lanka, V. Rajagopal. Advances in Intelligent Systems and Computing, vol. 1079 (Springer, Singapore, 2020). https://doi.org/10.1007/978981-15-1097-7_76 16. T. Chen, C. Guestrin, Xgboost: a scalable tree boosting system, in Proceedings of the 22nd ACMSIGKDD International Conference on Knowledge Discovery and Data Mining (2016) 17. P. Singh, S. Singh, G. Pandi Jain, Effective heart disease prediction system using data mining techniques. Int. J. Nanomed. 13, 121–124 (2018). https://doi.org/10.2147/IJN.S124998 18. https://www.kaggle.com/ronitf/heart-disease-uci 19. V.S. Manvith, R.V. Saraswathi, R. Vasavi, A performance comparison of machine learning approaches on intrusion detection dataset, in 2021 Third International Conference on Intelligent Communication Technologies and Virtual Mobile Networks (ICICV), 2021, pp. 782–788. https://doi.org/10.1109/ICICV50876.2021.9388502 20. R.V. Saraswathi, V. Bitla, P. Radhika, T.N. Kumar, Leaf disease detection and remedy suggestion using convolutional neural networks, 2021 5th International Conference on Computing Methodologies and Communication (ICCMC), 2021, pp. 788–794. https://doi.org/10.1109/ICC MC51019.2021.9418013 21. M. Mounica, R. Vijayasaraswathi, R. Vasavi, Detecting Sybil attack in wireless sensor networks using machine learning algorithms. IOP Conf. Ser.: Mater. Sci. Eng. 1042, 012029 (2021) 22. R.V. Saraswathi, L.P. Sree, K. 
Anuradha, Support vector based regression model to detect Sybil attacks in WSN. Int. J. Adv. Trends Comput. Sci. Eng. 9(3) (2020)
Chapter 8
Classification of Skin Diseases Using Ensemble Method D. N. Vasundara, Swetha Naini, N. Venkata Sailaja, and Sagar Yeruva
1 Introduction
Skin is the most important organ of the human body. It protects the body from heat, burns, UV radiation and other harmful radiation, it helps in producing vitamin D, and it plays an important role in maintaining body temperature. It is therefore very essential to maintain good healthy skin to shield the body from skin infections. As medical industries apply data mining to their existing data, they will discover new, valuable and potentially life-saving information. Data mining is the method of extracting or mining information from large volumes of data; in data mining, intelligent methods are applied in order to derive data patterns. The growing amount of medical data calls for computer-based approaches to extract valuable knowledge, since conventional methods cannot process it; data mining is a big opportunity to help doctors work with this vast volume of knowledge. Machine learning algorithms are commonly used in medicine. Various machine learning algorithms have been developed for the diagnosis of different diseases to achieve higher accuracy in predicting them. After analyzing the different characteristics of a disease, several machine learning algorithms are built to predict different forms of disease at the early stages.
D. N. Vasundara · S. Naini (B) · N. Venkata Sailaja · S. Yeruva
Department of CSE, VNR Vignana Jyothi Institute of Engineering and Technology, Hyderabad, India
D. N. Vasundara e-mail: [email protected]
N. Venkata Sailaja e-mail: [email protected]
S. Yeruva e-mail: [email protected]
These algorithms are widely applicable to kidney disorders, thyroid disease, erythemato-squamous diseases, diabetes, breast cancer, other cancers and many more. In this paper, we selected erythemato-squamous diseases for study. We apply various classification algorithms, and an ensemble method is then applied.
2 Related Work
This section covers different approaches that are useful for the prediction and classification of different skin diseases.
Ravichandran et al. [1] describe automatic detection of erythemato-squamous diseases based on the fuzzy extreme learning machine, a new approach introduced in their paper. The dataset was preprocessed to obtain fuzzy input values; by combining fuzzy logic and ELM, the system achieved more accurate results with increased performance, and the total classification accuracy is 93%. Manjusha et al. [2] give detailed information about the prediction of different dermatological conditions using naïve Bayesian classification; the proposed system obtains data patterns by using the naïve Bayesian theorem. The data was collected from tertiary health care centers in various areas of Kottayam and Alappuzha, Kerala, India, and the eight diseases considered are scarlet fever, rubella, measles, fifth disease, chicken pox, enterovirus, exanthem subitum (no vaccination) and Kawasaki disease. Rambhajani et al. [3] applied Bayes net, a Bayesian technique, together with best-first-search feature selection on a dermatology dataset. The system is composed of three steps: data preparation, model development and data validation. In the first step, data is fed into the model and attribute selection is done, with training and testing data in a 60:40 ratio. Next, the classification model is developed to the desired accuracy, and finally the performance is measured using error measures such as accuracy, sensitivity and specificity; the accuracy obtained is 99.31%. Parikh et al. [4] describe two predictive models using popular techniques, artificial neural networks and support vector machines. The dataset is collected from the Department of Skin & V.D., Shrikrishna Hospitals, Karamsad, Gujarat, India, and is randomly divided into two partitions, 80–20% and 70–30%, with 80% and 70% of the data for training and 20% and 30% for testing; the system gives better results with 80% of the data for training and 20% for testing. Badrinath et al. [5] work with efficient AdaBoost techniques and hybrid classifiers for automatic detection of erythemato-squamous diseases. The dataset is divided into two sets, one with fuzzy features and the other with non-fuzzy features, and the data preprocessing includes the following sequence of steps: feature selection and the application of association rules with the Apriori algorithm. Real, modest and gentle AdaBoost algorithms are used for disease prediction; the overall accuracy is 98.6%, which increases to 99.3% after data preprocessing.
Manjusha et al. [6] study data mining techniques such as rule-based classifiers, decision trees, naïve Bayes and artificial neural networks to classify massive amounts of healthcare data. Weka 3.7.9 is used for analyzing the data; Weka supports data mining tasks such as data preprocessing, classification, clustering, association rules, visualization and feature selection, and naïve Bayes and the J48 decision tree are used to perform mining and classification. Idoko et al. [7] designed an FNN system aimed at the differential diagnosis of erythemato-squamous diseases. The dataset used in this study is maintained by Nilsel Ilter and Altay Guvenir. The FNN algorithm is applied for classification; normalization of the input data reduced the training time and allowed quick input–output training, a clustering technique is used for feature extraction, and the accuracy of the system was 98.37%. Verma et al. [8] present a new method applying six data mining techniques to predict the disease classes: the passive aggressive classifier, linear discriminant analysis, radius neighbors classifier, Bernoulli naïve Bayes, Gaussian naïve Bayes and extra trees classifier. The dataset used is taken from the University of California, Irvine machine learning repository, and the accuracy obtained is 99.68%. Badhula et al. [9] classify three skin diseases (acne, lichen planus and SJS-TEN) using five machine learning algorithms: logistic regression, kernel SVM, naïve Bayes, random forest and CNN. Here, 80% of the dataset is used for training and the remaining 20% for testing, and each algorithm is run 10 times on the same dataset; the parameters obtained for the five algorithms are compared to determine which algorithm is best for skin disease prediction, and the paper concludes that CNN gives the best training and testing accuracies of 99.05% and 96%.
From the above references, we draw the following summary. Some papers do not conclude that the patient has a particular disease; they only predict the probabilities of the eight diseases they consider. In some papers, the system is slow for larger datasets because the required training time is high, while in other papers the model can be used efficiently for detection with improved speed and accuracy. Some systems are sensitive to noisy data and outliers, and one paper does not guarantee that its system performs well in all cases if more instances are added. The accuracy of a system also differs with the percentage split between training and testing data. Compared with individual techniques such as machine learning algorithms and data mining, the ensemble method is the preferred option, as performance and accuracy increase; the ensemble method works well for the prediction of skin diseases.
3 Proposed Work
The main goal of the proposed work is to classify seven different classes using three data mining techniques: support vector machine, decision tree and random forest. Voting is used in this ensemble
approach to predict the classes of skin diseases. The seven classes are pityriasis rubra, lichen planus, pityriasis rosea, healthy skin, psoriasis, chronic dermatitis and seborrheic dermatitis. To overcome disadvantages such as increased training time, feature selection is used in this proposed system; accuracy may also increase with the use of feature selection. In this paper, we use various data mining techniques and an ensemble approach for diagnosis, described below.
Decision tree: The decision tree is a tree-structured model. The internal nodes of the tree represent attributes of the dataset, the branches represent decision rules, and every leaf node represents an outcome. The tests are performed based on the features of the selected dataset.
Random forest: Samples are taken with replacement from the training dataset, but the trees are designed in a way that decreases the correlation between the individual classifiers; specifically, instead of greedily selecting the best split point during tree construction, only a random subset of features is considered for each split.
Support vector machine: The SVM makes a binary decision for an input dataset and determines which category each input sample belongs to. The SVM algorithm is trained to assign input data to categories that are divided by the widest possible margin.
Ensemble algorithm: The k parts of the training data are fed into the SVM, RF and DT as input to create k models of each algorithm for the ensemble (voting) model. The final ensemble models generated from the algorithms are used to produce the output.
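One way to realise this majority-voting ensemble with scikit-learn is sketched below; the individual estimators and their settings are illustrative, not the exact configuration used by the authors, and X_train, y_train and X_test are assumed to come from the preprocessing stage described in Sect. 4:

from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

# Hard voting: the class predicted by the majority of the three base classifiers wins
ensemble = VotingClassifier(estimators=[
    ("dt", DecisionTreeClassifier()),
    ("rf", RandomForestClassifier(n_estimators=100)),
    ("svm", SVC(kernel="linear")),
], voting="hard")

ensemble.fit(X_train, y_train)
predictions = ensemble.predict(X_test)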
4 Implementation
This model includes three stages:
1. Dataset preparation
2. Preprocessing and feature selection
3. Model implementation
The architecture of the proposed work is shown in Fig. 1.
4.1 Dataset Preparation
The dataset used here is the Dermatology dataset [10], collected from the UCI repository. It contains 407 instances and 34 attributes. These 34 attributes include 12 clinical attributes, namely scaling, itching, polygonal papules, oral mucosal involvement, scalp involvement, age, erythema, definite borders, Koebner phenomenon, follicular papules, knee and elbow involvement, and family history, and 22
Fig. 1 Architecture of proposed model
histopathological attributes, namely eosinophils in the infiltrate, fibrosis of the papillary dermis, acanthosis, parakeratosis, elongation of the rete ridges, spongiform pustule, focal hypergranulosis, vacuolization and damage of the basal layer, saw-tooth appearance of retes, perifollicular parakeratosis, band-like infiltrate, melanin incontinence, PNL infiltrate, exocytosis, hyperkeratosis, clubbing of the rete ridges, thinning of the suprapapillary epidermis, Munro microabscess, disappearance of the granular layer, spongiosis, follicular horn plug and inflammatory mononuclear infiltrate. This dataset consists of seven classes: pityriasis rubra, lichen planus, pityriasis rosea, healthy skin, psoriasis, chronic dermatitis and seborrheic dermatitis. There are 110 psoriasis records, 61 seborrheic dermatitis records, 71 lichen planus records, 48 pityriasis rosea records, 57 chronic dermatitis records, 20 pityriasis rubra pilaris records and 40 healthy skin records. In Fig. 2, the X-axis shows the classes and the Y-axis shows the number of records.
Fig. 2 Dataset classes and records bar plot graph
4.2 Preprocessing and Feature Selection
In preprocessing, the data is interpolated for missing values and normalized. The normalized data is divided into two parts, a training dataset and a testing dataset, and the training dataset is further divided into k parts to train the model. Feature selection is a mechanism in which only those features that contribute most to the prediction of diseases are selected automatically from the dataset [11]. By reducing the number of features in the model, feature selection helps to prevent such problems while trying to improve model performance. Feature selection has an added advantage in this case, model interpretation: with fewer features, the output model becomes simpler and easier to understand, and a user is more likely to trust the model's future predictions.
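A hedged sketch of this preprocessing stage, interpolating missing values, normalising, and keeping only the most informative attributes, is given below; the file name, the "class" column name, the chi-squared scoring and k = 20 are illustrative assumptions, not choices stated by the authors:

import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import train_test_split

data = pd.read_csv("dermatology.csv")             # UCI dermatology dataset [10] (assumed file/column names)
X, y = data.drop(columns="class"), data["class"]

X = X.interpolate()                               # fill missing values (e.g., the age attribute)
X = pd.DataFrame(MinMaxScaler().fit_transform(X), columns=X.columns)   # normalisation

selector = SelectKBest(score_func=chi2, k=20)     # keep the 20 most informative attributes
X_selected = selector.fit_transform(X, y)

X_train, X_test, y_train, y_test = train_test_split(X_selected, y, test_size=0.2)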
4.3 Model Implementation
Model implementation is an important part, as we use data mining techniques and an ensemble algorithm for prediction. The dataset used here has 33 independent attributes and one dependent attribute, the output variable, which predicts whether the patient has psoriasis, lichen planus, seborrheic dermatitis, pityriasis rubra, chronic dermatitis, pityriasis rosea or healthy skin. In this project, we use the data mining techniques random forest, decision tree and support vector machine, together with an ensemble approach, for diagnosis. After the algorithms are trained with the training dataset, test data samples are given as input to the three data mining algorithms. The results are predicted by each
Table 1 Performance of various classification algorithms

S. No. | Metric | Decision tree (%) | Random forest (%) | Support vector machine (%) | Ensemble algorithm (%)
1 | Precision | 93.50 | 95.64 | 92.79 | 97.39
2 | Recall | 92.30 | 93.69 | 90.56 | 96.71
3 | F-measure | 92.73 | 94.54 | 91.45 | 97.02
4 | Accuracy | 94.47 | 95.09 | 92.63 | 96.93
algorithm in the system. Later, using the voting approach, the majority vote of the above predictions is taken as the final output.
5 Results
The experimental phase is divided into three stages: training the system using the dermatology dataset; classification of the seven classes (pityriasis rubra, lichen planus, pityriasis rosea, healthy skin, psoriasis, chronic dermatitis and seborrheic dermatitis); and calculation of precision, recall, F-measure and accuracy based on the test predictions.
5.1 Performance of Various Classification Algorithms
As shown in Table 1, the performance of the ensemble algorithm is higher than that of the single classifiers, so the ensemble algorithm is more suitable for the classification of the seven skin classes. The highest accuracy in this paper is 96.93%, which is achieved using the ensemble algorithm.
5.2 Performance Comparison
In Fig. 3, the X-axis shows the various algorithms and the Y-axis shows the percentages of accuracy, F1-score, precision and recall.
6 Conclusion
In the biotechnology sector, machine learning algorithms play a major part in disease diagnosis. Many of the expert systems that help in the prediction of different diseases
Fig. 3 Performance comparison graph
are developed using knowledge of machine learning algorithms. This paper discusses data mining techniques for the prediction of skin diseases. Decision tree, random forest, support vector machine and a voting technique are used to build the system, and a feature selection process is used to overcome increased training time and improve accuracy. By using the ensemble algorithm, we achieved the highest accuracy compared with random forest, decision tree and support vector machine. The system also suggests some home remedies for the predicted disease.
References 1. K.S. Ravichandran, B. Narayanamurthy, G. Ganapathy et al., An efficient approach to an automatic detection of erythemato-squamous diseases. Neural Comput. Appl. 25, 105–114 (2013) 2. K.K. Manjusha, K. Sankaranarayan, P. Seena, Prediction of different dermatological conditions using naïve Bayesian classification. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 4(1) (2014) 3. M. Rambhajani, W. Deepanker, N. Pathak, Classification of dermatology diseases through Bayes net and best first search. Int. J. Adv. Res. Comput. Commun. Eng. 4(5) (2015) 4. K. Parikh, T. Shah, R.K. Kota, R. Vora, Diagnosing common skin diseases using soft computing techniques. Int. J. Bio-Sci. Bio-Technol. 7, 275–286 (2015). https://doi.org/10.14257/ijbsbt. 2015.7.6.28 5. N. Badrinath, G. Gopinath, K.S. Ravichandran et al., Estimation of automatic detection of erythemato-squamous diseases through AdaBoost and its hybrid classifiers. Artif. Intell. Rev. 45, 471–488 (2015) 6. K.K. Manjusha, K. Sankaranarayanan, Comparative study of data mining algorithms in medical data. Int. J. Eng. Res. Technol. (IJERT) NSDMCC 4(06) (2017)
7. J. Idoko, M. Arslan, R. Abiyev, Fuzzy neural system application to differential diagnosis of erythemato-squamous diseases. Cyprus J. Med. Sci. 90–97 (2018). https://doi.org/10.5152/ cjms.2018.576 8. A.K. Verma, S. Pal, S. Kumar, Comparison of skin disease prediction by feature selection using ensemble data mining techniques. Inform. Med. Unlocked 16, 100202 (2019). ISSN 2352–9148 9. S. Badhula, S. Sharma, S. Juyal, C. Kulshrestha, Machine learning algorithms based skin disease detection. Int. J. Innov. Technol. Expl. Eng. (2020) 10. Dataset. https://archive.ics.uci.edu/ml/datasets/dermatology 11. N.V. Sailaja, L.P. Sree, N. Mangathayaru, Rough set based feature selection approach for text mining, in 2016 2nd International Conference on Contemporary Computing and Informatics (IC3I) (2016)
Chapter 9
An Integrated Decision Support System for Storm Surge Early Warning Using SOA J. Padmanabham, P. L. N. Murty, T. Srinivasa Kumar, and T. V. S. Udaya Bhaskar
1 Introduction
Storm-surge-induced coastal flooding or inundation is one of the most hazardous elements that can significantly impact any coastal zone. Coastal flooding results in the death of people, the destruction of property, and damage to coastal infrastructure. Tropical storms (cyclones, hurricanes and typhoons) have killed about two million people worldwide in the last two centuries, and millions have been injured [1]. About 3 lakh lives were lost in Bangladesh due to the 1970 cyclone, and the 1977 Andhra cyclone resulted in the loss of 10,000 lives [2]. India has a long coastal stretch of about 7500 km, and 30% of the population resides along this stretch. On average, about 5–6 cyclones make landfall along the Indian coasts every year, leading to significant socio-economic loss. Murty et al. [3] state that the high coastal population density needs an efficient storm surge warning system. The importance of a surge warning system for the Indian coasts is highlighted in studies by Bhaskaran et al. [4] and Srinivasa et al. [5]. Recognizing the importance of such a system for the Indian coasts, the Indian National Centre for Ocean Information Services (INCOIS) has taken on the task of establishing a storm surge early warning system (SSEWS) [3].
J. Padmanabham (B) · P. L. N. Murty · T. Srinivasa Kumar · T. V. S. Udaya Bhaskar
Indian National Centre for Ocean Information Services, INCOIS, Hyderabad 500090, India
e-mail: [email protected]
J. Padmanabham
Department of Geoinformatics, Mangalore University, Mangalore, Karnataka 574199, India
2 Overview of Architecture
The layer diagram in Fig. 1 depicts a hierarchical structure describing the fundamental functions in the storm surge DSS built on SOA. Each of the seven layers has components that connect to the other layers. The connections between modules and layers are purely based on the functional requirements of the DSS and on the interfacing needed to achieve the targeted goals of each module. The topmost layer represents the service APIs and user interface and interacts with the user directly. The decision support system tool has dedicated access to the available resources, user interfaces, visualization interfaces, GIS data, the associated software libraries and the numerical model. The components are designed to perform some of the critical workflows for warning generation and dissemination. The application layer contains the storm surge numerical model, the Advanced Circulation (ADCIRC) model (Luettich et al. 1992), which computes, predicts and analyzes the storm surge phenomena associated with a tropical cyclone. This layer is responsible for data extraction, interpretation, map visualization and workflow settings; the workflow tools coordinate the interaction of the features/modules in this layer.
Fig. 1 Service oriented architecture layer diagram
The application server layer is a server framework that comprises a comprehensive service standard with a collection of components accessible to the application via a platform-specific API. The Tomcat server acts as the application server; it runs on Java technologies to serve the application and enables the workflows to coordinate with application components, tools and the physical resources in the resource access layer. The operating system and runtime layer is an abstraction layer that describes the software/instructions that need to be executed while the program is running; this layer has access to a dedicated high-performance computing (HPC) facility to launch the ADCIRC model in real time using the available nodes and processors. The data layer is the core of the decision-making tool for analyzing the impending hazard due to the storm surge and the extent of inundation along the coast. It acts as a data resource provider giving access to topography, bathymetry, the coastal tide gauge network data and moored buoy data to provide situational awareness to the operator, and it provides access to the central database hosted on MySQL and to geodatabases for better visualization on the GIS map display. The resources and system monitoring-access control layers provide the dynamic resources for storage, computing and network connectivity in a distributed environment; they include a host of data transport protocols, web services with REST capabilities, model grid services and secure connections to the database resources and computing facilities. The DSS modules interact with the various layers and services. They enable the operator to perform multiple tasks such as generating the cyclone best track, launching the model, visualization, map display, monitoring the storm surge event, producing advisory bulletins, and disseminating them to the disaster management stakeholders for further action. They also facilitate the cataloguing of events and the security aspects.
3 Standard Operating Procedure (SOP)
The real-time storm surge warning procedure begins once the cyclone track and the associated intensity are issued by the India Meteorological Department (IMD). The DSS takes these track and intensity parameters to generate the wind and pressure fields that force the storm surge model. The Jelesnianski and Taylor [6] wind model is integrated in the DSS to compute the wind and pressure fields; the applicability of this wind model to storm surge computations is explained in many studies [2, 5, 7, 8]. The next step is to launch the storm surge model on the high-performance computing (HPC) system, followed by the creation of maps and information based on the model output. Finally, this information is sent via fax, e-mail, SMS and other means. The entire process is repeated at a rate that corresponds to the frequency with which IMD updates the track forecasts. The standard operating procedure (SOP) is detailed in Fig. 2.
Fig. 2 Storm surge early warning SOP flow chart
4 Decision Support System (DSS)
DSS is the backbone of the storm surge services. It is the codified version of the SOP, which is the base algorithm of the implementation. The DSS was developed to lessen the requirement for human intervention and to eliminate possible errors in the model inputs and outputs arising from pre- and post-processing of data. It is created as a standards-based tool with a high level of modularity for determining the best decision-making procedure. The DSS is built on SOA architecture standards [1]. OGC standard services are implemented to support access to GIS data, e.g., Web Map Service (WMS) and Web Feature Service (WFS) specifications, for visualization of the situation during a storm surge event. The temporal data of the sea level sensors, such as tide gauges and moored buoys, are accessed over REST API calls. The model simulation system identifies the affected areas and is integrated via a geospatial engine (Fig. 3). The DSS can access and interpret the model output for further generation of spatial maps and dissemination. The DSS is automated from the generation of the wind model input through to bulletin generation. It includes Track Generation, Model Launch, Map Display, and Bulletin Generation and Dissemination modules. The track generation module generates the model input track in best track format from the IMD e-mail advisory; this is further used to generate the pressure and wind fields and to prepare the model inputs required to launch the model. The Map Display module allows the operator to visualize spatial data that has been combined with GIS layers and surge model outputs. It makes use of feature classes (data subsets) and tables from geospatial datasets. Various processing tools are included in this module, such as ArcGIS, ArcPy, Java and Python libraries, to generate the spatial data and maps from the ASCII model outputs.
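To illustrate how the resource access described above might look in practice, the following minimal Python sketch shows a REST call for tide-gauge time series and the construction of an OGC WMS GetMap request. The endpoint URLs, parameter names and token-based authentication are assumptions made for illustration only; they are not taken from the paper.

```python
import requests

# Hypothetical endpoints: the actual DSS service URLs, parameter names and
# authentication scheme are not given in the paper and are assumed here.
TIDE_GAUGE_API = "https://dss.example.org/api/sealevel/tidegauge"
WMS_ENDPOINT = "https://dss.example.org/geoserver/wms"

def fetch_tide_gauge_series(station_id, start, end, token):
    """Fetch a tide-gauge time series over a REST call (illustrative only)."""
    response = requests.get(
        TIDE_GAUGE_API,
        params={"station": station_id, "from": start, "to": end},
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()          # e.g. [{"time": "...", "level_m": 1.32}, ...]

def wms_getmap_url(layer, bbox, width=1024, height=768):
    """Build a standard OGC WMS 1.3.0 GetMap request for a surge layer."""
    params = {
        "service": "WMS", "version": "1.3.0", "request": "GetMap",
        "layers": layer, "bbox": ",".join(map(str, bbox)),
        "crs": "EPSG:4326", "width": width, "height": height,
        "format": "image/png",
    }
    return requests.Request("GET", WMS_ENDPOINT, params=params).prepare().url
```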
Fig. 3 Schematic architecture of decision support system using SOA with necessary computing services, sea level sensor systems, Geo Database repositories, and dissemination channels
Storm surge alert bulletins and notification messages are generated using the Bulletin Generation module. The output includes web-based bulletins, PDFs, spatial GIS datasets, threat images, etc. The Warning Dissemination module is designed to deliver the notification messages through various dissemination modes such as e-mail, SMS, fax and web publishing. These notification messages are sent to the national, state and district authorities through the dissemination rules and framework of the organization, as per the SOP standards. The Storm Surge Service Bus (SSSB) is the main medium through which the end-to-end operations of the DSS are executed; resources are completely managed and maintained through this service bus. The service bus includes access to the sea level sensor network (tide gauges and moored buoys) and its associated time series data for validation of the model output. The SSSB also provides connectivity to the model simulation service, where computing facilities such as HPC clusters are made available for the real-time model launch, and it retrieves the model results over secured transport protocols. It further provides the geospatial data services to the DSS and serves the spatial data over the WMS, WFS and WPS protocols for visualization and analysis of the storm surge forecasts. The SSSB user and information management service maintains the dissemination lists, operator access controls and logging procedures.
Fig. 4 Dashboard of the decision support system
It connects to the different gateways for e-mail, SMS and fax solutions via HTTPS and REST API calls for pushing the notification messages to the stakeholders in standard formats and templates. The resource layer in the DSS architecture ensures the availability of the computing systems, servers, networking and databases for the smooth operation of the early warning centre (Fig. 4).
5 Results and Discussion
The DSS computes the real-time storm surge and inundation for cyclones that make landfall along the Indian coasts. Figure 5 shows the forecasted storm surge along with the inland inundation due to cyclone Yaas, based on the track forecast issued by IMD on 25 May 2021 at 0230 IST. From the figure, it is clearly seen that a vast coastal stretch experienced storm surge amplitudes greater than 1 m, reaching a peak value of 3.3 m at Baleswar, Odisha. The storm surge heights and associated inland inundation extents at various coastal locations of the Odisha and West Bengal coasts are tabulated in Table 1. It can be observed from the table that the forecasted maximum storm surge is about 3.3 m at Baleswar and that the associated inland inundation extent reaches its peak of about 7.5 km at Mohanpur. It is well known that the inland inundation extent depends primarily on the nature of the coastal topography along with the surge amplitude.
Fig. 5 DSS generated storm surge map for the recent very severe cyclone “Yaas”
Table 1 Storm surge heights and associated inundation extents due to Yaas cyclone
Mandal/taluk | District | State/union territory | Nearest place of habitation | Storm surge (m) | Expected inundation extent (km)
Baleshwar | Baleshwar | Odisha | Kumbhirgari | 1.2–3.3 | Up to 2.00
Bhadrak | Bhadrak | Odisha | Mohanpur | 1.0–2.9 | Up to 7.50
Kendraparha | Kendrapara | Odisha | Tikayat Nagar | 0.2–1.4 | Up to 2.64
Basirhat | North 24 Parganas | West Bengal | Amlamethi | 0.2–0.7 | Up to 0.26
Diamond Harbour | South 24 Parganas | West Bengal | Chakloknath | 0.2–0.6 | Up to 0.78
Tamluk | Purba Medinipur | West Bengal | Jamitta | 0.2–0.5 | Up to 0.35
6 Conclusion
In this paper, we have presented an integrated decision support system tool built on service-oriented architecture for storm surge early warnings. SOA was used as
a service-centric architectural approach that supported the integration of various heterogeneous datasets, sensors, systems and associated processes by creating reusable components of functionality, or services. This approach enabled the DSS to be more interoperable, modular and robust in handling the storm surge events associated with cyclonic storms. SOA also made it possible to add additional plugins/applications/modules and associated features without hampering the core ecosystem. The DSS has been made operational; it has handled about 40 cyclonic storms and provided timely and accurate storm surge advisories to the stakeholders.
References 1. H. Ubydul, M. Hashizume, K.N. Kolivras, H.J. Overgaard, B. Das, T. Yamamoto, Reduced death rates from cyclones in Bangladesh: what more needs to be done. Bull. World Health Organ. 90, 150–156 (2012). https://doi.org/10.2471/BLT.11.088302 2. S.K. Dube, I. Jain, A.D. Rao, T.S. Murty, Storm surge modelling for the Bay of Bengal and Arabian Sea. Nat. Hazards 51, 3–27 (2009). https://doi.org/10.1007/s11069-009-9397-9 3. P.L.N. Murty, J. Padmanabham, T. Srinivasa Kumar, N. Kiran Kumar, V. Ravi Chandra, S.S.C. Shenoi, M. Mohapatra, Real-time storm surge and inundation forecast for very severe cyclonic storm ‘Hudhud.’ Ocean Eng. 131 (2017) 4. P.K. Bhaskaran, R. Gayathri, P.L.N. Murty, S. Bonthu, D. Sen, A numerical study of coastal inundation and its validation for Thane cyclone in the Bay of Bengal. Coast. Eng. 83, 108–118 (2014). https://doi.org/10.1016/j.coastaleng.2013.10.005 5. T. Srinivasa Kumar, P.L.N. Murty, M. Pradeep Kumar, M. Krishna Kumar, J. Padmanabham, N. Kiran Kumar, S.C. Shenoi, M. Mohapatra, S. Nayak, P. Mohanty, Modeling storm surge and its associated inland inundation extent due to very severe cyclonic storm Phailin. Mar. Geodesy 38(4), 345–360 (2015). https://doi.org/10.1080/01490419.2015.1053640 6. C.P. Jelesnianski, A.D. Taylor, NOAA Technical Memorandum. ERL WMPO-3 (1973), p. 33 7. S. Dube, A. Rao, P. Sinha, T.S. Murty, N. Bahulayan, Storm surge in the Bay of Bengal and Arabian Sea: the problem and its prediction. Mausam 48 (1997) 8. P.L.N. Murty, P.K.Bhaskaran, R. Gayathri, B. Sahoo, T. Srinivasa Kumar, B. SubbaReddy, Numerical study of coastal hydrodynamics using a coupled model for Hudhud cyclone in the Bay of Bengal. Estuar. Coast. Shelf Sci. 183 (2016)
Chapter 10
Object Tracking and Detection Using Convolutional Neural Networks C. N. Sujatha, P. Sahithi, R. Hamsini, and M. Haripriya
1 Introduction
In recent years, autonomous/self-driving automobiles have drawn a great deal of interest as a research subject for both academia and industry. For a car to be truly autonomous, it must sense the environment through which it is driving. The autonomous vehicle should be capable of both localizing itself in an environment and identifying and keeping track of objects (moving and stationary). Object detection deals with identifying and locating an object in a given scene, and it has always been an attractive subject for large-scale research. The main purpose of object detection is to detect and classify every object in the image. It has been widely used in self-driving cars, robot vision, medical imaging, crowd or people counting and pedestrian detection, and even for visually impaired people it can detect the person in front and warn them accordingly. Object tracking deals with locating a moving object or multiple objects over a period of time, for instance, in traffic flow analysis, audience flow analysis, etc. A tracker assigns consistent labels to the tracked objects across multiple video frames. This paper deals with object detection and tracking with the assistance of machine learning techniques. Machine learning and image processing for face recognition, object detection, pattern recognition, etc., were implemented using Python. Many sophisticated image processing approaches use machine learning models, including deep neural networks, to transform images for many tasks, such as adding creative filters, optimizing an image for optimum output, or improving precise image information for computer vision tasks.
C. N. Sujatha (B) · P. Sahithi · R. Hamsini · M. Haripriya
ECE, Sreenidhi Institute of Science and Technology, Ghatkesar, Hyderabad, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. B. Reddy et al. (eds.), Proceedings of Second International Conference on Advances in Computer Engineering and Communication Systems, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-7389-4_10
The method uses a single end-to-end trained neural network that takes a photograph as input and directly predicts bounding boxes and class labels for each bounding box. Although the technique runs at 45 frames per second, and up to 155 frames per second for a speed-optimized variant of the algorithm, it has poorer predictive performance. YOLO v3 was implemented here instead of YOLO due to the latter's limitations and drawbacks: YOLO is not always good at detecting small objects, and it has a hard time with objects which are close together, i.e., grouped objects. YOLO v3 mainly uses multi-label classification to classify objects in images or videos. There are three main benefits of using YOLO for object detection over the traditional methods. First and foremost, YOLO is extremely fast: to predict detections, we simply run our neural network on a new picture at test time, with less than 25 ms of delay. Second, when making predictions, YOLO considers the picture as a whole. Unlike sliding window and region proposal-based methods, YOLO sees the entire picture during training and testing, so it indirectly encodes contextual information about classes as well as their appearance; Fast R-CNN, a top detection method, misidentifies background patches in an image as objects because it cannot see the wider context. Third, YOLO is highly generalizable: it outperforms top detection methods such as DPM and R-CNN by a wide margin when trained on real photographs and tested on artwork, and it is less likely to break down when introduced to new domains or unexpected inputs. COCO (Common Objects in Context) is a large-scale image dataset used to recognize, segment, and label objects. COCO features 1.5 million object instances spread over 80 different categories. COCO uses 5 annotation types and stores annotations in a JSON file.
2 Literature Survey
In work on object detection and tracking using deep learning and artificial intelligence, it is stated that the use of artificial intelligence to address computer vision tasks has outperformed the image processing approaches to handling them. A CNN model was trained on a road vehicle dataset for single-object recognition; the high validation accuracy is a result of the large amount of data on which it was trained for each class. Performance metrics are tabulated for day, evening and NIR images. Multiple-object detection is performed using YOLOv3 for the KITTI and COCO datasets, and performance metrics are tabulated for YOLOv3 on the considered classes of images. The higher the precision value of a class, the greater will be the mAP (mean Average Precision) value; the mAP value depends on the image chosen for the computation. An IoU (Intersection over Union) of 0.5 is considered ideal for detection [1–4]. T.L. Bang et al. state that the second contribution of their paper concerns the processing speed. In the original architecture, Deep SORT only tracks an object once the YOLO detection is finished; in their approach, any object detected by YOLO is sent immediately to the Deep SORT tracker. The Dlib tracker is also implemented in a multi-process approach, where each object is tracked by one CPU process. Likewise, an object appears in several frames, so object detection in the proposed design is conducted only in certain frames in order to reduce the YOLO detection time. Experimental results demonstrated that the
proposed architecture reduced the number of identity switches compared with the original approach [5]. A. Vidyavaui et al. proposed another Deep SORT YOLOv3 architecture which overcomes the problems of the original Deep SORT YOLOv3 design by adding two contributions. The first contribution relates to identity switches: YOLO object detection may miss an object, so that no detected bounding boxes are sent to the Deep SORT components, which leads to no tracking results; when YOLO detects these objects again in subsequent frames, they are assigned new object IDs. The Dlib tracker is used to address this problem, since it needs only a bounding box of an object in the first frame; from then on, Dlib can track the object frame by frame without the YOLO component, so even if YOLO cannot detect an object, the proposed design continues tracking it [6]. Milan et al. presented an object detection method for a stereo-camera-based 3D multiple-vehicle tracking system that utilizes Kalman filtering to enhance robustness. The main objective of that system is to accurately predict the locations and orientations of vehicles from stereo camera data, and it consists of various modules. An autonomous vehicle can plan a path and prevent crashes by detecting vehicles and other obstacles. However, tracking 3D objects and rotations is a notoriously difficult problem. Object tracking is most typically solved by LIDAR, which is accurate but also very expensive. Visual 3D detection, on the other hand, comes with a serious challenge: precision rapidly decreases with increasing distance from the cameras. As a result, the bounding boxes generated by most image-based 3D object detection systems tend to jump around between frames, leading to unstable tracking, and some camera-based detections are dropped entirely, resulting in tracking loss [7–11].
3 Work Flow
Object detection and tracking are two of the most common and difficult tasks that a surveillance system must perform in order to identify relevant events and suspicious actions, as well as to automatically annotate and extract video information. Adapting state-of-the-art, open source, and free resources to customized challenges is one of the most significant barriers to the development of new applications. Figure 1 depicts the general block diagram of machine learning for object detection: it involves acquiring the problem statement and collecting all the data required to solve the problem.
Fig. 1 Block diagram of ML for object detection (Problem Statement → Data Collection → Data Preparation → Model Training → Model Inference)
Assemble and arrange all the accumulated data, then teach and train the model from the collected data before predicting the output as a single numerical score. The YOLOv3 algorithm, shown in Fig. 2, is implemented in the IDLE environment using the Python language. The files "yolov3.weights", which contains the pre-trained network's weights, "yolov3.cfg", which contains the network configuration, and the "COCO names" file, which contains the 80 different class names used in the COCO dataset, are downloaded, and all the necessary packages are imported. The YOLOv3 algorithm creates bounding boxes for the predicted outputs, and a confidence score is assigned to each predicted box. The YOLO object detector, which was trained on the COCO dataset, is then loaded. Here, OpenCV loads Darknet, as it is a pre-trained model for 80 different classes. Darknet is an open-source neural network with 53 built-in layers; 53 more layers are stacked on top for the detection task, which ultimately gives a 106-layer convolutional underlying architecture for YOLOv3. The input image is read with the help of OpenCV and its spatial dimensions are noted.
Fig. 2 Work flow for object detection (Download the model → Initialize the parameters → Load the model & classes → Read input → Process the image → Get names of the output layers → Blob processing → Post-process the network's output → Non-maximum suppression → Draw the predicted boxes → Load the output image)
For video, after initialization the total number of frames is determined and each frame is processed; "VideoWriter" and cv2.VideoCapture are used to write and capture the video. The neural network accepts the input image only in a certain format, termed a blob (binary large object). After an input image is read, it is passed through the blobFromImage function in order to convert it into the blob format for the neural network. A scale factor of 1/255 is used to scale the image pixel values to a target range of 0 to 1, and the swapRB parameter is set to 1 (true). The forward function of OpenCV's net requires the names of the layers up to which the network should be run, so the getUnconnectedOutLayers() function is used to obtain the names of all unconnected output layers, after which the forward pass of the network is run. All the bounding boxes are then scanned and the ones with low confidence scores are eliminated; the predicted box's class label is assigned as the class with the highest score. Non-maximum suppression is used to obtain certainty in the bounding boxes: many boxes will be formed around an object, and to choose the most accurate bounding box it eliminates the redundant overlapping boxes with lower confidence scores. After acquiring the best box, the filtered boxes are drawn on the input frame with their respective class labels and confidence scores. After drawing the boxes, the predicted image is loaded and shown with the help of the computer vision library. For video, a video writer is opened to save the frames with the detected output bounding boxes; the "FourCC" code is passed to specify the video codec. The frame shape is obtained, after which the output is written to the VideoWriter and the file pointers are released, which closes the opened video. Finally, the predicted output is displayed.
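The following condensed Python sketch shows the blob conversion, forward pass, confidence filtering, non-maximum suppression and box drawing described in this paragraph. The 0.5 confidence and 0.4 NMS thresholds are illustrative choices, not values reported by the authors.

```python
import cv2
import numpy as np

def detect(image, net, class_names, conf_thresh=0.5, nms_thresh=0.4):
    """Run one YOLOv3 forward pass and draw the filtered boxes (illustrative)."""
    h, w = image.shape[:2]
    # Convert to a blob: scale pixels to [0, 1], resize to 416x416, swap R and B.
    blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(net.getUnconnectedOutLayersNames())

    boxes, confidences, class_ids = [], [], []
    for output in outputs:                       # one output per detection scale
        for det in output:
            scores = det[5:]
            class_id = int(np.argmax(scores))
            confidence = float(scores[class_id])
            if confidence > conf_thresh:
                cx, cy, bw, bh = det[0:4] * np.array([w, h, w, h])
                boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
                confidences.append(confidence)
                class_ids.append(class_id)

    # Non-maximum suppression removes redundant overlapping boxes.
    keep = cv2.dnn.NMSBoxes(boxes, confidences, conf_thresh, nms_thresh)
    for i in np.array(keep).flatten():
        x, y, bw, bh = boxes[i]
        label = f"{class_names[class_ids[i]]}: {confidences[i]:.4f}"
        cv2.rectangle(image, (x, y), (x + bw, y + bh), (0, 255, 0), 2)
        cv2.putText(image, label, (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
    return image
```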
4 Simulation Results
This section presents the results, the confidence scores of the detected objects and the execution times for all the sample images and videos. YOLOv3 results with intact bounding boxes and COCO dataset class labels have been obtained. As sample inputs to evaluate the algorithm we have considered various traffic scenarios, i.e., images with heavy and minimal numbers of objects, and bulky (busy) as well as quiet videos. The YOLOv3 algorithm has been implemented in the IDLE environment of Python. The COCO dataset consists of 80 classes with a total of 328 K images containing groups of images of persons and things. Each group contains 164 K images that are further split into 82 K for training, 41 K for validation and 41 K for test sets. In addition to that, a new dataset of 100 images has been collected for testing purposes, and some of these images are presented in the result section for illustration.
4.1 Object Detection in Images
For object detection in images, we used various image formats, jpg and png. The images shown in Fig. 3a, b, e and f are in png format and those in c, d, g and h are in jpg format. Figure 3a depicts one clearly visible car, with another car and a truck present in the right corner of the image. Figure 3b shows two clearly visible buses, with a motorbike and a person present in the bottom corner of the image. The resulting object predictions for Fig. 3 are displayed in Fig. 4. Figure 4a depicts two cars and a truck with bounding boxes and class labels. Figure 4b shows the two clearly visible buses along with the motorbike and the person, each with a bounding box and class label. The GHMC worker image in Fig. 4c portrays three clearly noticeable persons, a barely visible person and a chair with bounding boxes and class labels. The rider image in Fig. 4d shows a person and a motorbike with bounding box and class label. Figure 4e details every person, motorcycle and local bus with bounding boxes and class labels. Figure 4f shows every person, motorbike and car with bounding boxes and class labels. Figure 4g depicts multiple random vehicles with bounding boxes and class labels. Figure 4h demonstrates every object in the busy image with bounding boxes and class labels. The confidence scores of the resulting images in Fig. 4 are tabulated in Table 1. As there are fewer objects in Fig. 4a–d, the confidence scores of all detected objects are presented in Table 1. An abundant number of similar objects are present in Fig. 4e–h, so the average confidence score per class label is displayed in Table 1. Table 1 shows the confidence scores and class labels for the objects detected in Fig. 4; the confidence scores are better for objects which are completely in the frame.
Fig. 3 Sample images for object detection a car on a highway; b RTC bus crossing the road; c GHMC worker on duty; d rider image; e motorcycles and local buses; f motorbike and cars; g random vehicles and persons; h heavy traffic with cars, trucks and buses
Fig. 4 Objects identified in sample images a car on a highway image; b RTC bus image crossing the road; c GHMC worker on a duty image; d rider image; e motorcycles and buses; f motorbikes and cars; g random vehicles and persons; h heavy traffic with cars, trucks and buses
For the GHMC worker image shown in Fig. 4c, objects which are only partially visible were also detected with good confidence scores and labels. Among the images with few objects (Fig. 4a–d), the execution time for the rider image shown in Fig. 4d is less than for the other three; this is due to the simplicity and directness of the image. For Fig. 4e–h, there are numerous similar objects in the images and they are very close to each other, yet the algorithm predicted the class labels and bounding boxes very precisely. From the image shown in Fig. 4h, it is observed that the bounding boxes and class labels are predicted accurately even though the objects are close to each other. The execution time for the image shown in Fig. 4g is the least among the four images shown in Fig. 4e–h. From the results, it is seen that the execution time depends on the nature and clarity of the image rather than on the number of objects in it.
4.2 Object Detection and Tracking in Videos
Object tracking refers to locating an object in a sequence of images or frames. Two sample videos, one busy and the other low-key, were used as input to test the object tracking method. Two frames captured at two consecutive seconds from each video are shown in Fig. 5: Fig. 5a and b are from Video 1 and Fig. 5c and d are from Video 2. The two videos are in Mp4 format. Video 1 is about the police chasing a stolen vehicle (a red car). Frame 1, shown in Fig. 5a, has two red cars and some black and white cars in motion. Frame 2, shown in Fig. 5b, has many black and white cars and the stolen red car in motion. Video 2 was captured on the highway.
Table 1 Comparison of confidence scores and execution time for object detection in images
Name of the image | Class labels | Confidence scores | Execution time (in seconds)
Car on a highway | Car 1 / Truck / Car 2 | 0.9984 / 0.8090 / 0.5243 | 0.503912
RTC bus crossing the road | Bus 1 / Bus 2 / Motorbike / Person | 0.9994 / 0.9967 / 0.7725 / 0.5863 | 0.495316
GHMC worker on duty | Person 1 / Person 2 / Person 3 / Chair / Person 4 | 0.9947 / 0.9958 / 0.8991 / 0.6732 / 0.6892 | 0.503093
Rider image | Person 1 / Bike | 0.9938 / 0.9996 | 0.457147
Image with motorcycles and local buses at a traffic signal | Person / Bicycle / Bus | 0.9498 / 0.7748 / 0.9800 | 0.683002
Image with motorbike and cars at a traffic signal | Person / Motorbike / Car | 0.7784 / 0.9251 / 0.8151 | 0.647876
Traffic at Hyderabad with random vehicles and persons | Person / Car / Bike | 0.8741 / 0.7787 / 0.6641 | 0.485714
Heavy traffic at Hyderabad with cars, trucks and buses | Car / Bus / Truck | 0.8642 / 0.8396 / 0.5558 | 0.567044
Fig. 5 Snippets from videos captured at two different seconds a Frame 1 of Video 1; b Frame 2 of Video 1; c Frame 1 of Video 2; d Frame 2 of Video 2
Fig. 6 Multiple objects tracked in different frames of videos a Frame 1 of Video 1; b Frame 2 of Video 1; c Frame 1 of Video 2; d Frame 2 of Video 2
We took four cars into consideration in Frames 1 and 2, as shown in Fig. 5c and d, to compare the confidence scores. The resulting object predictions for Fig. 5 are displayed in Fig. 6. The process of vehicle tracking was successfully carried out by the algorithm in both test videos. Two consecutive frames each of Video 1 and Video 2 are displayed in Fig. 6a–d, which depict the stolen red car and many other cars on the highway with bounding boxes, class labels and confidence scores. A change in confidence score for the same object in every frame can be observed in Fig. 6, because when the object is in motion there is a quick change in the camera angle, with which the bounding box may miss covering some part of the object. Here, the camera is focused on the stolen red car: the confidence score quickly changes from 0.9903 to 0.9085 between Frame 1, where the car is zoomed in (Fig. 6a), and Frame 2, where it is zoomed out (Fig. 6b). For the two frames of Video 2, as Car 1 comes closer to the camera in Frame 2 the confidence score quickly increases from 0.9161 to 0.9399. In Frame 2, Car 2 and Car 3 are hidden at the back, which is why their confidence scores decrease from 0.8740 to 0.7201 and from 0.9820 to 0.9293, respectively. Even though the number of frames in Video 2 is greater than in Video 1, the execution time is less for Video 2. From the results, it can be seen that the execution time is more for the bulky video and less for the video which is quiet, as shown in Table 2. In comparison with reference [6], the proposed method focuses on detection and tracking of objects in both images and videos, and moreover the confidence scores are displayed along with the bounding boxes, whereas [6] described video analytics for road traffic without scores. Compared with [4], which proposed YOLOv2-based real-time object detection for detection and tracking of objects, YOLOv3 predicts more bounding boxes for the same input image size.
Table 2 Frames and execution time for video tracking
Video No. | Number of frames | Time for each frame (in seconds) | Total execution time (in seconds)
Video 1 | 350 | 1.1780 | 412.3094
Video 2 | 812 | 0.4913 | 398.9382
YOLOv3 predicts boxes at three different scales, 13 × 13, 26 × 26, and 52 × 52, for an input of 416 × 416. In contrast to the existing methods, in this paper the execution time of the algorithm has been tabulated and the same algorithm has also been tested on various image formats (such as jpg and png).
5 Conclusion
The YOLOv3 algorithm is used in this work for object identification and tracking. This approach makes intuitive sense and does result in better tracking: it tracks by using proximity and by visual evaluation of the appearance of objects. Every object is detected with a bounding box and class label, and the labeling of classes is done with the help of the COCO dataset. The results obtained for object detection are excellent except for minor issues in class labeling and occlusion. For improved labeling, the dataset should be trained using vehicles in the nearby vicinity. Object tracking tends to jump around when multiple objects are present, which should be improved. Autonomous driving is one of the most fascinating future applications of object detection; recognizing criminals or thieves in crowded places, helping visually impaired people to travel on their own and many more applications can be built on object recognition and tracking. In 2020, three major versions of YOLO, named YOLOv4, YOLOv5, and PP-YOLO, were published, which are much simpler to implement compared with YOLOv3, and their major improvements include mosaic data augmentation and auto-learning bounding box anchors.
References 1. S.V. Viraktamath, Madhuriyavagal, R. Byahathi, Object detection and classifications using YOLOv3. IJERJ J. 10(2) (2021). ISSN: 2278-0181 2. O. Masurekar, O. Jadhav, P. Kulkarni, S. Patil, Real time object detection using YOLOv3. IRJET J. 7(3) (2020). ISSN: 2395-0072 3. L. Zhao, S.y. Li, Object detection algorithm based on improved YOLOv3. MDPE J. 9(537) (2020) 4. Mohana, H.V. Ravish Aradhya, Object detection and tracking using deep learning and artificial intelligence for video surveillance applications. IJACSA J. 10(12) (2019) 5. T.L. Bang, G.T. Nguyen, T. Cao, Object tracking using improved deep sort-YOLOv3 architecture. ICIC J. 14(10), 1881–2803 (2020) 6. A. Vidyavaui, K. Dheeraj, M. Rama Mohan Reddy, K.H. Naveen Kumar, Object detection method based on YOLOv3 using deep learning networks. IJITEE J. 9(1), 2278–3075 (2019) 7. M. Aryal, Object Detection Classification and Tracking for Autonomous Vehicle Master theses. 912, Grand Valley State University, Michigan, USA (2018) 8. T. Han, L. Wang, B. Wen, The kernel based multiple instances learning algorithm for object tracking. MDPI J. 7(6) (2018) 9. J. Kwon, K. Kim, K. Cho, Multi target tracking by enhancing the kernelized & correlation filter based tracker. Electron. Lett. 53(20), 1358–1360 (2017)
10. S. Thennammai, D. Mahima, D. Saranya, A.C. Sounthararaj, Object tracking using image processing. Int. J. Innov. Res. Technol. 2(4) (2015) 11. Y. Li, J. Zhu, A scale adaptive kernel correlation filter tracker with feature integration, in Computer Vision—ECCV 2014 Workshops, ed. by L. Agapito, M. Bronstein, C. Rother. LNCS, vol. 8926 (Springer, 2014)
Chapter 11
Novel Coronavirus Progression Analysis Using Time Series Forecasting Alagam Padmasree, Talluri Kavya, Kukkadapu Santhoshi, and Konda Srinivasa Reddy
1 Introduction
Many deadly epidemics have affected human health and the economic condition of countries, and now the world is facing a most treacherous disease due to the coronavirus. This virus has impacted people of all ages, and the effect is adverse if the person has other health issues. It is a contagious disease, but simple precautionary measures can be taken to keep the virus at bay. The virus is adapting to the environment and mutating into different variants. Fierce new COVID-19 waves are enveloping the whole world, placing severe strain on countries' healthcare systems and prompting appeals for help [1]. According to a survey, 1 in 10 Indians is affected by this syndrome. India is facing the emergence of black fungus and other fungal co-infections, which are being identified among COVID-19 patients as the immune system of the infected person weakens under the virus's attack. People with other health issues are more likely to become victims of the virus. Analyzing and predicting future possibilities is the need of the hour. Technological solutions might not prevent the pandemic but would definitely help us in overcoming the vicious trenches of the virus. In this research work, different TSA models are explored to obtain an overview of the situation for future vigilance and prognostics.
A. Padmasree · T. Kavya (B) · K. Santhoshi Student, Department of Computer Science and Engineering, BVRIT HYDERABAD College of Engineering for Women, Bachapully, Hyderabad, India e-mail: [email protected] K. Srinivasa Reddy Professor & HOD, Department of Computer Science and Engineering, BVRIT HYDERABAD College of Engineering for Women, Hyderabad Bachapully, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. B. Reddy et al. (eds.), Proceedings of Second International Conference on Advances in Computer Engineering and Communication Systems, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-7389-4_11
2 Methods Time series analysis refers to the analysis of time-variant data to identify trends displayed by the data over a period of time. The data points may have an internal structure (such as trend or seasonal variation) that might help us in predicting the next period value based on the past value and current value. Different models can be employed to study the data’s characteristics and extract meaningful statistics that might unlock hidden patterns which are useful for future forecasting. In this research work, six different forecasting methods are assessed based on the correctness of the predictions over the confirmed cases, and the best-performing model is selected for forecasting the other cases [2, 3].
2.1 Auto Regressive Model The autoregressive model is a widely used time series forecasting model. In this model, the predictions are made based on past patterns. When there is a relation between the values in the time series data, this model is used for predictions. The autoregressive model assumes that the future will relate to the past.
2.2 Moving Average Model
The moving average model is also known as the moving-average process. The overall idea of the MA model is to find the trends in the dataset. The moving average method models the series using past forecast errors rather than the past values themselves; the series is affected by external factors at various time intervals, and these deviations are called errors.
2.3 ARIMA Model
ARIMA stands for Autoregressive Integrated Moving Average. This suite of models captures the different standard temporal structures in time series data, which is useful for analysis and forecasting. The standard notation is ARIMA(p, d, q), where p is the lag order, q is the order of the moving average and d is the degree of differencing required to make the time series stationary.
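As a minimal illustration of how such a model can be fitted in Python, the sketch below uses statsmodels. The order (5, 1, 0) and the variable names are placeholders for illustration, not the configuration reported by the authors; in practice the order would be chosen from ACF/PACF plots or a grid search on the training split.

```python
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# confirmed: a pandas Series of daily confirmed cases indexed by date (assumption).
def arima_forecast(confirmed: pd.Series, steps: int = 10, order=(5, 1, 0)) -> pd.Series:
    model = ARIMA(confirmed, order=order)   # p = lag order, d = differencing, q = MA order
    fitted = model.fit()
    return fitted.forecast(steps=steps)     # forecast for the next `steps` days
```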
2.4 Holt's Linear Model
The previous models do not consider the trend when predicting new values. Holt's linear method is used when there is an increasing or decreasing trend in the data. The method has a forecast equation, a level equation, and a trend equation.
2.5 Holt's Winter Model
Holt's winter seasonal model is Holt's method extended to capture seasonality. The model consists of three smoothing equations, for level, trend, and seasonality, together with a forecast equation, so it is also called triple exponential smoothing. A season is defined as a fixed length of time over which the pattern repeats. This model can be used when there is seasonality in the data.
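A triple-exponential-smoothing fit can be sketched with statsmodels as below. The additive components and the weekly seasonal period of 7 days are assumptions suitable for daily case counts, not settings stated in the paper.

```python
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

def holt_winters_forecast(series: pd.Series, steps: int = 10) -> pd.Series:
    # Level, trend and seasonality smoothing (triple exponential smoothing).
    model = ExponentialSmoothing(series, trend="add", seasonal="add", seasonal_periods=7)
    fitted = model.fit()
    return fitted.forecast(steps)
```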
2.6 Facebook Prophet Model The prophet is an open-source library published by Facebook. It uses an additive model which considers different parameters. It provides us with the ability to make time series predictions with good accuracy using simple intuitive parameters.
3 Dataset 3.1 Data Collection The dataset is taken from COVID-19 Global Cases by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (JHU). This online source is the most authentic source where daily cases are recorded from 22–01–2020 and is updated every day, which includes confirmed, deaths and recovered cases of 276 provinces all across the world. We have considered the data till 02–08–2021 [4].
3.2 Data Preprocessing After data collection, we have done null value analysis and replaced the null values with the mean of that column. The dataset also included the data from 3 cruise ships, namely Grand Princess, Diamond Princess, and MS Zaandam. This data may be
misleading the actual regional cases as they travel to different places. The preprocessed dataset is partitioned into the ratio of 95:5 for training and testing set, respectively, for future forecasting. This split ratio was chosen as there are fewer data points.
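A minimal pandas sketch of the preprocessing described above (mean imputation of nulls, dropping the cruise-ship rows, and the 95:5 time-ordered split) is given below. The column names are assumed to follow the JHU CSSE CSV layout and may differ in the actual files.

```python
import pandas as pd

def preprocess(df: pd.DataFrame):
    """Mean-impute numeric nulls, drop cruise-ship rows, split 95:5 by time order."""
    # Column name 'Province/State' follows the JHU CSSE layout (assumption).
    ships = ["Grand Princess", "Diamond Princess", "MS Zaandam"]
    df = df[~df["Province/State"].isin(ships)].copy()

    numeric_cols = df.select_dtypes("number").columns
    df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].mean())

    split = int(len(df) * 0.95)          # 95% training, 5% testing
    return df.iloc[:split], df.iloc[split:]
```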
4 Accuracy Metrics
The performance of all six models was evaluated using three widely used accuracy metrics: RMSE, MAE, and the R2 score. These metrics give a comparative analysis of the models, which provides great insight into their performance. The formulas used are listed below.

$\mathrm{RMSE} = \sqrt{\dfrac{1}{N}\sum_{i=1}^{N}\left(\mathrm{Predicted}_i - \mathrm{Actual}_i\right)^{2}}$   (1)

$\mathrm{MAE} = \dfrac{1}{n}\sum_{i=1}^{n}\left|\,y_i - \lambda(x_i)\,\right|$   (2)

$R^{2} = 1 - \dfrac{\sum_{i}\left(y_i - \hat{y}_i\right)^{2}}{\sum_{i}\left(y_i - \bar{y}\right)^{2}}$   (3)
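These three metrics can be computed with scikit-learn as in the short sketch below; the helper function and variable names are illustrative only.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

def evaluate(actual, predicted):
    """Compute the three metrics used above for one model's test-set forecast."""
    rmse = np.sqrt(mean_squared_error(actual, predicted))
    mae = mean_absolute_error(actual, predicted)
    r2 = r2_score(actual, predicted)
    return {"RMSE": rmse, "MAE": mae, "R2": r2}
```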
5 Better Performing Model
For the analysis, different time series forecasting models are used; all of these models are suited to data that changes over time. To evaluate the performance of the models and find the better-performing one, three different accuracy metrics, viz. RMSE, MAE and the R2 score, are used. Based on these performance metrics, Facebook Prophet gives the best insight into the trend. FB Prophet is a forecasting tool which can be implemented using Python and R. It uses an additive model in which non-linear trends are fit with daily, weekly, and yearly seasonality, and holiday effects are also considered. We use a decomposable time series model with three main components, trend, seasonality, and holidays, which are combined in the following equation:

y(t) = g(t) + s(t) + h(t) + εt   (4)
• g(t): piecewise logistic growth or linear growth curve for modeling non-periodic changes
• s(t): periodic changes
Fig. 1 Analysis of the confirmed cases progression using FB prophet model
Fig. 2 Weekly trends of the confirmed cases
• h(t): holiday effects with irregular schedules
• εt: error term
The fitted progression of the confirmed cases and their weekly trend obtained with this model are shown in Figs. 1 and 2. A minimal fitting sketch using the open-source prophet library is given below.
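In the sketch below, the input column names and the ten-day horizon are assumptions for illustration; Prophet itself requires the columns to be renamed 'ds' and 'y'. The package is published as prophet (older releases were named fbprophet).

```python
import pandas as pd
from prophet import Prophet   # open-source library published by Facebook

# `daily` is assumed to be a DataFrame with columns 'date' and 'confirmed'.
def prophet_forecast(daily: pd.DataFrame, horizon_days: int = 10) -> pd.DataFrame:
    train = daily.rename(columns={"date": "ds", "confirmed": "y"})
    model = Prophet(daily_seasonality=True)
    model.fit(train)
    future = model.make_future_dataframe(periods=horizon_days)
    forecast = model.predict(future)
    return forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail(horizon_days)
```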
6 Result
Analysis of the data is done using different visualization and modeling tools such as Matplotlib, Seaborn and scikit-learn to interpret the trends in the confirmed, death, recovered, active and daily cases along with the recovery rate and mortality rate [5]. The progress of the cases over time was examined and plotted using different representations. After fitting the models on the training data, the testing data was used to estimate each model's accuracy. Furthermore, predictions of the confirmed cases for the next ten days were also forecasted. A reliable prediction of the pandemic may help different social entities to take precautionary measures until a cure is available. Parameters were optimized for better results. The evaluations were conducted using RMSE, MAE, and the R2 score, and the performance was assessed. Based on the outcomes, the best model was identified and used for forecasting the recovered and death cases (Figs. 3, 4 and 5).
Fig. 3 Worldwide confirmed cases prediction for the next 10 days using all the six models
Fig. 4 Worldwide confirmed, death, recovered, active cases progress over time
7 Conclusion
Research on COVID-19 progression is essential to understand the struggles and sufferings of the globe. Many people have lost their lives, and this urges us to foretell future possibilities. The outcomes of this research work are helpful for understanding the trends and predicting the confirmed COVID-19 cases all across the world, which is helpful for public and private healthcare organizations in alleviating this issue. In this work, global data is used and six models are implemented on this data to forecast future scenarios [6].
Fig. 5 Accuracy metrics of the models
The dataset is updated every day, and we can get the future values of the cases anytime by re-running the code over the new data. The forecasts show a rapid increase in the death, recovered, active, and confirmed cases worldwide. In comparison with all the other models, the FB Prophet model has better performance on the R2 score, MAE, and RMSE error metrics. In future, the dataset size will increase, and there is a possibility that other models may also emerge with better insights regarding the trends of this vicious infection.
References 1. H. Nishiura, H. Oshitani, T. Kobayashi, T. Saito, T. Sunagawa, T. Matsui, T. Wakita, M. COVID, M. Suzuki, Closed environments facilitate the secondary transmission of coronavirus disease 2019 (covid-19) (2020) 2. G. Shinde et al., Forecasting models for coronavirus disease (COVID-19): a survey of the state-of-the-art. SN Comput. Sci. 1 (2020) 3. G.R. Shinde, A.B. Kalamkar, P.N. Mahalle, N. Dey, J. Chaki, A.E. Hassanien, Forecasting models for coronavirus disease (COVID-19): a survey of the state-of-the-art. SN Comput. Sci. 1(4), 1–15 (2020) 4. S. Maurya, S. Singh, Time series analysis of the COVID-19 datasets, in 2020 IEEE International Conference for Innovation in Technology (INOCON), Bengaluru, India (2020) 5. O. Sarkar, M.F. Ahamed, P. Chowdhury, Forecasting and severity analysis of COVID-19 using machine learning approach with advanced data visualization, in 2020 23rd International Conference on Computer and Information Technology (ICCIT), DHAKA, Bangladesh (2020) 6. V. Kulshreshtha, N.K. Garg, Predicting the new cases of coronavirus [COVID-19] in India by using time series analysis as a machine learning model in Python. J. Inst. Eng. India Ser. B (2021)
Chapter 12
A Modern Approach to Seed Quality Check and it’s Traceability in Agriculture N. Sandeep Chaitanya, Rajitha Bhargavi Movva, and Sagar Yeruva
1 Introduction
Seed testing is the cornerstone of all other seed technologies. Throughout seed handling, seed testing is employed to monitor quality parameters, and the test results are sent to customers as seed quality documentation. This means that seed quality can be assessed and seed viability guaranteed in the commercial seed cultivation market. In the case of voluminous quantities of seeds, the filtering out of individual damaged seeds and foreign components is not feasible. Furthermore, in the seed packaging markets the filtering is done manually by the labor force, resulting in low-quality seeds in the packets and the waste of a significant number of good-quality seeds. Economical and automatic seed testing is therefore the most significant element of all other seed technologies in this context: tens of thousands of seed lots have to be assessed annually by seed testing facilities, and greater efficiency and automation in the seed filtering and packaging sectors would make this easier for all seed sectors. Deep learning has developed rapidly in recent years; for example, some search and recommendation engines and voice and image recognition systems have implemented deep learning methods and achieved good results, and image data can be processed in real time. The Convolutional Neural Network (CNN) has achieved excellent image recognition results, and it has significant advantages over traditional machine learning algorithms.
N. Sandeep Chaitanya · R. B. Movva (B) · S. Yeruva Department of CSE, VNR Vignana Jyothi Institute of Engineering and Technology, Hyderabad, India N. Sandeep Chaitanya e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. B. Reddy et al. (eds.), Proceedings of Second International Conference on Advances in Computer Engineering and Communication Systems, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-7389-4_12
By sharing weights when analyzing images, deep convolutional neural networks are less dependent on the raw image representation, and local receptive fields and down-sampling can be used in the network; this data processing scheme makes CNN efficient. On the other hand, the latest technological developments using blockchain technology can offer a sensible and practicable solution to ensure the traceability of agricultural products and to make a trusted central authority redundant. Due to its transparency and the immutability of transactions, blockchain technology is popular in the supply chain and logistics community and increases the trust among the relevant stakeholders. Because of its security, reliability and traceability, and its ability to prevent unauthorized access, blockchain can be used effectively in the agricultural field. The food supply chain and agriculture have received a lot of attention in community research, because the long and troublesome supply chain from raw materials to end users makes it extremely difficult and time-consuming to trace the origin of products. Therefore, it is necessary to create a secure foundation for tracking seed sources and safety details throughout the supply chain cycle without the need for third parties or centralized control.
2 Related Work
One paper proposes an automatic analysis technique for determining the quality of rice grains. The system determines the quality of grains based on appearance characteristics such as shape and color, combining image processing technology and a neural network. It is recommended for classifying grains with limited characteristics, to overcome shortcomings of manual inspection such as tedium and time [1]. In another work, digital imaging techniques were proposed to check the purity of different seeds and the accuracy of these measurements. The purity test is divided into three parts, inert matter, other seeds and pure seeds, but we think the grouping used for imaging will not give accurate results because the underlying seeds are not visible [2]. In a further study, the dataset is recorded online and feature extraction is performed: three GLCM feature extractions are used together with the red, green and blue color spaces, and the threshold is provided to a bag-of-words model to represent the data, which is then passed to a few classifiers, ANN, CNN and KNN. The results of the different types of neural networks are compared, and the study shows that these methods are appropriate for seed classification analysis; for improved accuracy and precision, CNN was rated the best neural network for seed sorting [3]. In another paper, features in the images could not be extracted with simple image analysis, so the authors proposed the automation of seed testing using image analysis software and edge detection, which, however, remains the dominant software approach [4]. One article filters seeds in clusters instead of filtering out every damaged seed and fragment individually. The authors created a high-quality dataset that included small corn seeds, damaged corn seeds and debris. The system classifies the input image into seed clusters of excellent, good, fair, bad and worst quality; excellent and good clusters can be planted or packaged, and bad and inferior clusters can be discarded [5].
In this article, they used 3 classifiers on GoogLeNet to indicate that the accuracy of the network will increase because the depth of the network increases. They additionally combined imaging technology to make a singular map of every network layer in CNN and used heat maps to show the chance distribution of reasoning results. We all know that CNN is considerably higher than machine learning algorithms in evaluating errors in corn seeds, and also the accuracy of the model increases as the depth of the network increases [6]. The purpose of this research is to propose a method based on interactive and traditional machine learning methods to classify soybean seeds and saplings according to their appearance and physiological potential, and to compare the appearance of seeds with their physiological characteristics. Effectively identify and classify damaged seeds. According to the growth level of seedlings, it is recommended to use LDA, RF and SVM algorithms to classify soybean seeds and seedlings based on the data generated by the Ilastik software [7]. This paper proposes a new method that uses a convolutional neural network to automatically classify seeds as good or bad based on their visual characteristics. The data set used to train the model consists of images of the top and bottom seed profiles. Investigate the use of hardware solutions. The seeds are classified according to the CNN model. The productivity of this device is significantly improved because it scans two seed profiles instead of one [8]. The purpose of this article is to review the concepts of supply chain management and traceability in agriculture, and to highlight the technical challenges associated with the implementation of traceable supply chains in agriculture. The integration of visual agriculture and animal husbandry has great potential to improve the speed and accuracy of knowledge-based agricultural traceability [9]. This article studies the use of RFID in the quality management of agricultural seeds, covering everything from seed planting, storage and transportation to the quality control of the control and monitoring department. The last end user, farmers’ quality assurance and plant management, also discussed several key technologies for using RFID in the seed quality monitoring system, such as functional design, label type selection, frequency selection, protocol selection, data security design, and protection collision technology planning [10]. In this article, we felt hat current traceability practices in the agricultural supply chain are severely affected by data fragmentation and centralized control and are vulnerable to data changes and data control. If contamination occurs, determine the source and quickly isolate the product from the supply chain. Close coordination between multiple participants in the agricultural supply chain is required. The various steps in the food supply chain are usually well followed, but the exchange of information between steps is difficult and time-consuming. [11]. This document provides complete information on traceability related to the safety and quality of the food supply chain. Monitoring the development of agricultural products and effectively managing the logistics of agricultural products and the food supply chain are essential to ensure product safety. Traceability throughout the supply chain [12]. 
In this work, the proposed hybrid traceability system links the plant supply chain, processing, procurement, production, logistics and distribution, and will help farmers to scientifically monitor agricultural production based on knowledge and
climate, as well as transparency in agricultural logistics and crop use. Sowing to harvest, agricultural product processing, transportation and sales are all recorded in the database in real time for future management and control [13]. This article focuses on how to use new blockchain technology to improve food logistics through the Internet of Things, and how blockchain can benefit end users, which is essential for the widespread use of blockchain. Food logistics standards related to collection, display, storage and access control [14]. In that article, they proposed a method using the Ethereum blockchain and smart contracts, which can effectively run commercial transactions across the entire agricultural supply chain is to track and trace soybeans, and proposed a solution, Which solves the need for concentration and credibility, and makes authority redundant, intermediary, providing transaction logs to improve performance and protection with high integrity, reliability and security. The proposed solution is based on the use of smart contracts to monitor and regulate all interactions [15]. This article introduces the methods and applications of blockchain technology in agriculture. On the one hand, it explains technical elements such as data structure, cryptographic process and consensus mechanism, and on the other hand, it explains the existing agricultural blockchain. And analyze the application to demonstrate the use of blockchain technology. It also has popular platforms and smart contracts to show how professionals can use them to develop these agricultural applications. Third, we identified many key issues in the emerging agricultural system and discussed these efforts to solve these problems [16]. In this paper, all transactions on the blockchain are recorded; a recommended method for ultimately uploading data to the Interplanetary File Storage System (IPFS). The storage method returns the hash value of the data stored on the blockchain to ensure that the solution is efficient, safe and accurate. Show the human participation system and its algorithms. In addition, this work also provides smart contract modeling and evaluation, as well as security and vulnerability analysis [17]. Along with the summary of the characteristics and advantages of blockchain technology, this article introduces the concept, acceptance, tools and benefits of traceability, and then systematically reviews the literature that combines blockchain and system traceability and continues to explore the current problem. Commercial implementation discussed the current problems and potential opportunities of blockchain implementation [18]. In this document, the proposed solution is based entirely on the benefit of smart contracts between all relevant parties to track and control all interactions and transactions in the supply chain network. Enable stakeholders to ensure a safe and profitable supply chain system [19]. In this article, they proposed blockchainbased supply chain management for an e-commerce system, which gives an open forum for manufacturers and consumers to handle fair prices for their products. Suppliers that exclude dealers and brokers can allow direct transfers from mobile devices payment. By implementing the proposed model, new technologies such as blockchain will reduce the inconsistency between the cornerstone of the traditional food supply chain and the safe and efficient supply chain [20].
3 Proposed System for Seed Quality Check
The paper proposes a system to detect the quality of a seed lot using a Convolutional Neural Network (CNN). CNN is an advanced form of deep neural network with layers that learn lower- and higher-level features, and it is an effective model for prediction and modeling. It adds only three additional concepts, i.e., local filters, max pooling and weight sharing, which make it more powerful than a plain DNN. The architecture of the proposed system is shown in Fig. 1. The CNN consists of several pairs of convolutional and max pooling layers, with each pooling layer following a convolutional layer. Max pooling handles the variability problem well: the max pooling operation takes the maximum activation of a filter over the various positions in a specified window, generating a lower-resolution version of the convolutional features. The max pooling layer makes the architecture more tolerant to subtle differences in the position of parts of the object and leads to faster convergence. Finally, the fully connected layer combines the inputs from all positions into a one-dimensional feature vector, and a softmax activation layer is then used to classify the combined inputs. The proposed system for seed quality check was developed in four stages: dataset preparation, image pre-processing, building the convolutional neural network, and compiling and training the CNN. As illustration, a minimal Keras sketch of such an architecture is given below.
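The exact layer counts, filter sizes and hyperparameters of the authors' network are not given in the text, so the following sketch only mirrors the described pattern (convolution + max pooling pairs, a fully connected layer and a softmax output); the 612 × 612 input follows the resolution mentioned in the image pre-processing stage.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_seed_classifier(input_shape=(612, 612, 3), num_classes=2):
    """Illustrative CNN: convolution + max-pooling pairs, then a dense classifier."""
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),   # good vs. bad seed
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    return model
```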
3.1 Dataset Preparation
We used three types of data sets, soybean, wheat and corn, and collected approximately 1000 images for each category. To classify seeds into good and bad according to their physical and extracted characteristics, we create a data set, write the algorithms and optimize the CNN code. A set of training and test images is used as the data set for the good and bad seed classes. The images in the data set consist of upper and lower seed profiles, captured at different points in the device setup against a black background. A total of 500 good seeds and 500 bad seeds were used to build the image collection for each seed variety.
Fig. 1 Data flow diagram of seed quality check
3.2 Image Pre-Processing
Before feeding the data set into the neural network, it must be prepared. With the help of the library functions provided by Keras, we augment the data set and process all images at a single resolution of 612 × 612. Increasing the input shape beyond this resolution can provide higher accuracy but slows training down. This resolution retains the characteristics of damaged seeds and makes them easier for the model to learn. Zoom, shear, width and height shift, rotation, and horizontal and vertical flips are the augmentation parameters applied. After the data set is prepared, the images share the same attributes and can be passed to the convolutional neural network.
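The augmentation step can be sketched with Keras as shown below. The 612 × 612 target size and the list of transformations come from the text; the individual ranges and the directory layout are illustrative assumptions, since the paper does not state exact values.

import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,        # normalise pixel values
    zoom_range=0.2,           # zoom
    shear_range=0.2,          # shear
    width_shift_range=0.1,    # width shift
    height_shift_range=0.1,   # height shift
    rotation_range=15,        # rotation
    horizontal_flip=True,     # horizontal flip
    vertical_flip=True,       # vertical flip
)

# "dataset/train" is a hypothetical directory with one sub-folder per class (good/bad).
train_generator = train_datagen.flow_from_directory(
    "dataset/train",
    target_size=(612, 612),   # single resolution used for all images
    batch_size=16,
    class_mode="categorical",
)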
3.3 Building Convolutional Neural Network
Creating the neural network was a considerable challenge, because we need to ensure that the information distinguishing damaged and fine seeds is not lost in any of the layers. We therefore designed our CNN to preserve the high resolution of the input images.
3.4 Compiling and Training CNN
The CNN is compiled with the following parameters:
• Loss Function: In each iteration of the training and validation process there is some non-negligible loss; the loss function is the key quantity that measures the difference between the actual and predicted values.
• Optimizer and Evaluation Metric: The training of the neural network is driven by an optimizer and evaluated with a chosen metric to improve system performance. To evaluate our CNN model, we chose the accuracy metric. After running for 25 epochs, the model finally achieved 92.48% accuracy.
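A minimal sketch of the model definition, compilation and training is given below. The chapter does not state the exact layer configuration, optimizer or loss function, so the filter counts, the Adam optimizer and categorical cross-entropy are assumptions; the 612 × 612 input, the accuracy metric and the 25 training epochs come from the text, and train_generator refers to the augmented data generator sketched above.

from tensorflow.keras import layers, models

# Layer counts and filter sizes are assumptions; the paper only states that the
# network stacks convolution + max pooling pairs and keeps the input resolution high.
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(612, 612, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(2, activation="softmax"),   # good vs. bad seed
])

# Adam and categorical cross-entropy are assumptions; accuracy and 25 epochs follow the text.
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
history = model.fit(train_generator, epochs=25)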
4 Smart Seed Traceability System
Because of the transparency and immutability of its transactions, blockchain technology has grown in popularity in the supply chain and logistics community, enhancing confidence among participating stakeholders. Blockchain can be used effectively in agriculture because it is tamper-proof, trustworthy, safe and traceable. Each block is hashed and connected to the next, forming an immutable, tamper-proof chain of records. In the proposed smart tracing system, once the process of filtering out spurious seeds is complete, the producer itself applies a QR code to the package as a tracking method; the distributor then adds another link to the chain when it obtains the seeds, before they reach the retailers and finally the farmer. Before purchasing the seed, the farmer can scan the QR code, which provides information such as the origin of the seed, its expiry date, the quality report, the seed brand and the company name. The actors involved in the whole process are:
• Seed Company: A seed company is an organization that produces a variety of seeds. Each seed package carries a QR code on top, indicating the quality of the seeds processed in the batch sold to a specific farmer. It is a powerful ally in food security because it makes it easier for farmers to obtain planting materials in the form of seeds, fertilizers and other inputs that support agricultural production.
• Distributor: A distributor usually buys purified seeds from a producer, creates a block and sells the seeds to the retailer.
• Retailer: Retailers purchase a limited number of seeds from distributors, usually in batches with traceable identifiers. For example, retailers can buy in bulk and sell to consumers in smaller units. The standard identifiers are hierarchically linked and therefore manageable.
• Farmer: Farmers buy seeds from seed companies; their standard identifiers are associated with the seed batches and companies involved in the sale transaction, along with the origin and quality of the seed, and the farmer creates the smart contract by scanning the QR code.
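The chain-of-custody idea behind this list can be illustrated with a small, self-contained Python sketch. The chapter does not name a specific blockchain platform or data schema, so the class and field names below are assumptions; a deployed system would record these events on a real blockchain through smart contracts rather than an in-memory list.

import hashlib
import json
import time

class SeedBlock:
    """One tamper-evident record in the seed's chain of custody (illustrative only)."""
    def __init__(self, data, prev_hash):
        self.timestamp = time.time()
        self.data = data              # e.g. {"seed_id": ..., "actor": ..., "event": ...}
        self.prev_hash = prev_hash
        self.hash = self.compute_hash()

    def compute_hash(self):
        payload = json.dumps({"t": self.timestamp, "d": self.data, "p": self.prev_hash},
                             sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

# Genesis block created when the seed company issues the QR code.
chain = [SeedBlock({"seed_id": "LOT-001", "actor": "Seed company", "event": "QR code issued"},
                   prev_hash="0")]
for actor, event in [("Distributor", "batch received"),
                     ("Retailer", "batch received"),
                     ("Farmer", "QR scanned before purchase")]:
    chain.append(SeedBlock({"seed_id": "LOT-001", "actor": actor, "event": event},
                           prev_hash=chain[-1].hash))

# Any later modification of a block changes its hash and breaks the link to the next block.
assert all(chain[i + 1].prev_hash == chain[i].hash for i in range(len(chain) - 1))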
4.1 Seed Tracing Architecture
See Fig. 2.
Fig. 2 Architecture of smart tracing system
4.2 Advantages of Traceability System Using Blockchain and QR Code
Traceability creates an information chain that provides information on seed safety, seed processing, seed sales and origin. The blockchain system makes the supply chain transparent and open; therefore, further observation becomes easy. With the support of system monitoring and transparency, defective products in the supply chain can easily be identified and counterfeit products can be detected. QR codes can be used to track products, and no manual operation is required, which avoids human error. Blockchain members cannot change the data, thereby improving the safety and quality of the product.
4.3 Smart Tracing System Algorithms
See Figs. 3, 4 and 5.
5 Results
We took three varieties of seeds with around 1000 images per variety. We obtained varying results and finally overcame the issues of overfitting and underfitting. The results are shown in Figs. 6, 7, 8 and 9.
Fig. 3 Algorithm 1
Fig. 4 Algorithm 2
Fig. 5 Algorithm 3
Fig. 6 Accuracy results for seed quality system
Fig. 7 Training and validation loss analysis
Fig. 8 Training and validation accuracy
Fig. 9 Sequence diagram for smart seed tracing system
6 Conclusion and Future Work
We have proposed a solution for checking seed quality and tracing seeds securely until they reach the farmer, without the involvement of fraudsters. We used a convolutional neural network with tuned parameters to achieve the best results in testing the quality of the seeds. By implementing our proposed model, blockchain technology can turn the vulnerable cornerstone of the traditional seed supply method into a secure, trustworthy, proof-of-delivery approach. We have presented details and aspects of the system architecture, design, implementation algorithms and results. It is very important for farmers to be able to understand and trace the relevant information about agricultural seeds at any time before purchase, and the proposed system puts forward an effective solution for agricultural seed safety management. Future work on the quality check is to cover a wider variety of seeds and to train for a higher number of epochs.
References 1. Mr. V.S. Kolkur, Ms. B.N. Shaikh, Identification and quality testing of rice grains using image processing and neural network. Int. J. Recent Trends Eng. Res. (2016) 2. Mr. V.S. Kolkur, Ms. B.N. Shaikh, Digital image processing applied to seed purity test. Int. J. Innovative Res. Electr. Electron. Instrum. Control Eng. (2017) 3. M. Ranjith Bose, K. Ranjith, S. Prakash, S. Kumar Singh, Dr Y. Vishwanath, Intelligent approach for classification of grain crop seeds using machine learning. IRJET (2018) 4. S. Sharma, V.S. Mor, A. Bhuker, Image analysis: a modern approach to seed quality testing. Curr. J. Appl. Sci. Technol. (2018) 5. M. Bhurtel, J. Shrestha, N. Lama, S. Bhattarai, A. Uprety, M.K. Guragain, Deep learning based seed quality tester. Sci. Eng. Technol. (SET) Conf. (2019) 6. S. Huang, X. Fan, L. Sun, Y. Shen, X. Suo, Research on classification method of maize seed defect based on machine vision. Hindawi Res. Article (2019) 7. A.D. de Medeiros, N.P. Capobiango1, J.M. da Silva, L.J. da Silva, C.B. da Silva, D.C.F. dos Santos Dias, Interactive machine learning for soybean seed and seedling quality classification, Sci. Reports (2020) 8. S.K. Hiremath, S. Suresh, S. Kale, R. Ranjana, Dr. K.V. Suma, Dr. N. Nethra, Seed segregation using deep learning. IEEE (2020) 9. L.U. Opara, Traceability in agriculture and food supply chain: a review of basic concepts, technological implications, and future prospects. Food, Agric. Environ. 1(1) (2003) 10. H. Li, H.J. Wang, Z. Shang, Q.H. Li, W. Xiao, Application of RFID in agricultural seed quality tracking system, in World Congress on Intelligent Control and Automation (2010) 11. J. Storøy, M. Thakur, P. Olsen, The Trace food framework—principles and guidelines for implementing traceability in food value chains. J. Food Eng. (2012) 12. M.M. Aung, Y.S. Chang, Traceability in a food supply chain: safety and quality perspectives. Food Control Article (2014) 13. J. Wang, J.-M. Wang, Y.-J. Zhang, Agricultural product quality traceability system based on the hybrid mode, in 4th Annual International Conference on Network and Information Systems for Computers (2018) 14. A. Pal, K. Kant, Using Blockchain for Provenance and Traceability in Internet of ThingsIntegrated Food Logistics (Temple University, 2019) 15. N. Nizamuddin, M. Omar, K. Salah, R. Jayaraman, Blockchain-based soybean traceability in agricultural supply chain. IEEE (2019) 16. W. Lin, X. Huang, H. Fang, V. Wang, Y. Hua, J. Wang, H. Yin, D. Yi, L. Yau, Blockchain technology in current agricultural systems: from techniques to application. IEEE August 5, 2020 17. S. Al-Amin, S.R. Sharkar, M. Shamim Kaiser, M. Biswas, Blockchain-based Agri-food supply chain: a complete solution, IEEE, April 7, 2020 18. K. Demestichas, N. Peppes, T. Alexakis, E. Adamopoulou, Blockchain in agriculture traceability systems: a review. MDPI J. (2020)
19. D. Prashar, N. Jha, S. Jha, Y. Lee, G.P. Joshi, Blockchain based traceability and visibility for agricultural products: a decentralized way of ensuring food safety in India. MDPI Article (2020) 20. S. Al-Amin, S.R. Sharkar, M. Shamim Kaiser, M. Biswas, Towards a blockchain-based supply chain management for E-agro business system, in Proceedings of International Conference on Trends in Computational and Cognitive Engineering, Advances in Intelligent Systems and Computing (2020)
Chapter 13
Experimental Face Recognition System Using Deep Learning Approaches Nsikak Imoh , Narasimha Rao Vajjhala , and Sandip Rakshit
N. Imoh · S. Rakshit
American University of Nigeria, Yola, Nigeria
e-mail: [email protected]
S. Rakshit
e-mail: [email protected]
N. R. Vajjhala (B)
University of New York Tirana, Tirana, Albania
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. B. Reddy et al. (eds.), Proceedings of Second International Conference on Advances in Computer Engineering and Communication Systems, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-7389-4_13
1 Introduction
Face detection and recognition is an identification procedure that involves two core steps: first, determining whether a given face is indeed a human face and locating it in a given image or video; secondly, distinguishing that face from all other faces [1]. There are several ways one can currently approach the development of a face detection and recognition system. One approach requires collecting and maintaining a database, or more concretely a "knowledge base", that contains "a set of predetermined rules about the face features of humans." Despite its enthralling capability, this approach is severely hindered by varying head poses and movements during its use [2]. A different approach is one that selects a facial feature such as the eye, collects distinctive information, and uses it to determine whether the given object is a human face. This approach is also hindered by factors such as poor image resolution, the use of obstructive eye gear, and subjects with "eyes-shut" blindness [2]. It is worth mentioning that the majority of face recognition systems in place use either two-part patterns that are natively calibrated or artificial intelligence with machine learning [3]. The issue with the former is that it does not take into account varying facial features in live face images, such as when a female subject uses facial cosmetics, while the latter completely disregards local facial features, which are needed to build up recognition [3]. This experimental face recognition system tends to bridge the gap between these two approaches by implementing facial detection using the "Viola–Jones object detection framework" and face recognition using deep learning algorithms [3, 4].
2 Background and Review of Literature 2.1 Problem Statement As with most things in life, there is no one-size-fits-all algorithm to handle all the face recognition needs. For this reason, most general-purpose face recognition systems in use today seem to be lacking or missing important functionality. While some are really good at detecting faces, they fall short of adequately recognizing the faces, and the ones that have impeccable recognition struggle with detecting the faces in all given conditions. For an ideal face detection and recognition system to work in combating security and information management issues, the system should seamlessly integrate a foolproof face detection algorithm with a reliable intelligent-based recognition algorithm [1]. To develop foolproof face detection and recognition functionalities needed for security and information management, the system firstly should have the ability to separate a human face from a non-human face, such as that which is made by a sculptor or painting, from an original image input [5]. Secondly, it should be able to detect the subject’s face at all times including different variations due to facial expressions, moods, cosmetic procedures, and distress [5]. Thirdly, it should be able to distinguish the original subject of recognition from impostors or impersonators such as an identical sibling, twin, life-sized image, or impersonator’s mask [3]. Lastly, it should be able to determine if the system is being accessed by the original subject and not someone else impersonating the original subject [3]. If appropriately implemented, these features can go a long way to help solidify the development of a face recognition system.
2.2 Past Face Recognition Approaches There are three common ways that traditional systems approach face recognition to get the appropriate facial calibration. These approaches, which include “face component detection, feature extraction, and expression classification”, customarily bring into play the main components of a human face [4]. It achieves this by taking a given facial input, calibrating it, and subsequently, calculating and comparing the extent of similarity between the facial inputs to another that exists in a database. However, looking back at the recognition systems prior to the current traditional system, in the early 1960s, face recognition functionality was made to work through
“a semi-automated system” [6]. To make it work, certain features of the subject’s image were identified and labeled to point out standard facial features such as the eyes, ears, nose, lips, and brows [6]. Thereafter, these labeled features were used to mathematically measure the facial features using computations such as the size of each labeled feature and the distance between one labeled features to another [6]. In the early 70s, the system of labeling facial features and semi-automated machines was redeveloped by Goldstein et al. [7]. The new design included adding more features to be calibrated, specifically, “21 subjective markers such as the thickness of the lip and hair color” [7]. It was an intuitive approach that involved a majority of the mathematical calibration of the facial features done manually, and as a result, it made automating the recognition process burdensome [7]. In a bit “reduce the subjectivity of the previous process and measure the features,” Goldstein et al. planned to create a conventional template of the facial features to be used as a standard for measurement [7]. However, following extensive research on the procedure, it was discovered that the system was flawed because the calibrated features “did not contain enough data to represent an adult face” [7]. The first fully automated recognition system developed in the late 70s by “Kanade” made use of vector representation of the features on the subject’s face [8]. Afterward, a new approach, known as the principal component analysis (PCA), was developed by Sirovich and Kirby [9]. The idea of the PCA which is the trail-blazer for eigenfaces approached face recognition with the theory that PCA can form the basic facial features on their own and store them as a base template [9]. Then, the base template “can be linked together linearly to recreate pictures that make up the original training set” [10]. This idea was built upon by Turk and Pentland when they “gave a practical illustration for extracting the eigenvectors based on matrices sized by the number of images rather than the number of pixels” [11]. Furthermore, Turk and Pentland showcased “a new method of computing the eigenvectors so that computers of the time can easily carry out the process of decomposition on a large training set of face images” [11]. With the discoveries made from their research experiment, they built the first fully automated face recognition system that “demonstrated the eigenface concept” [10]. The eigenface approach became the stepping stone of modern face recognition technologies, and it is being built over and over again to improve its accuracy and how images are being preprocessed [10].
2.3 Current Face Recognition Approaches The current approaches to face recognition are classified into the statistical, holistic, feature, and artificial intelligence as shown in Fig. 1. The statistical approach carries out face recognition by calibrating and weighing up the “database images’ density value” [12]. This approach tends to be problematic due to the high cost of calculating the value of image density “due to the usual gap pathways such as face orientation scaling, pose anomalies, and illumination issues” [12].
Fig. 1 Previous and current classification of face recognition approaches
In the holistic approach, the entire face is taken as a single feature rather than breaking it into individual features like the eyes, nose, and mouth [13]. Here, facial recognition works by taking the entire face as an input with a singular feature. By combining the “gray values of all pixels in the face, a signal high-dimensional vector” is formed that serves to represent the face image [13]. In a feature-based approach, rather than taking the entire face as a single unit and single input, it “carries out recognition using specific features of the face such as eyes, nose, mouth, mole, and ears” [14]. Afterward, the representation of these facial features is compared to its representation in a data store, and each feature is compared specifically to the picture that exists in the data store [14]. In the artificial intelligence approach to face recognition, “tools such as neural networks and deep learning” are used to pick out faces by means of a “supervised training of the system” [15]. To implement this, the face pattern is trained using a method called “supervised learning” and stored “in the system as a model” [15]. When face recognition is initiated, the recognition system takes a facial input and compares it to the model that was previously trained and stored [15]. Then, the output is given based on the degree of accuracy [15].
3 Experimental Face Recognition System 3.1 Experimental Approach to Face Recognition As stated earlier, this experimental face recognition system approaches face recognition in two main phases: face detection and face recognition. These two phases will be achieved by implementing facial detection using the “Viola–Jones object detection framework” and face recognition using deep learning algorithms [3, 4]. The “Viola–Jones object detection framework” is chosen as a result of its incorporation of “the concepts of Haar-like features, integral Images, the AdaBoost algorithm, and the cascade classifier” to develop a faster and more reliable system for detecting a face on a given image or video [4]. This is the major inspiration behind most face detection systems used in services such as Instagram and Snapchat filters.
Face recognition powered by deep learning approaches face recognition tasks by forming a deeper understanding of the subject’s facial features using a direct end-toend image interplay, which significantly reduces the image pre-processors and other precalibrated mathematical-based recognition models [5]. There will be three main actors when the system is used for security and information: the administrator, information manager, and security manager. The administrator will be granted full access to the entirety of the system and designated the role of collecting, training, and maintaining a database of human faces. This will serve as the base for which new image inputs will be compared. The security manager and the information manager will be on the field to carry out the process of identification, verification, and validation of a face or group of faces. The two main phases of this experimental face recognition system are further divided into four key steps to developing a computer system, which include detecting the face, normalizing the face, extracting the required features of the face, and recognizing the face. In the first step, face detection is implemented using Haar Cascade to analyze the pixels of a given image. This can be done with any technology that appropriately implements the Haar classifier technique. In this experimental system, we will use the Haar Cascade Classifier on an Open-Source Computer Vision Library (OpenCV). Haar classifiers are used not only because it is the primary classifier that drives the Viola–Jones detection framework, but because Haar Cascade Classifier “uses an integral image concept to compute the features detected on the face to perform feature extraction by applying the AdaBoost learning algorithm, which selects a small number of essential elements from a large set to give an efficient result of classifiers” [16]. Additionally, its seamless integration in the Viola–Jones detection framework helps achieve a much faster image processing time that significantly boosts the ratio of detection during the face detection process. Also, because they have been developed, tested, and trained to detect faces ranging from thousands to millions of images, the rate of reliability and accuracy of the output can be trusted for development. When the face detection step is fully implemented, there will be a blue square in front of the detected face to serve as a visual indicator. The next step involves normalizing the face to handle noise and other falsies that tend to arise with computer vision technology. This process involves configuring the image input to get a steady alignment. Afterward, the issues of illumination will be highlighted to deal with poorly-lit areas. This feature will heavily rely on the camera hardware being used for image detection. If the camera is capable of handling night mode, then the system will enhance the given image to fit the detection criteria. Extracting specific features from the face is the next step to be carried out in the facial recognition system. This is done first during the training of the model and afterward during a live recognition. Certain features of the face are extracted using characteristics known as “Haar features”. These Haar features take into account the common facial features that are found on most human faces such as the differences in the proximity of the nose bridge from the forehead, the eyes, and cheeks, and how the shade of the skin compares from one aspect to another (Fig. 2).
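The Haar cascade detection step described above can be sketched with OpenCV as follows. The blue rectangle follows the text; the image path, scaleFactor and minNeighbors values are illustrative assumptions rather than parameters fixed by the chapter.

import cv2

# Load OpenCV's bundled pre-trained frontal-face Haar cascade.
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_cascade = cv2.CascadeClassifier(cascade_path)

# "subject.jpg" is a placeholder input image.
img = cv2.imread("subject.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Typical detection parameters; tune for the camera and lighting in use.
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

# Draw the blue square described in the text around each detected face (BGR colour order).
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)
cv2.imwrite("detected.jpg", img)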
Fig. 2 Haar features used by the feature extraction process
To adequately normalize the image and extract features, this procedure has used some open-source technology such as Scikit-learn, NumPy, and PyTorch. The final step that this experimental system implements is face recognition. This involves collecting a subject’s image through a live video feed, generating 128measurement criteria, and comparing it against other images in a pre-trained database to look for a match. This is where the verification and identification of the subject or group of subjects take place. To kick off the process of the experimental face recognition, we took advantage of a user-friendly classifier of a linear support vector machine (SVM). The SVM classifier collects the calibrations of image input and compares it to a database to get the closest match all in a few milliseconds (Fig. 3). The pre-training process of this facial recognition system is done by analyzing three distinct images of a face. Two images will be from the face of a known subject, while one image will be the face of an unknown subject. The training process works
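The matching step with a linear SVM over the 128 measurements can be sketched as below. The extraction of the 128-dimensional embeddings by the deep network is not shown, and the arrays here are random placeholders; scikit-learn's LinearSVC stands in for the "user-friendly linear support vector machine" mentioned in the text.

import numpy as np
from sklearn.svm import LinearSVC

# known_embeddings: one 128-measurement vector per enrolled face image (placeholder data).
# known_labels: the identity of the person shown in each enrolled image.
rng = np.random.default_rng(0)
known_embeddings = rng.normal(size=(60, 128))                 # stand-in for real embeddings
known_labels = np.repeat([f"person_{i}" for i in range(6)], 10)

clf = LinearSVC()                                             # the linear SVM classifier
clf.fit(known_embeddings, known_labels)

# At recognition time, the live face is converted to its own 128 measurements
# and the classifier returns the closest enrolled identity.
live_embedding = rng.normal(size=(1, 128))
print(clf.predict(live_embedding)[0])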
Fig. 3 128 measurements generated from a sample image
Fig. 4 Pre-training process of experimental face recognition system
by first inputting the face of a known subject, secondly, we input another face image of the same known subject, and lastly, we input a face image of an unknown subject. With this, the recognition algorithm analyzes the mathematical representation from the calibrations of each image input. Then, the system works on adjusting the similarity between the first and second image such that their calibrations are alike, and with that information, it distinguishes the third image. Doing this procedure for different amounts of people causes the neural network to reliably and accurately learn to extract the needed “128 measurements” of each subject to be identified. This means that inputting more images of a particular will generate roughly the same “128 measurements (Figs. 4 and 5).”
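The training idea described in this paragraph, pulling two embeddings of the same known subject together while pushing the unknown subject away, can be written as a small triplet-style objective. The margin value and the use of plain NumPy are assumptions for illustration; the chapter does not give the exact loss used.

import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Zero when the two images of the known subject are closer than the unknown one by the margin."""
    pos_dist = np.sum((anchor - positive) ** 2)
    neg_dist = np.sum((anchor - negative) ** 2)
    return max(pos_dist - neg_dist + margin, 0.0)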
3.2 Results from Test
Step one: we detect the mouth (Fig. 6). Step two: we detect the eyes and, for each eye detected, draw a rectangle around it (Figs. 7 and 8).
Fig. 5 Overview of steps of processing in a facial recognition system
Fig. 6 Representation of a detected smile/mouth
Fig. 7 Representation of detected eyes
Fig. 8 Final result of the face recognition system
4 Recommendations for Future Research
Future work on improving this experimental system could include an additional means of verification, for instance in situations where facial recognition is challenging or a secondary means of identification is needed for validation, such as in the case of very identical twins. We could also look into using a convolutional neural network to implement face detection; this can easily be carried out using the multi-task convolutional neural network (MTCNN) package in Python.
5 Conclusion Face recognition systems can help to moderate the spike in challenges to security as well as information and resource management. However, a simple face recognition system may not be enough to tackle the challenges. A better approach might be combining the best features of the existing technologies such as foolproof verification and validation, mass identification, and instant recognition into a singular system. To develop foolproof face detection and recognition, the system should have the ability to separate a human face from a non-human face, detect the subject’s face at all times including different variations due to facial expressions, moods, cosmetic procedure, and distress, distinguish the original subject of recognition from impostors or impersonators, and determine if the system is being accessed by the original subject and not someone else impersonating the original subject. This experimental approach implements face recognition in two phases: face detection and face recognition. The two main phases of this experimental face recognition system are further divided into four key steps to develop a computer system, which include detecting the face,
normalizing the face, extracting the required features of the face, and recognizing the face.
References 1. K. Sharma, P.K. Dahiya, A state-of-the-art real-time face detection, tracing and recognition system. IUP J. Telecommun. 10(4), 51–61 (2018) 2. S. Kumar, S. Singh, J. Kumar, Automatic live facial expression detection using genetic algorithm with Haar wavelet features and SVM. Wireless Pers. Commun. 103(3), 2435–2453 (2018) 3. A.S. Al-Waisy, R. Qahwaji, S. Ipson, S. Al-Fahdawi, A multimodal deep learning framework using local feature representations for face recognition. Mach. Vis. Appl. 29(1), 35–54 (2018) 4. M. Castrillón, O. Déniz, D. Hernández, J. Lorenzo, A comparison of face and facial feature detectors based on the Viola-Jones general object detection framework. Mach. Vis. Appl. 22(3), 481–494 (2011) 5. B.C. Ko, A brief review of facial emotion recognition based on visual information. Sensors (Basel, Switzerland) 18(2) (2018) 6. S. Carey, R. Diamond, B. Woods, Development of face recognition: a maturational component? Dev. Psychol. 16(4), 257–269 (1980) 7. A.J. Goldstein, L.D. Harmon, A.B. Lesk, Identification of human faces. Proc. IEEE 59(5), 748–760 (1971) 8. A. Samal, P.A. Iyengar, Automatic recognition and analysis of human faces and facial expressions: a survey. Pattern Recogn. 25(1), 65–77 (1992) 9. L. Sirovich, M. Kirby, Low-dimensional procedure for the characterization of human faces. J. Opt. Soc. America A 4(3), 519–524 (1987) 10. A. Pentland, T. Choudhury, Face recognition for smart environments. Computer 33(2), 50–55 (2000) 11. M.A. Turk, A.P. Pentland, Face recognition using eigenfaces, 586–591 12. F. Murtagh, P. Contreras, Algorithms for hierarchical clustering: an overview, II. WIREs Data Min. Knowl. Disc. 7(6), e1219 (2017) 13. M.P. Beham, S.M.M. Roomi, A review of face recognition methods. Int. J. Pattern Recogn. Artif. Intell. 27(04), 1356005 (2013) 14. H.P. Truong, Y. Kim, Enhanced line local binary patterns (EL-LBP): an efficient image representation for face recognition 15. H. Bae, S. Kim, Real-time face detection and recognition using hybrid-information extracted from face space and facial features. Image Vis. Comput. 23, 1181–1191 (2005) 16. A. Mahmood, S. Hussain, K. Iqbal, W.S. Elkilani, Recognition of facial expressions under varying conditions using dual-feature fusion. Math. Probl. Eng. 1–12 (2019)
Chapter 14
Customer-Centric E-commerce Implementing Artificial Intelligence for Better Sales and Service Salu George Thandekkattu and M. Kalaiarasi
1 Introduction
Artificial intelligence is a way of making a computer work as an intelligent human would. A computer-controlled robot is nothing but software that thinks intelligently in the same way an intelligent human thinks. The methodology is to study how the human brain thinks, learns, decides and works while trying to solve a problem, and then to use the outcomes of this study as a basis for developing intelligent software and systems. Various AI techniques, such as chatbots, are major research and implementation tools for handling customer data, image search, inventory management, recommendation systems, cyber security, better decision-making, after-sales services, customer relationship management and sales improvement in e-commerce [1]. Chatbots are simply automated programs that can guide or assist you in tasks like ordering food, navigating recruitment portals, and making business suggestions in B2B, B2C and other settings. We use the terms chatbot, automation, virtual assistant and AI interchangeably. Artificial intelligence is an expanding technology which, along with advancements in machine learning and deep learning, aims to achieve a more powerful and smarter world. The simple question posed by mathematician Alan Turing, "Can machines think?", made history in World War II: his work against the Nazi encryption machine Enigma helped the Allied forces win the war.
S. G. Thandekkattu (B) American University of Nigeria, Yola, Nigeria
e-mail: [email protected]
Present Address: M. Kalaiarasi Institute of Aeronautical Engineering, Hyderabad, India
e-mail: [email protected]
URL: https://www.iare.ac.in/
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. B. Reddy et al. (eds.), Proceedings of Second International Conference on Advances in Computer Engineering and Communication Systems, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-7389-4_14
Artificial intelligence is about developing intelligent machines that think and work much like a human. Imagine yourself twenty years ahead, where you find machines helping you by serving coffee, buying lunch and working in the office with you. Research is mainly focused on speech recognition, problem-solving, learning and planning. Large amounts of data are processed to recognize patterns in the data gathered. The US National Science and Technology Council published a report on AI in October 2016. AI was proposed many decades ago, but it has only now become a predominant technique in use. AI is already available in your phone: the keypad marks wrongly spelled words with a red line. Another application is the Google search engine, where the phrase typed in the text input automatically fetches relevant suggestions based on search patterns, so the auto-complete search engine can also be regarded as an intelligent agent. Any device that perceives its surroundings to achieve its goals successfully is termed AI; it analyses the environment in all possible respects. "Cognitive" functions associated with the human brain, such as "learning" and "problem-solving", are applied to machines, and this is termed AI. Data is the basic subunit of AI, and the most important thing is to proceed with predictive processing: in order to achieve AI, we need data, and AI models need powerful, efficient processors to run complex algorithms that draw, recognize and replicate patterns [2, 3].
DATA > ALGORITHM > QUICK LEARNER
Use AI to solve business problems:
• Vision: Algorithms are used to smartly identify, caption and moderate pictures; this is image processing technology.
• Knowledge: Tasks such as intelligent recommendations and semantic search need to map complex information and data.
• Language: Natural language is processed with pre-built scripts to evaluate context and recognize the intention of users.
• Speech: Speech-to-text and text-to-speech with voice verification, carried out through speaker recognition in apps.
• Search: Bing Search APIs in apps help harness the capability to query billions of webpages, images and other media, such as news, with a single API call.
AI in Automation Testing
An AI expert identifies the work to be automated, such as existing customer acquisition processes. This helps human workers reduce the manual effort spent on hacky automation code. Some degree of prediction and decision-making is performed to achieve better results. Watson is an AI platform from IBM for business; it makes decision-making easier and is a powerful AI technique. It is powered by the latest innovations in machine
learning. It is a multi-cloud platform that brings AI tools to your data, whether it is on the IBM cloud, your own private cloud platform or another specialized cloud (AWS, Google and more). It can automate the AI life cycle, build powerful models from scratch or speed time-to-value with pre-built solutions for enterprise applications. AI leverages ML together with third-party libraries, focuses on increasing productivity, serves customer satisfaction, and is used for financial management in accounting.
2 Related Works
Artificial intelligence can help online retailers personalize and customize their Websites as well as provide personalized recommendations to their visitors. This makes the purchasing process more engaging and increases the likelihood of a large number of purchases. AI is revolutionizing conversations and content to provide an interactive experience over the Web. Amazon deploys artificial intelligence to assist users with voice shopping and purchasing recommendations, among other things. Face recognition, home price estimation, visual search, autonomous driving and other applications employ Amazon's AI technology. eBay's AI platform, Krylov, has offered the corporation a wide range of new capabilities, ranging from improved language translation services to image search, in order to keep up with the competition in modern E-commerce. The Hanguang 800 is Alibaba's first chip to power artificial intelligence (AI) processes; according to the E-commerce giant, it can reduce computing tasks that would take an hour to five minutes. Netflix uses machine learning and algorithms to help viewers overcome preconceptions and discover shows they might not have picked otherwise; instead of depending on broad categories to make predictions, it looks at intricate threads within the content. Implementing AI models is now prevalent in the E-commerce retail market. This research examines how artificial intelligence (AI) is transforming the way online stores work and serve their customers [4–7].
Research Aim:
1. Why does online shopping need an interactive UI?
2. Customer-centric online stores, to increase existing customer value: customers who are comfortable with offline shopping should feel the same way while shopping online.
3. How to increase the customer's product purchases and serve his needs.
4. Customer satisfaction is met so that customers return to the store.
5. How to compete with other E-commerce stores to increase sales?
The survey covers AI models implemented in various E-commerce applications.
A. Chatbot
The increasing popularity of voice-controlled devices (VCDs) such as Google Home, Amazon Alexa and others has led to the automation of home appliances, mobile devices and next-generation cars, among other things. However, VCDs and voice-activated services such as chatbots are vulnerable to audio replay attacks. A vulnerability analysis of VCDs demonstrates that, in multi-hop scenarios, these replays could be used to cause damage and to gain malicious access to devices and nodes connected to the Internet of Things. To secure these VCDs and voice-activated services, effective and computationally efficient solutions to detect replay attacks are urgently needed. In this work, replay attacks are modelled as a nonlinear mechanism that introduces higher-order harmonic distortions, and acoustic ternary patterns-gammatone cepstral coefficient (ATP-GTCC) features are proposed to capture the distortions caused by replay attacks. The proposed framework's performance is assessed using the ASVspoof 2019 dataset as well as the authors' own voice spoofing detection corpus (VSDC), which consists of genuine, first-order replay (replayed once) and second-order replay (replayed twice) audio recordings. According to the experimental findings, the proposed audio replay detection system accurately detects both first- and second-order replay attacks [8–12].
B. Intelligent Systems
The literature suggests a number of improvements to the manufacturing process in order to create intelligent goods. The literature compares the production of intelligent products to the development of conventional products, pointing out variations in design, consumer relationships and the variety of services provided. All of the above distinctions, as well as other variants, should be considered. As a result, the aim of this research is to make recommendations for the creation of intelligent products that integrate IoT technologies. In addition, as a theoretical contribution to this analysis, we aim to collect and systematize IoT PDP propositions from various areas of the literature, such as computing, engineering, management, finance and others. The Internet of Things (IoT) is a modern technological paradigm that consists of a network that links devices or “things”. All objects that IoT touches gain a virtual identity and connect with users, community and the world through the Internet. As a result, integrating IoT technology into products allows for much new functionality converting them into a new category known as IoT. These products produce data that, once processed, can be used to create and commercialize new products and services for society, thereby addressing its needs. Finally, as a practical contribution, the project’s findings seek to direct managers through the process of transforming and converting the conventional production process into a PDP based on intelligent goods [13–16].
C. Personalization
Personalization has been celebrated as the holy grail of marketing programmes, promising to derive new insights from billions of consumer experiences and demographic data. Hundreds of technology solutions and services have sprung up as a result of the personalization idea, which has been the subject of countless events, articles and academic studies. The author explores how chief data officers can understand the potential of their data [17–19, 20].
D. Inventory Management
Mass customization (MC), as an operations programme that satisfies target consumers by offering personalized products or services, has attracted substantial attention from both industry and academia. One of the most critical issues under this initiative is the efficient management of the relevant inventories, such as work-in-process inventories, standard products and personalized items, which can eventually lead to a competitive market for companies that have launched MC. The focus of this paper is therefore on MC inventory management in both forward and reverse logistics. The results of the study serve as a reference for operations managers looking to enhance inventory control in their MC operations, and future research opportunities in MC inventory management, including supply chain alignment and risk management, are explored [21].
3 AI in Web App Development
AI can also alter Web development in the years to come. Especially for small businesses, AI makes quick interaction possible within the limited capabilities of a human. AI interacts with customers in order to provide relevant information, which helps the development of the user as well as the organization. E-commerce prefers highly personalized, smart and feature-packed UX Web apps. Just imagine that project management with an Agile approach is automated using AI techniques: you can realize how far the automated process will help you cut down the time and effort needed to complete the design phase. Automation goes beyond the level you can imagine, such as scheduling each day's tasks for the developers in the project. To better analyse the role of AI in Web apps or designs, you need to go through its impacts. Web development has grown at an unprecedented rate over the past 10 years, and every business runs its own Web applications. The users' desire for enhanced experiences along with customized data is being met. By the year 2025, the global artificial intelligence market is expected to be almost $60 billion; in the year 2016, it was $1.4 billion (Source: Accenture). Global GDP will grow by $15.7 trillion by the year 2030 thanks to artificial intelligence (Source: MIT) [22].
A. Pong—AI technique in Web App Development
Here, playing against the computer is made entertaining, so that you feel involved as if playing against a strong opponent. Without going deeply into the actual set-up of the game, let us only analyse the predictive logic that lets the AI win the Pong game. Here, the computer is the second player, which decides things on its own, covering all the possibilities of the game. The AI is programmed to control one of the paddles in Pong. The idea is to program the AI in the simplest way using if statements. The next step is adding randomness, which makes the response somewhat less predictable than the plain algorithm, and this randomness in the player's moves keeps the game fun. Let us suppose the computer is the right-side player. It takes control of the right paddle and prevents the ball from moving away towards the right, i.e. it follows the ball with its paddle [2]. The actual logic behind the scenes is as follows (a minimal sketch is given after the list):
1. If the ball moves above the centre of the right paddle, then the paddle should move up.
2. Else, if the ball moves below the centre of the paddle, then move the paddle down.
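A minimal sketch of these two rules is shown below. The function and parameter names and the speed value are illustrative assumptions; screen coordinates are assumed to grow downwards, so "above" means a smaller y value.

def update_right_paddle(ball_y, paddle_center_y, paddle_speed=5):
    """Return the new centre y of the right paddle so that it follows the ball."""
    if ball_y < paddle_center_y:        # ball is above the paddle centre -> move up
        return paddle_center_y - paddle_speed
    if ball_y > paddle_center_y:        # ball is below the paddle centre -> move down
        return paddle_center_y + paddle_speed
    return paddle_center_y              # already centred on the ball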
Logic: centre the paddle on the ball position.
B. AI-Based Program Design for Web Apps
In a Web application, it is very difficult to analyse customer choice and grab the customer's attention. Each customer has their own constraints, and the marketing strategy also needs to be confirmed. In this paper, we study vision, customer choice, time management, shopping strategy and uncertainty handling as the basic ingredients for modelling an AI agent.
(1) Vision: Vision can be enhanced for zoom-in commands and for colour and design patterns using built-in vision AI. A developer can analyse a vision and process the data related to the customer's shopping interest, in order to market similar and relevant products to the customer and improve sales.
(2) Customer Choice Analysis: Shopping (or, say, using a Web app) can be made fun and exciting by analysing the customer's interest. Shopping does not become boring if the customer finds all relevant and similar products with varying ranges of prices, discounts, colours and design patterns. We can analyse the different filter and sort options used by customers and present the particular items they are likely to buy [23, 24].
(3) Time Management: Web app customers need to be encouraged to buy items within a span of time so that the sales target is met. The organization plans to achieve a certain level of sales in a given period to reach a peak in the market. Time management is usually applied to sales, customer likes and dislikes, and the products purchased most often [25–27].
(4) Shopping Strategy: For any service-based Web app (say for shopping, travelling or booking), as market competition increases, especially in online shopping, all the factors that affect the sales rate need to be considered. One strategy is to maximise single sales by each user; both the count of sales and the number of customers matter in a marketing strategy. An agent with the highest shopping count can be rewarded with more coupons, which ultimately increases sales. The most important strategy to add is referral shopping, which increases the number of customers; more customers make the organization more reliable and more global in the market. A new strategy would be to provide gift coupons for referrals, which helps the referred customer browse the interactive Web app powered by AI and experience the new style of shopping. This ultimately increases sales and attracts a few more customers.
(5) Handling Uncertainty: Life is full of uncertainty and, as the saying goes, experiences come with it. Such scenarios can be fed into the decision-making process; this analysis helps in handling situations so that they turn into gains rather than lost sales. Suppose a customer logs in for the first time and shops for his or her sister instead of themselves: the AI agent will store the vision and strategies of that particular shopping session and start sending SMS and emails based on those patterns, which will lose the customer's attention. Customer referrals misused through fake IDs will not give the exact count of users. Customers may also be interested in a way of shopping different from the one the AI agent follows, and many more such cases can arise [28–31]. This study of how to implement a game-style AI model in Web apps helps us analyse all the ways a customer could be affected by the AI agent. To handle these choices, the AI agent needs to confirm whether the patterns drawn are for the customer or for someone else (they may be shopping for their husband, brother, etc.). For this, it initially has to ask a question such as "Shopping for you?" with options like women, men, kids, or self, spouse, etc., either as a drop-down or as a text area to enter the relation or gender.
(6) Learning Experience: The AI agent now starts generating more sales than real agents. Competition increases in such a scenario, replacing human workers and relying totally on machines. The feedback gathered on the shopping experience is evaluated for both the AI agent and real human agents, which shows the level of analysis a human and a machine can really make. The latter case may differ, but a human intellect can always handle any situation, even one for which the machine has been trained [32–34].
C. Computer Vision Implementation
Vision implementation is done through MMS, a Python package that returns NumPy arrays at about 20 fps. OpenAI Gym, Adobe Sensei and Mint help in building the AI model. The process involves pattern, movement and process recognition in order to make further decisions.
Code
import cv2
import numpy as np

def get_objects_in_masked_region(img, vertices, connectivity=8):
    ''':return connected components with stats in masked region
    [0] retval: number of total labels, 0 is background
    [1] labels: image
    [2] stats: [0] leftmost x, [1] topmost y, [2] horizontal size, [3] vertical size, [4] area
    [3] centroids
    '''
    mask = np.zeros_like(img)
    # fill the mask
    cv2.fillPoly(mask, [vertices], 255)
    # now only show the area that is the mask
    mask = cv2.bitwise_and(img, mask)
    conn = cv2.connectedComponentsWithStats(mask, connectivity, cv2.CV_16U)
    return conn
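A possible call of this function is sketched below on a synthetic single-channel frame; in the game-AI setting the frame would instead come from the screen-capture package mentioned above, and the polygon would outline the playing field.

frame = np.zeros((200, 200), dtype=np.uint8)
cv2.circle(frame, (100, 100), 10, 255, -1)              # a white "ball" on a black frame

# Region of interest covering the playing field (polygon vertices in x-y order).
vertices = np.array([[10, 10], [190, 10], [190, 190], [10, 190]], dtype=np.int32)

n_labels, labels, stats, centroids = get_objects_in_masked_region(frame, vertices)
print(n_labels - 1, "object(s) found, first centroid:", centroids[1])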
D. Casting the Ray
def pong_ray(pong_pos, dir_vec, l_paddle, r_paddle, boundaries, steps=250):
    future_pts_list = []
    for i in range(steps):
        x_tmp = int(i * dir_vec[0] + pong_pos[0])
        y_tmp = int(i * dir_vec[1] + pong_pos[1])
        if y_tmp > boundaries[3]:  # bottom
            y_end = int(2 * boundaries[3] - y_tmp)
            x_end = x_tmp
        elif y_tmp < boundaries[2]:  # top
            y_end = int(-1 * y_tmp)
            x_end = x_tmp
        else:
            y_end = y_tmp
            # stop where paddle can reach
            if x_tmp > r_paddle[0]:  # right
                x_end = int(boundaries[1])
                y_end = int(pong_pos[1] + ((boundaries[1] - pong_pos[0]) / dir_vec[0]) * dir_vec[1])
            elif x_tmp < boundaries[0]:  # left
                x_end = int(boundaries[0])
                y_end = int(pong_pos[1] + ((boundaries[0] - pong_pos[0]) / dir_vec[0]) * dir_vec[1])
            else:
                x_end = x_tmp
        end_pos = (x_end, y_end)
        future_pts_list.append(end_pos)
    return future_pts_list
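A short example call is given below. From the indexing in the function, boundaries is assumed to be ordered (left, right, top, bottom); the coordinate values are illustrative.

boundaries = (40, 600, 0, 400)                 # left, right, top, bottom of the playing field
left_paddle, right_paddle = (40, 200), (600, 200)
points = pong_ray(pong_pos=(320, 200), dir_vec=(4, 1),
                  l_paddle=left_paddle, r_paddle=right_paddle,
                  boundaries=boundaries, steps=100)
print(points[-1])                              # predicted intercept position at the right paddle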
Paddling target
Schematic for the calculation of the intercept position for paddle targeting.
4 Results and Web Apps Feedback
The results and analysis are drawn from personal shopping experience with already popular, working Web apps, and the feedback was cross-checked for each particular scenario.
Web app/model | Human agent | Customer | AI agent
Bench mark | 72,568 | 93,665 | 7,53,110
1st WebApp | 3,49,140 | 3,95,250 | 1,98,220
2nd WebApp | 5,11,260 | 6,02,240 | 2,36,810
3rd WebApp | 5,13,730 | 3,80,410 | 2,53,460
5 Conclusion
AI in E-commerce has a major impact on personalized experiences for customers across devices, and artificial intelligence is revolutionizing conversations and content. Many E-commerce businesses are already using forms of AI to better understand their customers, generate new leads and provide an enhanced customer experience. The Internet has opened the door for revolutionizing various sectors; E-commerce has unlocked new opportunities and scope for retailers, who have never seen such growth in their sales. Artificial intelligence is taking E-commerce to the next level. In this study, we propose an AI agent model for a competitive platform which includes vision, predictive analysis, strategy and uncertainty handling. The research covers strategic decision-making, time management, customer choice analysis, shopping analysis, uncertainty handling and learning experience. The design is based on the Pong implementation of the code and on the concept of neural networks, rather than the reinforcement learning often used as the AI model. In future, we can take this research towards virtual e-stores, which give the same experience as offline stores; this concept should be enhanced and will help customers become more familiar with product colour, pattern and design in a more detailed manner. AI-enabled solutions help companies gain deeper insights into the customer experience and, as a result, make more informed decisions on how to move forward. Artificial intelligence will have a huge influence on how E-commerce firms acquire and retain customers. The E-commerce AI revolution will produce many new data science, machine learning and engineering jobs, and AI-based E-commerce will also result in the creation of IT jobs to build and manage the systems and software that run the AI algorithms.
References 1. A. Radziszewska, P. Czstochowska, Assessment of customer’s satisfaction in e-commerce services 15(58), 383–393 (2013) 2. A. Canossa, J.B. Badler, M.S. El-Nasr, S. Tignor, R.C. Colvin, In your face (t) impact of personality and context on gameplay behaviour. FDG (2015)
3. K. Dhou, C. Cruzen, An innovative chain coding technique for compression based on the concept of biological reproduction: an agent-based modeling approach. IEEE Internet Things J. 1–9 (2019) 4. Microsoft, State of Global Customer Service Report (2018) 5. Conduent, The State of Consumer Experience Communication Edition 2018 (2018) 6. E. Kursunluoglu, Customer service effects on customer satisfaction and customer loyalty: a field research in shopping centers in Izmir City—Turkey. Int. J. Bus. Soc. Sci. 2(17), 52–59 (2014) 7. G.K. Amoako, improving customer service in the banking industry-case of Ghana improving customer service in the banking industry-case of Ghana commercial bank (GCB)—Ghana. Int. Bus. Res. 5(4), 134–148 (2012). https://doi.org/10.5539/ibr.v5n4p134 8. K.M. Malik, A. Javed, H. Malik, A. Irtaza, A light-weight replay detection framework for voice controlled IoT devices. IEEE J. Sel. Topics Signal Process. J. 14 (2020) 9. A. Weese, D. Peiffer, Customer service: then and now. in SIGUCCS ’13 Proceedings of the 41st annual ACM SIGUCCS conference on User services (2013), pp. 35–38. https://doi.org/ 10.1145/2504776.2504804 10. S. Fallah, Customer Service. Logistics Operations and Management. Elsevier Inc. https://doi. org/10.1016/B978-0-12-385202-1.00011-6 (2011) 11. F. Demirci, A. Kara, Supermarket self-checkout service quality, customer satisfaction, and loyalty: empirical evidence from an emerging market. J. Retail. Consum. Serv. 21(2), 118–129 (2014). https://doi.org/10.1016/j.jretconser.2013.07.002 12. L. Cui, S. Huang, F. Wei, C. Tan, C. Duan, M. Zhou, Super-agent: a customer service chatbot for E-commerce websites. in Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics-System Demonstrations (2017), pp. 97–102. https://doi.org/10. 18653/v1/P17-4017 13. C.G. Sá Cavalcante, D.C. Fettermann, Recommendations for product development of intelligent products. IEEE Lat. Am. Trans. J. 17 (2019) 14. J. Lester, K. Branting, B. Mott, Conversational agents. Pract. Handb. Internet Comput. (2004). https://doi.org/10.1201/9780203507223.ch10 15. M. Steup, Customer care trends 2019: convenience is key (2019). Retrieved from https://www. messengerpeople.com/customer-caretrends-2019/ 16. V. Chattaraman, W.-S. Kwon, J.E. Gilbert, S.I. Shim, Virtual agents in e-commerce: representational characteristics for seniors. J. Res. Interact. Mark. 5(4), 276–297 (2015) 17. S. Earley, The problem of personalization: AI-driven analytics at scale. IEEE Trans. 19 (2017) 18. T. Verhagen, J. van Nes, F. Feldberg, W. van Dolen, Virtual customer service agents: using social presence and personalization to shape online service encounters. J. Comput.-Mediat. Commun. 19, 529–545 (2014). https://doi.org/10.1111/jcc4.12066 19. D.H. Kang, D. Leem, J. Choi, J. Kim, J. Park, A study on the personalization characteristics affecting user’s intention to use mobile learning. Adv. Sci. Technol. Lett. 103, 180–185 (2015). https://doi.org/10.14257/astl.2015.103.39 20. H. Fan, M.S. Poole, What is personalization? Perspectives on the design and implementation of personalization in information systems. J. Organ. Comput. Electron. Commer. 16(3 & 4), 179–202 (2006). https://doi.org/10.1207/s15327744joce1603 21. S. Guo, T.-M. Choi, B. Shen, S. Jung, Inventory management in mass customization operations: a review. IEEE Trans. Eng. Manage. 66 (2019) 22. T.M. Nisar, G. Prabhakar, What factors determine e-satisfaction and consumer spending in e-commerce retailing? J. Retail. Consum. Serv. 
39(May), 135–144 (2017). https://doi.org/10. 1016/j.jretconser.2017.07.010 23. ServiceNow, & Devoteam. The AI Revolution: Creating a New Customer Service Paradigm (2018) 24. R. Khan, A. Das, Build Better Chatbots—A Complete Guide to Getting Started with Chatbots (Apress, Berkeley, CA) https://doi.org/10.1007/978-1-4842-3111-1_1 25. M. Huang, R.T. Rust, Artificial Intelligence in service. J. Serv. Res. 21(2), 155–172 (2018). https://doi.org/10.1177/1094670517752459
152
S. G. Thandekkattu and M. Kalaiarasi
26. D. Gursoy, O.H. Chi, L. Lu, R. Nunkoo, Consumers acceptance of artificially intelligent (AI) device use in service delivery. Int. J. Inf. Manage. 49, 157–169 (2019) 27. Johnston, R. (1995). The determinants of service quality satisfiers and dis-satisfiers. International Journal of Service Industry Management, 6(5), 53–71. https://doi.org/10.1108/095642 39510101536 28. BrandGarage, & Linc, How AI technology will transform customer engagement (2018) 29. F. Li, M. Qiu, H. Chen, X. Wang, X. Gao, J. Huang, W. Chu, AliMe assist: an intelligent assistant for creating an innovative E-commerce experience, in CIKM’17: Proceedings of the 2017 ACM Conference on Information and Knowledge Management (2017), pp. 2–5. https:// doi.org/10.1145/3132847.3133169 30. A. Parasuraman, V.A. Zeithaml, L.L. Berry, A conceptual model of service quality and its implication for future research (SERVQUAL). J. Mark. 49, 41–50 (1985). https://doi.org/10. 2307/1251430 31. J.M. Getty, R.L. Getty, Lodging quality index (LQI): assessing customers’ perceptions of quality delivery. Int. J. Contemp. Hosp. Manag. 15(2), 94–104 (2003). https://doi.org/10.1108/ 09596110310462940 32. A. Khalid, O.O.K. Lee, M. Choi, J. Ahn, The effects of customer satisfaction with E-commerce system. J. Theor. Appl. Inf. Technol. 96(2), 481–491 (2018) 33. A. Parasuraman, V.A. Zeithaml, L.L. Berry, SERVQUAL: a multiple—item scale for measuring consumer perceptions of service quality. J. Retail. 64(1), 12–40 (1988) 34. A. Parasuraman, V.A. Zeithaml, A. Malhotra, E-S-Qual: a multiple-item scale for assessing electronic service quality. J. Serv. Res. 7(3), 213–233 (2005). https://doi.org/10.1177/109467 0504271156 35. F.-J. Cossío-Silva, M.-Á. Revilla-Camacho, M. Vega-Vázquez, B. Palacios-Florencio, Value co-creation and customer loyalty. J. Bus. Res. 69(5), 1621–1625 (2016). https://doi.org/10. 1016/j.jbusres.2015.10.028 36. M. Peng, Y. Qin, C. Tang, X. Deng, An E-commerce customer service robot based on intention recognition model. J. Electron. Commer. Organ. 14(1), 34–44 (2016). https://doi.org/10.4018/ JECO.2016010104
Chapter 15
Tomato Plant Disease Classification Using Deep Learning Architectures: A Review U. Shruthi , V. Nagaveni , C. S. Arvind , and G. L. Sunil
1 Introduction

Farmers face many problems in the cultivation of tomatoes. Tomato crops suffer from various disorders such as nutrition deficiency, lack of soil-testing facilities, poor seeding and planting practices, climatic factors, and insufficient or untimely rain. Because of these factors, pathogens in the tomato crop have led to massive losses in production and yield. With the traditional labor-driven approach, early identification and mitigation are difficult, which leads to financial losses; hence, there is a need for an automatic plant disease identification system. Such a system takes leaf, stem, and fruit images as input, which can be acquired with mobile or digital cameras. Artificial intelligence becomes essential when devices must classify tomato diseases, a task that typically involves human intellect. Deep learning is a subgroup of machine learning in which multi-layer neural network procedures learn from a large set of image data. Figure 1a shows the number of research papers published on tomato plant disease detection year-wise from 2015 onwards, indicating that the number of publications has increased. Figure 1b shows the number of papers published on tomato plant diseases by technology and year, indicating that deep learning has brought tremendous changes to tomato disease classification.

U. Shruthi (B) Presidency University Bangalore, Bengaluru, India
V. Nagaveni Acharya Institute of Technology, Bengaluru, India
C. S. Arvind Singapore Bioimaging Consortium A*Star, Biopolis Way, Singapore
G. L. Sunil Sai Vidya Institute of Technology, Bengaluru, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
A. B. Reddy et al. (eds.), Proceedings of Second International Conference on Advances in Computer Engineering and Communication Systems, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-7389-4_15
Fig. 1 Year-wise tomato disease detection, a number of papers published, b number of publications based on technology
2 Types of Tomato Diseases

Numerous pathogens, including fungal, bacterial, and viral organisms, as well as nutritional disorders and insect damage, can cause diseases in the tomato plant. The major diseases affecting the tomato plant are given in Table 1, and representative images of bacterial, viral, and fungal diseases are shown in Fig. 2 (i), (ii), and (iii), respectively; the infection caused by these pathogens depends on location, temperature, humidity, moisture, soil, and other factors. Plant diseases can be identified by physical observation by experts, and detecting a disease at an early stage makes it possible to treat the plant appropriately. However, recognizing diseases in the plant is not an easy task: it requires an accurate description of the symptoms by experts with their experience and knowledge of treatment and protection of the plant. Continuous monitoring of plant diseases in large fields requires intense labor, which may not be reliable because of human error. To overcome this drawback, machine vision methods for the recognition and classification of plant diseases have been employed. The initial step of plant disease detection in machine vision is the collection of leaf images captured with a digital camera. The second step is to extract features such as the color and texture of the leaf images using feature extraction algorithms, and the final step is to classify the type of disease using machine learning or deep learning classification methods. A multi-class classification method can be applied to classify the variety of diseases in the plant [1]. Many classifiers have been developed to identify plant diseases, among them K-nearest neighbors (KNN), support vector machine (SVM), decision tree, fuzzy C-means, artificial neural network (ANN), and the convolutional neural network (CNN).

Table 1 Tomato plant diseases
Figure 2 | Pathogen | Disease | Identified part
i (a) | Bacteria | Bacterial spot | Leaf and fruit
i (b) | Bacteria | Bacterial canker | Leaf, stem, and fruit
i (c) | Bacteria | Bacterial speck | Leaf and fruit
i (d) | Bacteria | Bacterial wilt | Leaf and stem
ii (a) | Virus | Tomato mosaic | Leaf
ii (b) | Virus | Yellow leaf curl | Leaf
ii (c) | Virus | Spider mite | Leaf
ii (d) | Virus | Cucumber mosaic | Leaf
iii (a) | Fungi | Early blight | Leaf and fruit
iii (b) | Fungi | Late blight | Leaf, fruit, and stem
iii (c) | Fungi | Septoria leaf spot | Leaf
iii (d) | Fungi | Powdery mildew | Leaf and stem
iii (e) | Fungi | Leaf mold | Leaf
iii (f) | Fungi | Fusarium wilt | Leaf
iii (g) | Fungi | Target spot | Leaf

Fig. 2 i Bacterial diseases: (a) bacterial spot, (b) bacterial canker, (c) bacterial speck, (d) bacterial wilt; ii Viral diseases: (a) tomato mosaic, (b) yellow leaf curl, (c) spider mite, (d) cucumber mosaic; and iii Fungal diseases: (a) early blight, (b) late blight, (c) septoria leaf spot, (d) powdery mildew, (e) leaf mold, (f) fusarium wilt, (g) target spot
3 Mechanism of Deep Neural Network

A deep neural network processes an input image by passing it repeatedly from the neurons of one layer to those of the next. At every step, information is extracted and passed on: the inputs to each layer are summed and fed to an internal activation function, and the performance of the model can be optimized by choosing suitable hyperparameters. Once the information has traversed all the layers, the output produced by the model is compared with the target class. An error is computed, and the weights are updated based on the error value; this process is known as backpropagation. A loss function must be specified here: it is the essential tool for measuring how well the model performs, and learning proceeds over multiple epochs so as to minimize the error rate. Tomato plant diseases can be classified using a deep neural network model, and its flow diagram is shown in Fig. 3.
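The forward pass, loss computation, and weight update described above can be summarized in a short training-loop sketch. This is only an illustration of the mechanism, not the implementation used in any of the reviewed papers; it assumes a PyTorch-style model, a cross-entropy loss, and an SGD optimizer, and the names (model, loader) are hypothetical.

```python
import torch
import torch.nn as nn

def train_one_epoch(model, loader, lr=0.01):
    # Hedged sketch: illustrates the forward pass, loss, backpropagation, and weight update.
    criterion = nn.CrossEntropyLoss()              # loss function measuring model efficiency
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for images, labels in loader:                  # batches of leaf images and disease labels
        optimizer.zero_grad()
        outputs = model(images)                    # forward pass through all layers
        loss = criterion(outputs, labels)          # compare the output with the target class
        loss.backward()                            # backpropagation: compute gradients of the error
        optimizer.step()                           # update the weights based on the error value
```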
Fig. 3 Flow diagram for tomato plant disease detection using deep learning

3.1 Convolutional Neural Network

A CNN processes multidimensional data using a deep neural network. It has separate layers for feature extraction and for disease classification. Feature extraction is performed by a series of convolution layers, nonlinear pooling layers, and activation functions. Fully connected layers are used as the classifier; this arrangement reduces the complexity of processing the images without discarding the features that are essential for classification. The process of the CNN model is shown in Fig. 4. The CNN takes plant leaf images as input and learns spatial features over a stack of convolution and pooling operations. These features are reduced into a vector and passed to the fully connected layers, whose outputs can be interpreted as probabilities for a set of classes.

Fig. 4 General architecture of CNN
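A minimal sketch of this convolution + pooling + fully connected pattern is given below. It is an illustration only, not any specific architecture from the surveyed papers; the 64 x 64 RGB input size, the channel counts, and the ten output classes (nine diseases plus healthy leaves) are assumptions for the example.

```python
import torch.nn as nn

class TomatoCNN(nn.Module):
    """Hedged sketch of the general CNN structure described above."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 64 -> 32
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 32 -> 16
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 16 -> 8
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                      # reduce the feature maps to a vector
            nn.Linear(64 * 8 * 8, 128), nn.ReLU(),
            nn.Linear(128, num_classes),       # scores turned into class probabilities by softmax
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```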
4 Related Work

An extensive literature survey has been conducted on the major types of CNN models proposed to identify and classify tomato plant diseases. A summarized review of the collected publications, to the best of our knowledge, is given in Table 2. The following subsections discuss the different CNN models and review the tomato disease detection methods proposed with the respective models.
4.1 LeNet

LeNet was developed for handwritten character recognition [2]. It has seven layers, of which three are convolution layers, two are subsampling layers, and the last two are fully connected layers, and it has about 60 K parameters. A computational model using this CNN was developed in [3]: a variation of the LeNet architecture in which the first three blocks are each made up of a convolutional layer, an activation function, and a max-pooling layer, followed by fully connected layers and a softmax activation. The dataset consists of 18,160 images, of which 13,360 are training images and 4800 are testing images. The computation was done with different numbers of epochs, and 94.8% was the maximum validation accuracy, obtained over 30 epochs.
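The sketch below illustrates the LeNet-style variant described for [3]: three convolution/activation/max-pooling blocks followed by fully connected layers and a softmax output. The channel counts and the 64 x 64 input size are illustrative assumptions, not values stated in the paper.

```python
import torch.nn as nn

# Hedged sketch of the LeNet-style variant described for [3].
lenet_variant = nn.Sequential(
    nn.Conv2d(3, 6, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),    # block 1
    nn.Conv2d(6, 16, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),   # block 2
    nn.Conv2d(16, 32, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),  # block 3
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 120), nn.ReLU(),
    nn.Linear(120, 10),        # ten classes: nine tomato diseases plus healthy leaves
    nn.Softmax(dim=1),         # softmax activation at the output, as described in the text
)
```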
158
U. Shruthi et al.
Table 2 Summary of selected publications

References | Technology | Publication type | No. of diseases | Algorithm used | Result obtained
[30] | Image processing | Conference | 1 | Color structure descriptor (CSD), scalable color descriptor (SCD), and color layout descriptor (CLD) | CSD 64 bin results with best descriptor model
[31] | Machine learning | Conference | 2 | SVM | 99.5% accuracy
[32] | Machine learning | Conference | 2 | SVM | 99.83% accuracy
[33] | Machine learning | Conference | 5 | Decision tree | 97.3% accuracy
[14] | Deep learning | Scopus | 9 | GoogleNet model | 99.35% accuracy
[34] | Machine learning | Scopus | 2 | Principal component analysis (PCA) and coefficient of variation | PCA method achieved accuracy of 90% for powdery mildew and 95.2% for two-spotted wilt virus diseases, respectively
[5] | Deep learning | Conference | 9 | AlexNet and SqueezeNet | 95.65% by AlexNet
[35] | Image processing | Conference | 4 | Fast library for approximate nearest neighbor | Two bacterial diseases classified with 91.1% accuracy and two fungal diseases with 94% accuracy
[28] | Deep learning | Google Scholar | 9 | R-FCN with ResNet50 feature extractor | Average precision 85.98%
[6] | Deep learning | Scopus | 6 | AlexNet and VGGNet | AlexNet achieves 97.49% accuracy
[3] | Deep learning | Conference | 9 | LeNet | 94.85% accuracy in 30 epochs
[21] | Deep learning | Conference | 5 | SqueezeNet | 86.92% accuracy
[7] | Deep learning | Conference | 3 | F-RCNN | 91.6% accuracy
[10] | Deep learning | Conference | 9 | VGGNet, AlexNet, GoogleNet, and Baseline network | VGGNet outperforms, achieving 95.24% accuracy
[36] | Machine learning | Google Scholar | 3 | KNN classification | Accuracy calculated at different stages; 100% for the healthy stage
[37] | Deep learning | Conference | 7 | CNN | 97.05% accuracy
[38] | Deep learning | Google Scholar | 5 | CNN | 99.84% training accuracy
[39] | Deep learning | Conference | 9 | CNN | 98.29% training and 98.029% testing accuracy
[40] | Deep learning | Conference | 9 | CNN | 96.50% accuracy
[26] | Deep learning | Conference | 9 | CNN | 87% accuracy
[41] | Deep learning | Scopus | 1 | Mask R-CNN with ResNet101 | 99.64% mean average precision
[42] | Deep learning | Google Scholar | 3 | Inception-ResNet v2 | 87.27% accuracy
[43] | Deep learning | Scopus | 9 | CNN | 93.67% accuracy
[44] | Deep learning | Scopus | 2 | VGG19 | 98.60% accuracy
[45] | Deep learning | Conference | 9 | DenseNet_Xception, ResNet50, MobileNet, and ShuffleNet | DenseNet_Xception achieves 97.10% accuracy
[46] | Deep learning | Conference | 3 | CNN | 98.12% accuracy
[47] | Machine learning | Conference | 9 | Linear regression analysis, KNN, decision tree, naive Bayes, and SVM | SVM performs better
[48] | Machine learning | Scopus | 1 | Random forest | 93.12% accuracy
[49] | Deep learning | Scopus | 9 | CNN | 91.20% accuracy
[50] | Deep learning | Scopus | 9 | CNN | 98.6% training and 82% testing accuracy
[11] | Deep learning | Google Scholar | 1 | VGG16, VGG19, and ResNet50 | VGG16 achieves 91.9% accuracy
[27] | Deep learning | Google Scholar | 5 | CNN | 96.77% accuracy
[17] | Deep learning | Conference | 3 | ResNet50 | 98.3% training and 98% testing accuracy
[51] | Deep learning | Google Scholar | 9 | CNN, MobileNet, VGG16, and Inception | MobileNet achieves 91.12% training and 90.78% testing accuracy
[23] | Deep learning | Scopus | 3 | MobileNet V2 | 90% accuracy
[52] | Deep learning | Conference | 9 | VGG16, InceptionV3, and MobileNet | MobileNet achieves 97% accuracy
[53] | Deep learning | Scopus | 12 | YOLOv3 | 92.39% accuracy
[54] | Machine learning | Scopus | 1 | SVM | 97.80% accuracy
[55] | Machine learning | Scopus | 3 | Back propagation neural network | 91% accuracy
[56] | Deep learning | Scopus | 9 | MobileNet, Xception, and NasNetMobile | Xception model performs best
[57] | Deep learning | Scopus | 23 | ShallowNet and DenseNet | DenseNet achieves 95.31% accuracy on the test set
4.2 AlexNet

AlexNet [4] was developed on GPUs for ImageNet classification. It has a total of eight layers, of which five are convolutional layers and three are fully connected (FC) layers, and it contains about 650,000 neurons. AlexNet introduced the ReLU activation function and used dropout instead of conventional regularization; the network has about 60 million parameters. An AlexNet model was developed in [5] to detect nine diseases of tomato plants and was tested and validated on an NVIDIA Jetson TX1. The network was trained on multiple GPUs and consists of convolution layers, ReLU activation and normalization, pooling, and full connections. Features extracted at the final convolutional layer are passed to a fully connected layer after pooling (downsampling), and a dropout layer is used to reduce overfitting. In the last fully connected layer, class likelihoods are produced by a softmax activation function. The accuracy on the test set is 95.65%, the model size is 227.6 MB, and the inference time is 150 ms. Another AlexNet model was developed in [6] to classify six diseases of the tomato plant. The dataset used consists of 13,262 segmented images. The AlexNet model proposed here has five convolutional layers and three fully connected layers; the number of activation maps produced by the ReLU activation function was the same for each convolution layer, and the last fully connected layer was modified to output seven classes. The classification precision achieved is 97.49%. The AlexNet CNN architecture, originally developed for large-scale visual recognition challenges, was also used in [7]. This model is trained as a region-based convolutional neural network (R-CNN) by retraining the fully connected layer of the pre-trained network. Thirty-six images are captured automatically using a camera with electronic linear and pivotal motion; an arm moves the camera so that the system can observe the surface of the plant. Out of 36 images, 33 were detected successfully, and the obtained accuracy is 91.6%.
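The transfer-learning pattern that recurs in these AlexNet studies, reusing ImageNet-pretrained weights and replacing the final fully connected layer, can be sketched as follows. The class count of 10 and the decision to freeze the convolutional features are illustrative assumptions, not choices reported in [5], [6], or [7].

```python
import torch.nn as nn
from torchvision import models

# Hedged sketch: pre-trained AlexNet backbone with a new output layer for tomato classes.
model = models.alexnet(pretrained=True)            # ImageNet-pretrained weights
for param in model.features.parameters():
    param.requires_grad = False                    # optionally freeze the convolutional features
model.classifier[6] = nn.Linear(4096, 10)          # replace the last FC layer for 10 tomato classes
```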
4.3 VGG16

VGG16 [8] was developed with 16 layers, of which 13 are convolution layers and 3 are fully connected layers with ReLU activation, and it has about 138 million parameters. A VGG network [9] was developed and trained to identify healthy and infected plants on a dataset of 87,848 leaf images, tuning parameters such as batches per epoch, momentum, weight decay, batch size, and learning rate. These CNN models were trained and tested on 58 classes, including tomato diseases. The model performed better with a 20% testing set (and 80% training set) than with a 30% testing set (and 70% training set); the same model was tested further over different numbers of epochs and with only 12 classes instead of 58, using images collected both in the test center and in field conditions. The success rate under field conditions is lower than under test-center conditions, and a 99.53% success rate was achieved in the classification of 17,548 previously unseen images. In [10], four different CNN architectures, VGGNet, GoogleNet, AlexNet, and a simple two-layer baseline network, were evaluated on a dataset of 18,160 tomato leaf images. The dataset has nine tomato plant diseases, all images are resized to 64 × 64 pixels, and the parameters used for all models are a batch size of 10 and 40 epochs. In the comparative results, VGGNet achieves the highest accuracy of 95.24% among the four CNN models. VGG16, VGG19, and ResNet50 models were developed in [11] to detect the leaf miner pest in tomato plants. These models were trained using 2145 images of two classes, healthy and infested leaves, and tested with 66 unseen images. Among the three models, VGG16 achieved the highest accuracy of 91.9%.
4.4 Inception/GoogLeNet

InceptionNet [12] has been developed in several successively improved versions; the original version is also known as GoogLeNet. This model has 22 layers, and the network is built from modules/blocks. Inception modules concatenate parallel towers of convolutions with different filter sizes, 1 × 1 convolutions are used to remove the computational bottleneck, and the network has about 4 million parameters. A GoogLeNet model [13] was developed for plant disease classification, in which 22,930 images of ten classes of tomato diseases, including healthy leaves, were classified, achieving an accuracy of 81.84% in 81 min 55 s. Another model [14] with nine inception modules was developed for 55,306 images covering 26 diseases of different crops, and the results were compared across three types of image data (colored, grayscale, and segmented) and two types of learning (transfer learning and training from scratch). The accuracy achieved with transfer learning is 99.34%, and with training from scratch it is 98.36% for color image data.
4.5 ResNet/ResNeXt

The ResNet [15] model makes it possible to train networks with up to 152 layers. It is built from two kinds of blocks, a convolution block and an identity block, and adopts batch normalization and skip (shortcut) connections; the ResNet50 variant has about 26 million parameters. ResNeXt [16] is an updated version of ResNet that adds parallel towers within each module (a total of 32 towers), and this model has about 25 million parameters. A ResNet50 deep residual network model [17] has been developed to recognize three kinds of tomato diseases. A total of 6794 images were considered for this experiment, the Leaky ReLU activation function and batch gradient descent were used for about 24 epochs, and the model achieved a training accuracy of up to 98.3% and a testing accuracy of 98.0%.
4.6 YOLO Model

The You Only Look Once (YOLO) [18] model was developed for object detection. It consists of 24 convolutional layers interleaved with four max-pooling layers, two FC layers, and Leaky ReLU activations, and the model has about 65 million parameters. A YOLO-based network [19] was designed with 24 convolution layers followed by two fully connected layers. The model was trained on a four-class classification dataset for 2000 iterations using stochastic gradient descent. It was tested on a subset of the dataset consisting of three tomato leaf diseases, late blight (121 samples), bacterial canker (111 samples), and gray spot (113 samples), plus healthy leaves, for a total of 520 images captured at different times and orientations. The overall accuracy achieved by this system is 89%.
4.7 SqueezeNet

The SqueezeNet [20] model is a smaller CNN architecture with few parameters. It introduces the fire module to reduce the model size. The architecture begins with a convolution layer followed by eight fire modules and ends with a convolutional layer; pooling layers are placed after the first convolution layer, after the fourth and eighth fire modules, and after the final convolutional layer. A SqueezeNet architecture was developed in [21] to identify tomato plant diseases and was deployed on a smartphone, a server computer, and a microcontroller. The dataset considered for this model consists of a total of 1400 images covering six disease classes and one class of healthy leaves. The diseases identified are early blight, phosphorus deficiency, late blight, calcium deficiency, magnesium deficiency, and tomato leaf miner. The parameters tuned in the training process are the number of epochs, the optimizer, and the loss function. An average accuracy of 86.92% is obtained using the K-fold cross-validation method.
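The K-fold cross-validation used to report the average accuracy in [21] can be sketched as follows. This is a generic illustration, not the authors' code; `build_model` is a hypothetical factory for any classifier with fit/score methods, and the fold count of 5 is an assumption.

```python
import numpy as np
from sklearn.model_selection import KFold

def kfold_average_accuracy(features, labels, build_model, k=5):
    """Hedged sketch of K-fold cross-validation for reporting an average accuracy."""
    accuracies = []
    for train_idx, test_idx in KFold(n_splits=k, shuffle=True, random_state=0).split(features):
        model = build_model()
        model.fit(features[train_idx], labels[train_idx])                      # train on k-1 folds
        accuracies.append(model.score(features[test_idx], labels[test_idx]))   # score on held-out fold
    return float(np.mean(accuracies))                                          # average over the folds
```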
4.8 Mobile Net MobileNet [22] is a lightweight architecture using depth-wise separable convolutions. It consists of 28 layers of point-wise and depth-wise convolution layers, each followed by batch normalization and ReLU activation function. Strides are used to handle downsampling in convolution layers. The last three layers are average pooling, fully connected layer, and softmax classifier. MobileNet V2 [23] was developed to classify three diseases, namely late blight, mosaic virus, leaf mold, and healthy leaves of tomato consists of 4671 image data. The accuracy achieved is 95.94% for batch size 16. A mobile application [24] has been developed to recognize tomato diseases using MobileNet. This model has trained with 7176 image data. 90.3% accuracy achieved when there is a 0.001 learning rate.
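The building block that MobileNet stacks, a depth-wise convolution followed by a point-wise 1 × 1 convolution, each with batch normalization and ReLU, can be sketched as below. The channel sizes are illustrative; this is not the full 28-layer network.

```python
import torch.nn as nn

# Hedged sketch of a depth-wise separable convolution block as described above.
def depthwise_separable_block(in_ch, out_ch, stride=1):
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride, padding=1, groups=in_ch),  # depth-wise
        nn.BatchNorm2d(in_ch), nn.ReLU(),
        nn.Conv2d(in_ch, out_ch, kernel_size=1),                                         # point-wise
        nn.BatchNorm2d(out_ch), nn.ReLU(),
    )
```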
4.9 EfficientNet

The EfficientNet [25] family was developed around a compound scaling method that jointly scales the network width, depth, and input resolution; the largest baseline, B7, has about 66 million parameters. The baseline network B0 consists of nine stages: it starts with a convolution layer, followed by seven stages of mobile inverted bottleneck blocks with different kernels, and a final stage consisting of convolution, pooling, and a fully connected layer with the sigmoid-weighted linear unit (SiLU) activation function. Networks B1 to B7 are scaled up using different compound coefficient values. A plant leaf disease classification model using EfficientNet was developed in [26] for a dataset of 55,448 plant leaf images of 38 classes, including tomato leaves. Here, the B5 model achieved the best average accuracy of 99.91%, the time taken for training was 643.3 min, and an accuracy of 99.97% was achieved for 61,486 augmented images.
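The compound scaling idea can be illustrated with a few lines of arithmetic. The base coefficients alpha, beta, and gamma below are the values reported in the original EfficientNet paper, not values taken from this chapter, so they should be read as assumptions for the sketch.

```python
# Hedged sketch of EfficientNet-style compound scaling: depth, width, and resolution grow
# together with the compound coefficient phi. ALPHA, BETA, GAMMA are assumed from the
# original EfficientNet paper.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def compound_scale(phi, base_depth=1.0, base_width=1.0, base_resolution=224):
    depth = base_depth * (ALPHA ** phi)              # number of layers scales with alpha^phi
    width = base_width * (BETA ** phi)               # channel width scales with beta^phi
    resolution = base_resolution * (GAMMA ** phi)    # input resolution scales with gamma^phi
    return depth, width, round(resolution)

print(compound_scale(phi=1))  # roughly the B1 configuration under these assumptions
```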
4.9.1 Transfer Learning
A transfer learning-based binary classification CNN model [27] was developed to detect and classify plant diseases. It consists of four convolution layers with 32 filters, with a pooling layer following every convolution layer, followed by two fully connected dense layers and a sigmoid function. Low-level features are produced in the first convolution layer, and higher-level features in the next three convolution layers. The classes are predicted using the two fully connected layers, with 128 nodes in the hidden layer and a softmax function. The accuracy obtained by this model was 88.7% using the stochastic gradient descent optimizer. Another CNN model [28] was built from three blocks of convolution layers, batch normalization, and max pooling, followed by a fully connected layer and a softmax layer; the convolution layers use 8, 16, and 32 filters in the first, second, and third blocks, respectively. Using this model, six classes of tomato plant leaf diseases and healthy leaves were classified with 6202 images, divided into a training set of 4342 and a testing set of 1860, over ten epochs and 67 iterations; the accuracy obtained in classifying five tomato diseases is 96.43%. A deep meta-architecture [29] was developed for nine classes of tomato diseases and pests, including complex inter-class and intra-class variations. Faster R-CNN, single shot detector (SSD), and region-based fully convolutional networks (R-FCN) are combined with deep feature extractors such as VGGNet and ResNet. The developed model achieved a total mean average precision of 0.8306 when the image dataset was augmented and 0.5564 without augmentation. The papers reviewed above summarize tomato disease detection and classification methods using deep neural network architectures. The reviewed literature shows that most of the proposed methods use a transfer learning approach with ImageNet weights as the backbone, and most use the PlantVillage image dataset. At present, EfficientNet B4 and B5 have achieved the highest accuracy. Most of the architectures concentrate on accuracy rather than on real-time operation. Figure 5a shows the distribution of the types of publications chosen in this review; more publications were found in conference proceedings than in Scopus journals or in journals indexed in Google Scholar. Figure 5b shows the maximum number of diseases detected and classified using image processing (IP), machine learning (ML), and deep learning (DL) methods; the graph shows that deep learning classifies the largest number of disease classes. The selected publications are summarized in Table 2. The accuracies obtained by image processing and machine learning techniques exceed 90%, but they are based on fewer images, and the accuracy may drop when a larger number of images is considered.
Fig. 5 a Distribution of the type of publications, b maximum number of diseases detected based on technology
5 Future Trends

In real time, plant diseases can be detected by a farmer with the help of artificial intelligence-based digital tools such as a tablet, a mobile phone, or an unmanned aerial vehicle (drone). Smart farming methods help the farmer through Internet of Things technology using various sensors, such as humidity, moisture, and temperature sensors, which can be controlled with a Raspberry Pi, an Arduino UNO, or other microcontrollers. Deep learning edge devices may be used in real-time applications to improve latency by speeding up inference. Nowadays, the inference workload is moving to edge devices such as smartphones and the Raspberry Pi, and researchers can utilize NVIDIA and Snapdragon edge platforms. In machine learning, federated learning is used to train an algorithm across multiple distributed edge devices: the training data remain distributed on the edge devices, and a shared model is learned by combining locally computed updates. This increases parallelism and computation and improves speed.
6 Conclusion

Numerous research works have been conducted to improve automatic tomato plant disease classification using different types of CNN architectures such as AlexNet, SqueezeNet, VGG16, GoogleNet, and R-CNN. This paper reviews the different improved versions of CNN architectures applied to classify tomato diseases. The review finds that CNN methodology has improved classification accuracy compared to traditional machine learning. Researchers are developing new methods by tuning activation functions, optimization, and other hyperparameters. New architectures are being developed for multi-class automatic tomato plant disease detection, which can be deployed in real-time agricultural technology applications.
Acknowledgements This review was conducted entirely at Acharya Institute of Technology, Bengaluru. Author 1 is currently working at Presidency University, Author 3 at SBIC, A*Star, and Author 4 at Sai Vidya Institute of Technology; at the time of submission of this manuscript, no research activity related to this work was conducted at Presidency University, SBIC, A*Star, or Sai Vidya Institute of Technology.
References 1. U. Shruthi, V. Nagaveni, B.K. Raghavendra, A review on machine learning classification techniques for plant disease detection, in International Conference on Advanced Computing and Communication Systems, Coumbatore, India (2019) 2. Y. Lecun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998) 3. P. Tm, A. Pranathi, K. SaiAshritha, N.B. Chittaragi, S.G. Koolagudi, Tomato leaf disease detection using convolutional neural networks, in Eleventh International Conference on Contemporary Computing (IC3), Noida (2018) 4. A. Krizhevsky, I. Sutskever, G.E. Hinton, Imagenet classification with deep convolutional neural networks, in NIPS’12: Proceedings of the 25th International Conference on Neural Information Processing Systems (2012) 5. H. Durmus, E.O. Gunes, M. Kiric, Disease detection on the leaves of the tomato plants by using deep learning, in 6th International Conference on Agro-Geoinformatics, Fairfax, VA, USA (2017) 6. A.K. Rangarajan, R. Purushothaman, A. Ramesh, Tomato crop disease classification using pre-trained deep learning algorithm. Procedia Comput. Sci. 133, 1040–1047 (2018) 7. R.G.D. Luna, E.P. Dadios, A.A. Bandala, Automated image capturing system for deep learningbased tomato plant leaf disease detection and recognition, in 2018 IEEE Region 10 Conference, Jeju, Korea (South) (2018) 8. K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, in International Conference on Learning Representations (2015) 9. K.P. Ferentinos, Deep learning models for plant disease detection and diagnosis. Comput. Electron. Agric. 145, 311–318 (2018) 10. E. Suryawati, R. Sustika, R.S. Yuwana, A. Subekti, H.F. Pardede, Deep structured convolutional neural network for tomato diseases detection, in International Conference on Advanced Computer Science and Information Systems (ICACSIS), Yogyakarta, Indonesia (2018) 11. L. Mkonyi, D. Rubang, M. Richard, N. Zekeya, S. Sawahiko, B. Maiseli, D. Machuve, Early identification of Tuta absoluta in tomato plants using deep learning, Sci. African 10, e00590 (2020) 12. C. Szegedy, S. Ioffe, V. Vanhoucke, A. Alemi, Inception-v4, inception-ResNet and the impact of residual connections on learning, AAAI Conference on Artificial Intelligence (2016) pp. 4278– 4284 13. S. Kaur, G. Joshi, R. Vig, Plant disease classification using deep learning google net model. Int. J. Innovative Technol. Exploring Eng. 8(9), 319–322 (2019) 14. S.P. Mohanty, D.P. Hughes, A.M. Salathe, Using deep learning for image-based plant disease detection, Frontiers in plant science (2016), p. 1419 15. K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA (2016) 16. S. Xie, R. Girshick, P. Dollár, Z. Tu, K. He, Aggregated residual transformations for deep neural networks, in Proceedings of the IEEE conference on computer vision and pattern recognition, Honolulu, HI, USA (2017)
17. D. Jiang, F. Li, Y. Yang, S. Yu, A tomato leaf diseases classification method based on deep learning, in Chinese Control and Decision Conference (CCDC), Hefei, China (2020)
18. J. Redmon, S. Divvala, R. Girshick, A. Farhadi, You only look once: unified, real-time object detection, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA (2016)
19. S. Adhikari, B. Shrestha, B. Baiju, S. Kumar, Tomato plant disease detection system using image processing, in 1st KEC Conference on Engineering and Technology, Dhapakhel Lalitpur (2018)
20. F.N. Iandola, S. Han, M.W. Moskewicz, K. Ashraf, W.J. Dally, K. Keutzer, SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size (2016)

{x : w^T x + b = 0}     (3)
where w represents the normal vector and b the offset. Defect detection is carried out by feeding the feature vector into the SVM; in the classification stage, the SVM separates defective fabric samples from normal fabric samples by comparing the test features with the trained features.
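A minimal sketch of this classification stage is given below, assuming the GLCM feature vectors (13 texture features per image) described later in the chapter. The arrays `features` and `labels`, the train/test split, and the kernel choice are illustrative assumptions, not the settings used in the paper.

```python
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def classify_fabric(features, labels):
    """Hedged sketch of SVM-based fabric defect classification.
    features: (n_samples, 13) GLCM feature vectors; labels: 0 = normal, 1 = defective."""
    X_train, X_test, y_train, y_test = train_test_split(features, labels,
                                                        test_size=0.2, random_state=0)
    svm = SVC(kernel="rbf")          # kernel choice is an illustrative assumption
    svm.fit(X_train, y_train)        # learn the separating hyperplane (w, b)
    return accuracy_score(y_test, svm.predict(X_test))
```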
3.6.2 Neural Network (NN) Classifier

The NN classifier is one where the output of the processed image is used as input to compute weighted factors and generate the desired classification of defects as output. Figure 3 shows the working of the neural network classifier: 13 features extracted from the fabric image are passed as input, together with weights (W) and biases (B), to the hidden layer, which consists of about 20 units; the result is then passed to the output layer, again with weights and biases, which produces a single output indicating a normal or defective fabric image. In the literature, different image processing techniques have been used to identify weave patterns, to estimate structural characteristics of fabrics for quality assessment, and to detect fabric defects more easily. In this proposed work, the results of two classifiers, namely SVM and NN, are compared, and the NN classifier gives better results than the SVM.

Fig. 3 Neural network classifier
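The described network (13 inputs, one hidden layer of about 20 units, a single normal/defective decision) can be sketched with a standard multilayer perceptron. This is an illustration only; the solver, activation, and iteration count are assumptions, and the arrays named in the usage comment are hypothetical.

```python
from sklearn.neural_network import MLPClassifier

# Hedged sketch of the NN classifier described above.
def build_nn_classifier():
    return MLPClassifier(hidden_layer_sizes=(20,), activation="relu",
                         solver="adam", max_iter=500, random_state=0)

# Usage with hypothetical arrays of shape (n_samples, 13) and (n_samples,):
#   nn = build_nn_classifier()
#   nn.fit(train_features, train_labels)
#   predictions = nn.predict(test_features)   # 0 = normal fabric, 1 = defective fabric
```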
4 Experimental Results and Discussion

In this proposed work, textile fabric images are collected from the standard TILDA dataset, and about 1000 fabric image samples are considered for the experimentation. The modules of the proposed work are implemented using MATLAB 2019(b). Figure 4 shows the experimental results obtained for identifying defective and normal fabric images.
Fig. 4 Sample of result obtained, a defective fabric image, b normal fabric image
Table 1 shows the feature values for sample textile images; the 13 features considered for the study are contrast, correlation, cluster shade, energy, dissimilarity, contrast inverse difference, correlation, autocorrelation, entropy, maximum probability, homogeneity, sum of squares, and standard deviation of the DCT. The experiment is conducted using all 1000 fabric image samples obtained from the TILDA dataset. The overall accuracy obtained by the SVM classifier is 85% and by the NN classifier 95%, as shown graphically in Fig. 5. Table 2 reports the experiment conducted to determine the overall accuracy of the SVM and NN classifiers using the 1000 fabric images, trained on 180 images and tested on 820 images, in terms of correctly and incorrectly identified fabric images. Table 3 shows the overall performance comparison of the two classifiers, the neural network (NN) and the support vector machine (SVM). Table 4 compares the F-measure and the area under the curve (AUC) for the NN and SVM models. Figure 6 displays performance measures such as accuracy, sensitivity, specificity, precision, and recall, and the comparison in terms of F-measure and AUC, for the NN and SVM classifiers.
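Several of these texture features can be computed directly from the gray-level co-occurrence matrix, as sketched below. Only the properties that scikit-image provides out of the box are shown; the remaining features used in the paper (cluster shade, entropy, DCT statistics, and so on) would have to be derived separately from the matrix, and the distance/angle settings are illustrative assumptions.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(gray_image):
    """Hedged sketch of GLCM texture-feature extraction for a fabric image.
    gray_image: 2-D uint8 array (grayscale fabric image)."""
    glcm = graycomatrix(gray_image, distances=[1], angles=[0], levels=256,
                        symmetric=True, normed=True)
    props = ["contrast", "correlation", "energy", "dissimilarity", "homogeneity"]
    return np.array([graycoprops(glcm, p)[0, 0] for p in props])
```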
5 Conclusion

This paper proposes a fabric defect detection method based on the gray-level co-occurrence matrix (GLCM) for feature extraction, with SVM and neural network classifiers used to classify the fabric samples. The proposed method is simple to understand and to implement, and it provides good performance: 95% accuracy with the NN classifier and 85% with the SVM classifier. Hence, the neural network classifier is considered better than the SVM classifier. As an extension of this work, different learning algorithms such as CNNs and similar approaches can be compared, the type of defect and the percentage of defect present in the fabric can be determined, and finding suitable features for color marks and stains in defect detection will be the main focus of future work.
Table 1 Sample feature values of textile images (Sl. No. 1-20), listing for each sample the contrast, correlation, energy, entropy, dissimilarity, contrast inverse difference, correlation, homogeneity, autocorrelation, cluster shade, maximum probability, sum of squares, and standard deviation DCT
Fig. 5 Graphical representation of overall accuracy (recognition rate, %) using the SVM (85%) and NN (95%) classifiers
Table 2 Performance evaluation of the SVM and NN classifiers when trained and tested with different fabric samples (dataset size: 1000 images)

Classifiers | Train | Test | Correctly identified | Incorrectly identified | Accuracy %
SVM | 180 | 820 | 697 | 123 | 85%
NN | 180 | 820 | 780 | 40 | 95%
Table 3 Performance comparison of the classifiers in terms of accuracy, sensitivity, specificity, precision, and recall Model
Accuracy %
Sensitivity %
Specificity %
SVM NN
Precision %
Recall %
85
100
95
100
76.92
70
100
90.90
90
100
Table 4 Performance comparison of the classifiers in terms of F-measure and AUC

Model | F-measure | AUC
SVM | 0.92 | 1.2692
NN | 0.97 | 1.4091
Fig. 6 Graphical representation of performance measure for models NN and SVM
References 1. S.R. Kurkute, P.S. Sonar, S.A. Shevgekar, D.B. Gosavi, DIP based automatic fabric fault detection. Int. Res. J. Eng. Technol. 4(4), 3356–3360 (2017) 2. W. Wang, B. Xin, N. Deng, J. Li, Objective Evaluation on Yarn Hairiness Detection Based Multi-View Imaging and Processing Method (Elsevier, 2019), vol. 148, pp. 1–8 3. K. Yıldız, Z. Yıldız, O. Demir, A. Buldu, Determination of yarn twist using image processing techniques. in International Conference on Image Processing, Production and Computer Science (ICIPCS'2015) Istanbul (Turkey), ICIPCS (2015), pp. 83–88 4. V. Kaplan, N.Y. Varan, M. Dayik, Y. Turhan, G. Durur, Detection of warp elongation in satin woven cotton fabrics using image processing. Fibres Amp. Text. Estern Europe 24(4), 62 (2016) 5. K. Hanbay, M.F. Talu, O.F. Ozguven, Fabric defect detection system and methods ‘A systematic literature review’. Int. J. Light Electron Optics 127(24), 11960–11973 (2016) 6. M. Jmali, F. Sakli, B. Zitouni, Fabrics defects detecting using image processing and neural networks & quot. IEEE Int. Conf. Appl. Res. Text. 4, 1–4 (2014) 7. R.D. Karnik, L.S. Admuthe, Density evaluation and weave pattern classification of fabric using image processing. Int. J. Eng. Res. Technol. 10, 452–456 (2017) 8. P. Singh, P. Singh, Texture analysis in fabric material for quality evaluation using GLCM matrix. Int. J. Appl. Eng. Technol. 5, 1–5 (2015) 9. G.V. Vasant, M.D. Patil, Cotton contaminants automatic identification techniques. Int. J. Emerg. Technol. Innovative Eng. 1(5), 63–66 (2015) 10. M. Garg, G. Dhiman, Deep convolution neural network approach for defect inspection of textured surface. J. Inst. Electron. Comput. (2), 28–38 (2020) 11. P. Anandan, R.S. Sabeenian, Fabric defect detection using discrete curvelet transform. Int. Conf. Robot. Smart Manuf. (RoSMa2018) 133, 1056–1065 (2018) 12. S. Mei, Y. Wang, G. Wen, Automatic fabric defect detection with a multi scale convolutional denoising auto encoder network model. MDPI J. 18(4), 1–18 (2018) 13. L. Weninger, M. Kopaczka, D. Merhof, Defect detection in plain weave fabrics by yarn tracking and fully convolutional network. In IEEE 2018 International instrumentation and measurement technology conference (12MTL), IEEE (2018), pp.1–6 14. A. Hamdi, M.S. Sayed, M.M. Fouad, M.M. Hadhoud, Unsupervised patterned fabric defect detection using texture filtering and K-means clustering. in 2018 International conference on innovative trends in computer engineering, ITCE (2018), pp.130–144 15. S.L. Bangare, N.B. Dhawas, V.S. Taware, S.K. Dighe, P.S. Bagmare, Implementation of fabric fault detection system using image processing. Int. J. Res. Advent Technol. 5(6), 115–119 (2017) 16. Y. Li, W. Zhao, J. Pan, Deformable patterned fabric defect detection with fisher criterion-based deep learning. IEEE Trans. Autom. Sci. Engi. 14(2), 1256–1264 (2017) 17. L. Tong, W.K. Wong, C.K. Kwong, Fabric defect detection for apparel industry anonlocal sparse representation approach. IEEE 5, 5947–5964 (2017) 18. H. Abdellah, R. Ahmed, O. Slimane, Defect detection and identification in textile fabric by SVM method. IOSR J. Eng. (IOSRJEN) 04(12), 69–77 (2014) 19. R.S. Mahagaonkar, S. Soma, A novel texture based skin melanoma detection using color GLCM and CS-LBP feature. Int. J. Comput. Appl. 171(5), 1–5 (2017) 20. G. Singh, G. Singh, M. Kaur, Performance evaluation of fabric defect detection using series of image processing algorithm and ANN operation. Int. J. Recent Trends Eng. Res. 2(5), 1–7 (2016) 21. Z. Xiaowei, F. 
Xiujuan, Fabric defect detection based on GLCM approach. in 6th International Conference on Information Engineering for Mechanics and Materials, ICIMM (2016), pp. 673– 677
Chapter 17
Alzheimer’s Disease Prediction via Optimized Deep Learning Framework G. Stalin Babu, S. N. Tirumala Rao, and R. Rajeswara Rao
1 Introduction Alzheimer’s disease is a neurodegenerative disorder of the brain that affects brain cells, neurotransmitters, brain memory cells, and the nervous system finally inducing dementia in elderly people [1]. As per the report released by Alzheimer’s disease international, at present, 4.4 crores people suffering from dementia and it is estimating that it may be increased to 7.6 crores by 2030 and 13.5 crores by 2050. Alzheimer’s disease patients account for 50–75% of these patients [2], with symptoms ranging from mild to serious. Mild cognitive impairment (MCI) is specified in which a person’s critical thinking gets worse progressively but is detectable. Since no treatment persists to cure Alzheimer’s disease, some remedies have been introduced to prolong the advancement of some symptoms and minimize the emotional effect on patients such as loss of memory and delusion, so that it is very essential to develop new techniques to recognize AD patients accurately and effectively. Early detection of Alzheimer’s disease is important for establishing effective treatments to control the progression of the disease. Nevertheless, effective diagnosis and observation necessarily require extra resources like neuroimaging techniques. Magnetic resonance imaging (MRI) has gained significance in diagnosing Alzheimer’s disease and determining MCI to AD migration. Still now, so many papers are published on Alzheimer’s disease prediction and utilize a low amount of G. Stalin Babu (B) Department of CSE, JNTUK, Kakinada, India Department of CSE, Aditya Institute of Technology and Management, Tekkali, India S. N. T. Rao Department of CSE, Narasaraopeta Engineering College, Narasaraopeta, India R. R. Rao Department of CSE, JNTUK University College of Engineering, Vizianagram, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. B. Reddy et al. (eds.), Proceedings of Second International Conference on Advances in Computer Engineering and Communication Systems, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-7389-4_17
183
184
G. Stalin Babu et al.
data and inappropriate [3]. The key issues, there is no standardized parameters for evaluating, whether the MR image impairment is subject to external growth or can be identified as pathological. Deep learning is one of the most popular used in biomedical applications associated with high dimensional data. Deep learning approaches are well suited to address the challenges of early detection of AD, because they can identify the hidden patterns in the data and construct an effective prediction model. The major contribution of this article is. Deep convolutional neural network (DCNN) model is developed where weights and activation function optimized via grasshopper optimization algorithm (GOA).
2 Literature Review

This section provides a systematic review of various models developed by researchers for the prediction of AD in its early stages. Lee et al. [4] applied multimodal deep learning to detect AD at an earlier stage. The study involves the analysis of mild cognitive impairment (MCI) and normal older adults; MCI is described as an intermediate stage between Alzheimer's disease and normal aging. They applied an integrated framework of longitudinal multi-domain data to predict conversion from MCI to AD. The accuracy observed was 75% using a single modality and 81% for longitudinal multi-domain data, and they suggested that the multimodal deep learning approach is the best way to identify persons at risk. Spasov et al. [5] applied deep learning algorithms based on 3D separable convolution and dual learning to recognize MCI patients who are at high risk, classifying MCI patients into two categories. They used demographic data, magnetic resonance imaging (MRI), APOe4, and neuropsychological data as input and applied a multitask deep learning model that predicts both AD versus healthy controls and MCI-to-AD conversion, achieving an accuracy of 86%, a specificity of 85%, and a sensitivity of 87.5%. Qiu et al. [6] used the ADNI dataset and applied deep learning and convolutional networks to diagnose AD, classifying signatures based on MRI data and non-imaging data (gender, MMSE score, and age). They carried out a comparative analysis of three independent cohorts: the Australian Imaging, Biomarker and Lifestyle (AIBL) flagship study of aging, the National Alzheimer's Coordinating Center (NACC), and the Framingham Heart Study. Roy et al. [7] considered the Open Access Series of Imaging Studies (OASIS) dataset and used MRI images as input to a CNN for the detection and recognition of Alzheimer's disease, achieving an accuracy of 80%. Lin et al. [8] used deep learning and convolutional neural networks (CNNs) to identify MCI-to-AD conversion. They considered MRI data from the ADNI dataset and used NC patches from AD to train the CNN. To predict
17 Alzheimer’s Disease Prediction via Optimized Deep …
185
the conversion of MCI to AD, they applied an extreme learning classifier and achieved an accuracy of 86.1% with AUC. Feng et al. [9] studied Alzheimer's disease (AD), a neuropathological disorder, and applied novel brain imaging methods and deep learning techniques for its diagnosis. They considered the ADNI dataset and applied MRI with a 3D-CNN to classify the disease, comparing the performance of 2D-CNN, 3D-CNN with support vector machine (SVM), and plain 3D-CNN models; they suggested that 3D-CNN-SVM makes it easier to screen for AD. Venugopalan et al. [10] applied deep learning to analyze magnetic resonance imaging (MRI), clinical test data, and genetic single nucleotide polymorphisms (SNPs) to classify AD, also using the ADNI dataset. They analyzed different stages of AD based on the fusion of multiple data modalities, applying a 3D-CNN for imaging data and auto-encoders to extract features from genetic and clinical data. Top-performing features were identified with novel data interpretation techniques, and AD staging was classified and compared using decision trees, random forests, support vector machines, and k-nearest neighbor methods, identifying the most distinguishing features of AD. The above works have used a variety of classifiers and obtained accurate results; even so, predicting the early stage of AD still demands more novel deep learning techniques for better accuracy.
3 Proposed Framework

3.1 Optimized DCNN

A DCNN [11] includes three kinds of layers: convolution layers, pooling layers, and fully connected layers. In general, the convolution kernels compute the various feature maps. The feature value Z^k_{p,q,r} at location (p, q) of the rth feature map in the kth layer is given by Eq. (1):

Z^k_{p,q,r} = W_r^{kT} V^k_{p,q} + B_r^k     (1)

where W_r^k denotes the weight vector, which is optimally tuned via GOA, B_r^k is the bias term of the rth filter, and V^k_{p,q} is the input patch centered at location (p, q) of the kth layer of the CNN. The activation function helps in capturing the nonlinear features of multilayer networks. The activation function act(·) is optimally chosen by means of the GOA algorithm, and the activation value act^k_{p,q,r} associated with the convolution feature Z^k_{p,q,r} is computed as in Eq. (2):

act^k_{p,q,r} = act(Z^k_{p,q,r})     (2)

Pooling layer: Pooling layers in the DCNN perform downsampling operations on the outcomes of the convolutional layers. For each pooling function pool(·) applied to act^k_{p,q,r}, the value of U^k_{p,q,r} is computed as in Eq. (3):

U^k_{p,q,r} = pool(act^k_{p,q,r}),  for all (p, q) in R_{p,q}     (3)

The output layer is the final layer of the CNN, where the classification takes place. The CNN loss is denoted by L and is computed as per Eq. (4):

L = (1/N) * sum_{n=1}^{N} l(θ; U^(n), O^(n))     (4)

The entire parameter set, including W_r^k and B_r^k, is designated by θ. There are N input-output relations {(V^(n), U^(n)); n ∈ [1, ..., N]}, where V^(n) is the nth input datum, U^(n) the corresponding target label, and O^(n) the CNN output.
3.2 Grasshopper Optimization Algorithm (GOA)

The grasshopper optimization algorithm (GOA) is a swarm intelligence algorithm introduced by Mirjalili et al. [12]. Finding the global optimum of an optimization problem depends on social interaction forces, and the mathematical model of a grasshopper is defined as per Eq. (5):

X_m = S_m + G_m + A_m     (5)

where X_m is the position of the mth grasshopper, S_m is the social interaction, G_m is the gravity force on the mth grasshopper, and A_m is the wind advection. The randomized form of Eq. (5) is X_m = r1*S_m + r2*G_m + r3*A_m, where r1, r2, and r3 are random numbers between 0 and 1. The social interaction S_m is defined as per Eq. (6):

S_m = sum_{n=1, n != m}^{N} s(d_mn) * d̂_mn     (6)

where d_mn is the distance between the mth and nth grasshoppers, defined as d_mn = |x_n − x_m|, and d̂_mn = (x_n − x_m)/d_mn is the unit vector from the mth to the nth grasshopper. The social force function s is given as in Eq. (7):

s(r) = f * e^(−r/l) − e^(−r)     (7)

Here, f denotes the intensity of attraction and l the attractive length scale. When the distance between two grasshoppers is greater than ten, the social force function s fails to act, so the grasshopper distances are mapped to the interval [1, 4]. The gravity force G_m is defined as in Eq. (8):

G_m = −g * ê_g     (8)

where g is the gravity constant and ê_g is the unit vector toward the center of the earth. The wind advection A_m is defined as in Eq. (9):

A_m = Z * ê_w     (9)

where Z is a constant drift and ê_w is the unit vector in the direction of the wind. Substituting S_m, G_m, and A_m into Eq. (5) gives Eq. (10):

X_m = sum_{n=1, n != m}^{N} s(|x_n − x_m|) * (x_n − x_m)/d_mn − g * ê_g + Z * ê_w     (10)

A modified version of Eq. (10), used to solve optimization problems, is given in Eq. (11):

X_m^i = p * ( sum_{n=1, n != m}^{N} p * ((ub_i − lb_i)/2) * s(|x_n^i − x_m^i|) * (x_n − x_m)/d_mn ) + T̂_i     (11)

where ub_i is the upper bound and lb_i the lower bound in the ith dimension, and T̂_i is the target (best solution found so far). The coefficient p decreases in proportion to the number of iterations in order to balance exploration and exploitation, and it is defined as in Eq. (12):

p = p_max − l * (p_max − p_min)/L     (12)

where l is the current iteration and L is the maximum number of iterations.
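A compact sketch of one GOA position update following Eqs. (11) and (12) is given below. It is only an illustration of the update rule, not the chapter's implementation: the values of f, l, p_max, p_min and the distance remapping into the stable interval are assumptions commonly used with GOA.

```python
import numpy as np

def goa_step(positions, target, lb, ub, it, max_it, p_max=1.0, p_min=0.00004):
    """Hedged sketch of a single GOA position update (Eqs. (11)-(12)).
    positions: (N, D) array of grasshopper positions; target: current best solution (D,)."""
    n, dim = positions.shape
    p = p_max - it * (p_max - p_min) / max_it          # Eq. (12): decreasing coefficient
    s = lambda r: 0.5 * np.exp(-r / 1.5) - np.exp(-r)  # Eq. (7) with assumed f = 0.5, l = 1.5
    new_positions = np.empty_like(positions)
    for m in range(n):
        social = np.zeros(dim)
        for k in range(n):
            if k == m:
                continue
            dist = np.linalg.norm(positions[k] - positions[m])
            dist_mapped = 2.0 + dist % 2.0             # keep distances in a range where s is active
            unit = (positions[k] - positions[m]) / (dist + 1e-12)
            social += p * (ub - lb) / 2.0 * s(dist_mapped) * unit   # inner term of Eq. (11)
        new_positions[m] = np.clip(p * social + target, lb, ub)     # Eq. (11) with bound handling
    return new_positions
```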
[Flowchart (GOA procedure): Start; initialize the population Xi (1:n), CMax, CMin, and itMax; calculate the fitness value of each grasshopper; choose the best solution and set it as the target T; set it := 0; while it <= itMax, calculate the value of p as per Eq. (12) and continue the update loop]
i 3, THDIL > 25, VL < 220, IL > 180, VL line > 445, WL > 40, VAL > 40, RVAL > 7, KW > 115, KVAR > 17, ATI > 40, OTI > 65, OLI < 39, OTI A > 0, OTI T > 0, MOG A > 0, and INUT > 65. Transformer faults: From the available data, the distribution transformer is starconnected secondary imbalance load. The three-line currents in the secondary become unequal, and this causes imbalance in the transformer parameters effected due to imbalance. The following are the factors: • oil temperature (increases) • total current and voltage harmonic distortions (increases) • unequal potential drop in three lines (the line carrying more current will have more potential drop). Power factor: The effects are as follows: • • • •
reactive power (increases with decrease in power factor and vice versa) apparent power (increases with decrease in power factor) current (increases with decrease in power factor) temperature (increases with decrease in power factor).
Line to line fault: The currents in the two lines increase abnormally, and it increases the oil temperature. Frequency: Under frequency causes increase in stress on the insulation (increases the induced voltage), and over frequency causes decrease in induced voltage in transformer. Overvoltage fault: happens due to the surges and under frequency. Inter-turn fault: happens due to the overvoltage and the increase in oil temperature. Oil level decreases: due to increase in oil temperature or oil leakage.
Table 5 Correlation matrix of the current and voltage parameters (VL1, VL2, VL3, IL1, IL2, IL3, VL12, VL23, VL31, INUT)
21 Transformer Data Analysis for Predictive Maintenance 223
224 Table 6 Probability of transformer failure for a given threshold
Threshold | Trip = False | Trip = True
Oil temperature < 65 | 1,169,491 | 0
Oil temperature > 65 | 1114 | 17,830
The main variable that determines failure prediction in transformers is OTI T (Oil Temperature Indicator Trip), which can be modeled using logistic regression since the variable to be predicted is binary (Table 6). This model is calculated by the probability function:

P(x) = \frac{M}{1 + e^{b(x_0 - x)}}
where M is the maximum value, b is the steepness of the curve, x 0 is the midpoint of the sigmoid curve, and x is a real number.
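As a concrete illustration, here is a minimal sketch of this probability function and of fitting a logistic regression on the binary OTI T label with scikit-learn. The parameter values, array contents, and variable names are assumptions, not values taken from the paper's data set.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def failure_probability(x, M=1.0, b=0.9, x0=65.0):
    """P(x) = M / (1 + exp(b * (x0 - x))); M, b and x0 here are illustrative values."""
    return M / (1.0 + np.exp(b * (x0 - x)))

# Hypothetical oil-temperature readings and the corresponding binary trip flag.
oti = np.array([40.0, 55.0, 63.0, 66.0, 70.0, 78.0]).reshape(-1, 1)
oti_trip = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression().fit(oti, oti_trip)
print(model.predict_proba([[68.0]])[:, 1])   # estimated probability of a trip at 68 degrees
print(failure_probability(68.0))
```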
4.3 Testing for Stationarity A stationary time series means that the series consistently fluctuates between approximately the same values and is therefore more predictable [4]. The plot of the data set can be seen in Fig. 2, where the time series is stable and the same values keep recurring over time. A unit root test can be done to check whether the data set is stationary, together with the autocorrelation function for the series. We first apply the standard Dickey–Fuller test to analyze the stationarity and properties of the data set. To interpret the root test correctly, we compare our calculated value with the critical values of the augmented Dickey–Fuller test. By observing the critical values in Table 7, we can
Fig. 2 Overview of the load current time series
Table 7 Results of Dickey–Fuller test for kilowatt-hour consumption

Test statistic          3.671
p-value                 1.000
Lags used               28.000
Critical value (1%)     −3.431
Critical value (5%)     −2.862
Critical value (10%)    −2.567
see that our calculated test statistic is not lower than the critical values, so we cannot reject the null hypothesis, and the time series is non-stationary.

Holt-Winters Models: Holt-Winters is used in the presence of both trend and seasonality. Common trends are linear, exponential, logarithmic, square, and nth-degree polynomials, which can be easily modeled using mathematical functions such as log(x), linear, x^2, and exp(x). We have used the additive–additive and additive–multiplicative seasonal methods [5].

Recurrent Neural Networks: Recurrent neural networks are neural networks that use the previous outputs as inputs while maintaining hidden states. They are represented as

a = g_1(W_{aa} a + W_{ax} x + b_a)    (1)

y = g_2(W_{ya} a + b_y)    (2)
where W_ax, W_aa, W_ya, b_a, and b_y are coefficients that are shared temporally, and g_1, g_2 are activation functions [6]. We used one layer with ten neurons, a learning rate of 0.02, and trained for 50 epochs.

Long Short-Term Memory: LSTM networks can classify, process, and forecast time series data even when there are lags of unknown duration between important events in a time series. We used four layers with 100 neurons each, a dropout of 0.2, and trained for 100 epochs.

Training the Model: As the time series stretches over six months, only the data from the previous week will be used for training, one day will be used for validation, and values for the next 24 h will be forecasted. The training data will be used for fitting the model.

Model Selection and Fitting of the SARIMA Model: Since autoregression (AR), moving average (MA), and ARMA can only work on stationary data, we chose the seasonal ARIMA (SARIMA) model, an extension of ARIMA, as it takes both seasonality and trend into consideration for the forecast. It is represented by SARIMA (p, d, q)(P, D, Q)m, where p, d, and q denote the trend order and P, D, Q, and
Fig. 3 ACF and PACF values for current lines
m denote the seasonal order. Box–Jenkins method is used to fit an ARMA/ARIMA model to the series where we calculate and plot the autocorrelation function (ACF) and partial ACF (PACF) for the series [7]. In Fig. 3, we observe cycles with even intervals, so the series has seasonality and a gradual decline. Therefore, it could be stationary. The number of points in each cycle is 24, which means that we have a daily seasonality. If the trend is linear, we can remove the trend with a first-level differentiation to find the parameters for the seasonal order and do the Box–Jenkins method again after plotting the new ACF values, to fit our seasonal values for the SARIMA model. Differencing is performed by subtracting the previous observation from the current observation, shown in Eq. 3, where t is time. difference(t) = value(t) − value(t − 1)
(3)
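For illustration, here is a minimal sketch, assuming pandas and statsmodels, of the augmented Dickey–Fuller check, the first difference of Eq. (3), and a SARIMA fit with a daily seasonal period of 24. The toy series and variable names are hypothetical; the orders (3, 0, 0)(2, 1, 0, 24) are the ones the grid search described next ends up selecting.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Hypothetical hourly load series (two weeks of toy data with a daily cycle).
rng = pd.date_range("2021-01-01", periods=24 * 14, freq="H")
load = pd.Series(100 + 10 * np.sin(np.arange(len(rng)) * 2 * np.pi / 24)
                 + np.random.randn(len(rng)), index=rng)

# Augmented Dickey-Fuller test: a statistic above the critical values means the
# unit-root null cannot be rejected, i.e. the series is treated as non-stationary.
stat, pvalue, lags, nobs, critical, _ = adfuller(load, autolag="AIC")
print(f"ADF statistic={stat:.3f}, p-value={pvalue:.3f}, critical={critical}")

# Eq. (3): difference(t) = value(t) - value(t - 1)
difference = load.diff().dropna()

# Seasonal ARIMA fit and a 24-hour-ahead forecast.
results = SARIMAX(load, order=(3, 0, 0), seasonal_order=(2, 1, 0, 24)).fit(disp=False)
forecast = results.forecast(steps=24)
```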
We also experimented with a grid search to get the most fitting parameters for the SARIMA forecasting models by calculating the Akaike information criterion (AIC) to define a set of model combinations and test which forecast model has the lowest error on our data set. To evaluate which model is the best fitted, we assess multiple one-step forecasts, compare them with the actual values, and calculate the root mean squared error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) for evaluating the models:

RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \hat{x}_i)^2}

where RMSE = root mean square error, i = variable i, N = total number of data points, x_i = true value, and \hat{x}_i = prediction.

MAE = \frac{1}{n}\sum_{i=1}^{n}|x_i - \hat{x}_i|

where MAE = mean absolute error, \hat{x}_i = prediction, x_i = true value, and n = total number of data points.

MAPE = \frac{1}{n}\sum_{t=1}^{n}\left|\frac{x_t - y_t}{x_t}\right|

where MAPE = mean absolute percentage error, n = number of times the summation iteration happens, x_t = actual value, and y_t = forecast value.

The model that the grid search found to be the best is SARIMA(3, 0, 0)(2, 1, 0, 24); the trend alternatives considered were constant plus linear, linear only, and no trend, and the last seasonal parameter is 24, as we could see in the data set that there was daily seasonality. The results of the SARIMA model for phase voltage line 1 (VL1), kilowatt (KW), current load (IL1), and total harmonic distortion voltage line (THDVL1) are shown in Figs. 4, 5, 6 and 7. Since the seasonality is daily, the observations amount to 365 per year, which would require at least 3 years of data to assess the various components of the time series; for longer forecasts, the common practice is to use monthly or quarterly data. The forecast of the load current (IL1) is displayed in Fig. 8, along with a message which indicates whether a fault will occur in the future.

Fig. 4 Prediction of phase voltage line 1
Fig. 5 Prediction of kilowatt
Fig. 6 Prediction of current load
Fig. 7 Total harmonic distortion voltage line
Fig. 8 Final result of the forecast
4.4 Evaluation of Results Our result analysis is done by comparing the metrics RMSE, MAE, and MAPE of the different time series models: SARIMA, RNN, LSTM, and Holt-Winter's Smoothing. Even though the SARIMA forecasting results for most of the parameters are better than the others, our overall results have a higher error than anticipated, since several authors in related fields have achieved MAPEs of less than 2%. The mean absolute percentage error for SARIMA over all the parameters was between 0.011 and 7.9%, and the error for the RNN ranged from 0.85 to 17.1889%. For MPD, we got the minimum RMSE, MAE, and MAPE as 0.027, 0.022, and 3.221%, respectively, while for OTI we got the minimum RMSE, MAE, and MAPE as 0.069, 0.047, and 0.367%, respectively, which were achieved by RNN and LSTM. The optimum values for all the mentioned evaluation metrics are represented in bold in Table 8. From the confusion matrix of the logistic regression model in Table 9, we can observe that 237,499 values are predicted correctly, while only 188 values are incorrect classifications. This model will be further combined with the results of the time series models to project failures in future, as seen in Fig. 8.

Table 8 Evaluation of models on various parameters

Parameter   Model           RMSE     MAE     MAPE (in %)
MPD         SARIMA          0.027    0.022   3.221
MPD         RNN             0.143    0.058   8.564
MPD         LSTM            0.060    0.074   8.554
MPD         Holt-Winter's   0.041    0.040   5.797
KWH         SARIMA          0.005    0.003   0.011
KWH         RNN             16.721   0.262   0.851
KWH         LSTM            0.144    0.143   0.459
KWH         Holt-Winter's   0.028    0.025   0.092
IL1         SARIMA          0.104    0.092   7.874
IL1         RNN             0.707    0.045   17.188
IL1         LSTM            0.229    0.193   18.034
IL1         Holt-Winter's   0.162    0.128   8.913
OTI         SARIMA          0.104    0.053   2.884
OTI         RNN             2.272    0.047   0.367
OTI         LSTM            0.069    0.616   16.510
OTI         Holt-Winter's   0.150    0.108   10.149
Table 9 Confusion matrix of logistic regression model for fault prediction

                   Predicted: no fault   Predicted: fault
Actual: no fault   234,099               46
Actual: fault      142                   3400
5 Conclusion We analyzed our data set with the SARIMA time series, recurrent neural networks, long short-term memory, Holt-Winter’s Exponential Smoothing, and modeled fault prediction by using logistic regression. From the results, the SARIMA model has outperformed most of the models. The models we have implemented so far can work best for one-day forecasting (24 h) for the available data set. For future scope, we can attempt to refine our models and explore other techniques for fitting models like hybrid models and long-term forecasting. Long-term forecasting can be done if provided with at least two years of data using different methods or extensions to the SARIMA model or finding a better method for fitting the model. This could also help us in achieving better accuracy for monthly forecasts for the prominent parameters. Another possibility is ensembling existing models with other machine learning models.
References
1. J. Singh, S. Singh, Transformer failure analysis: reasons and methods. Int. J. Eng. Res. Technol. (IJERT) 4(15) (2016)
2. E.H. Ko, T. Dokic, M. Kezunovic, Prediction model for the distribution transformer failure using correlation of weather data, in 5th International Colloquium on Transformer Research and Asset Management (2020), pp. 135–144
3. M.C.M.L.J. Soares, Modeling and forecasting short-term electricity load: a comparison of methods with an application to Brazilian data. Int. J. Forecast. 24, 630–644 (2008)
4. J. Van Greunen, A. Heymans, C. Van Heerden, G. van Vuuren, The prominence of stationarity in time series forecasting. J. Stud. Econ. Econometrics 38(1), 1–16 (2014)
5. M. Heydari, H.B. Ghadim, M. Rashidi, M. Noori, Application of Holt-Winters time series models for predicting climatic parameters (case study: Robat Garah-Bil Station, Iran). Pol. J. Environ. Stud. 29(1), 617–627 (2020)
6. A. Sherstinsky, Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network. Physica D Nonlinear Phenom. 404 (2020)
7. S. Makridakis, M. Hibon, ARMA models and the Box–Jenkins methodology. J. Forecasting 6(3), 147–163 (1997)
Chapter 22
Synthetic Face Image Generation Using Deep Learning C. Sireesha, P. Sai Venunath, and N. Sri Surya
1 Introduction In today's world, the media industry is all about "How do you attract viewers?" People in the industry often attract viewers with eye-catching posters: they grab their viewers' attention by designing creative posters. However, designing good posters is often a hectic task. The people designing these posters always want their posters to be as attractive and appealing as possible. In order to make a poster attractive, they hunt for several good-looking faces and must ensure that none of these faces causes any breach of privacy. In today's world, humans always want to use technology to solve their day-to-day problems, and with the significant rise of deep learning, designers now have technology for artificially generating the images used in their poster making. For generating images artificially, the concepts of GANs are employed. In this paper, we have used the DCGAN model for generating images that do not exist in the real world. Using these artificially generated face images, the problem of dealing with privacy and legal issues does not arise, and the output will be much more effective. Besides making posters, this paper also creates sketches for those generated images, which can help art directors or cartoonists. DCGANs have two networks:

(1) Generator network
(2) Discriminator network.
The task of the generator is to make images that look authentic, while the task of the discriminator is to tell the difference between actual and fake photographs. When
both the models are working at their best, the result is a realistic generated face that does not exist in the real world. In this paper, we are creating our model based on DCGAN concepts and we test this model on CelebA dataset.
2 Literature Survey There are different variants of GANs used for image generation. StackGAN [1, 2] practices the use of a generator and a discriminator in its implementation. cGAN [3, 4], when compared to other approaches like layered and iterative methods, is quite simple to create and apply and produces good results in general. The most commonly explored aspect of GANs is image generation. SS-GAN [5] notes that the hierarchical, iterative, and direct techniques are the main methods used by GANs in image production. Two generators and two discriminators are used in the hierarchical approach; each generator and each discriminator is used for a specific function, and they are connected in a parallel or series structure. CNN [6] deployed nine pre-trained CNN models to explore transfer learning techniques and concluded that fine-tuning the pre-trained CNN models can be successfully applied to a limited-class dataset. SN-GAN [7] proposed a novel weight normalization technique called spectral normalization to stabilize the training of the discriminator. The normalization technique is computationally light and easy to incorporate into existing implementations. In that paper, the authors tested the efficacy of spectral normalization on the CIFAR10, STL-10, and ILSVRC2012 datasets and experimentally confirmed that spectrally normalized GANs (SN-GANs) are capable of generating images of better or equal quality relative to previous training stabilization techniques.
3 Methodology This paper attempts to create a generator model that can consistently produce images of people that are nearly realistic in quality and diverseness. In this methodology, we create generator and discriminator models. The generator model, G, takes noise z as input and generates three channel images that are similar to our real training data. The discriminator model, D, tries to distinguish between the real training data and the fake samples sent in by the generator. The discriminator then sends feedback based on how real or fake (probability), the given images are back to the generator. The goal of the model is to keep ensuring that the generator and discriminator perform on equal levels (Fig. 1).
Fig. 1 Architectural diagram
3.1 Data Acquisition We import the preprocessed CelebA dataset from Kaggle. The dataset consists of over 200,000 face images of various celebrities and has 40 different binary attribute annotations per image. Our requirement is only the facial features per picture, so we download the preprocessed version and unzip the file. We then send the images to a directory.
3.2 Loading Dataset into Data Loader We then create a data loader, to access the images from the dataset in batches. We have to then initialize two data loader hyper-parameters which are crucial in how our model performs. These hyper-parameters are:

• Batch_size: A larger batch_size results in the model training faster but with a heavier computational requirement. So we found that a batch_size of 64 after some testing was ideal to get decent results.
• Image_size: The image size we chose for our model was 32. A larger image size demanded a deeper model, and both the outputs generated and the training time required were suboptimal. Resizing the data to smaller sizes improves the training speed, while still creating near realistic images of faces.
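A minimal sketch of this step, assuming PyTorch/torchvision with the unzipped CelebA images placed under a local folder (the directory path is a hypothetical placeholder):

```python
import torch
from torchvision import datasets, transforms

batch_size = 64
image_size = 32

# ImageFolder expects a directory of image subfolders; "processed_celeba/" is a placeholder path.
transform = transforms.Compose([
    transforms.Resize(image_size),
    transforms.CenterCrop(image_size),
    transforms.ToTensor(),               # pixel values land in [0, 1]
])
dataset = datasets.ImageFolder("processed_celeba/", transform=transform)
loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=True)
```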
3.3 Scaling and Further Preprocessing the Data Then, we have to do a preprocessing on the training samples and rescale the images in it from a range of (0, 1) to a range of (− 1, 1). We are doing this because tanh generator accepts images whose pixel values lie between − 1 and 1. Here, in scaling, the image X is taken and returned as scaled image with feature range. The scaled image will be in the range from − 1 to 1.
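A sketch of the rescaling helper described here, with the feature range assumed to be (−1, 1) for the tanh generator:

```python
def scale(x, feature_range=(-1, 1)):
    """Rescale a tensor from [0, 1] to the given feature range (here -1 to 1)."""
    lo, hi = feature_range
    return x * (hi - lo) + lo
```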
3.4 Implementing the Model Generator (G) The generator’s main role is to learn how to make a realistic representation of a class. Generator takes in random noise as input and produces fake images of size 32 × 32 × 3. Our generator model mostly consists of transpose convolutional layers with batch normalization applied to the outputs. The first layer receives random noise signal vector and is multiplied by a weight matrix into a 3D tensor image. The remaining layers are transposed convolutional layers, and 2 × 2 strides are used every layer. ReLU activation function is used. The last layer uses a tanh activation function to generate 32 × 32 × 3 size images. The implementation model results are shown in Fig. 2. The functionality is in Fig. 3. Discriminator (D) The discriminator’s main role is to distinguish between classes. Discriminator takes a set of features and determines a category based on the given features. Our discriminator model is a convolutional classifier with no maxpooling layers. It makes use of batch normalization to deal with complex data.
Fig. 2 Generator model
Fig. 3 Training the generator
• We use two convolutions per layer in the discriminator.
• At each layer, sequential linear layers are used. Thus, we have a stride of 2 every layer.
• Leaky ReLU activation function is used.
• Discriminator receives 32 × 32 × 3 image as input.

After a series of convolutional layers, it converts the tensor into a loss function value using a sigmoid function. This output is then sent as feedback back to the generator to improve the image quality produced by the generator. The implementation model results are shown in Fig. 4. The functionality is shown in Fig. 5.
Fig. 4 Discriminator model
Fig. 5 Training the discriminator
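The following is a hedged PyTorch sketch of generator and discriminator networks matching the description above (transpose convolutions with batch normalization and ReLU producing a 32 × 32 × 3 tanh output, and strided convolutions with leaky ReLU in the discriminator). The latent size z_size = 100 and the channel widths are assumptions, and the final sigmoid is folded into BCEWithLogitsLoss, as the loss section below suggests.

```python
import torch.nn as nn

z_size = 100  # assumed size of the random noise vector

class Generator(nn.Module):
    def __init__(self, z_size=z_size, conv_dim=32):
        super().__init__()
        self.conv_dim = conv_dim
        self.fc = nn.Linear(z_size, conv_dim * 4 * 4 * 4)              # noise -> 4x4 feature map
        self.net = nn.Sequential(
            nn.ConvTranspose2d(conv_dim * 4, conv_dim * 2, 4, 2, 1),   # 4x4 -> 8x8
            nn.BatchNorm2d(conv_dim * 2), nn.ReLU(True),
            nn.ConvTranspose2d(conv_dim * 2, conv_dim, 4, 2, 1),       # 8x8 -> 16x16
            nn.BatchNorm2d(conv_dim), nn.ReLU(True),
            nn.ConvTranspose2d(conv_dim, 3, 4, 2, 1),                  # 16x16 -> 32x32x3
            nn.Tanh(),
        )

    def forward(self, z):
        x = self.fc(z).view(-1, self.conv_dim * 4, 4, 4)
        return self.net(x)

class Discriminator(nn.Module):
    def __init__(self, conv_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, conv_dim, 4, 2, 1), nn.LeakyReLU(0.2, True),  # 32 -> 16
            nn.Conv2d(conv_dim, conv_dim * 2, 4, 2, 1),                # 16 -> 8
            nn.BatchNorm2d(conv_dim * 2), nn.LeakyReLU(0.2, True),
            nn.Conv2d(conv_dim * 2, conv_dim * 4, 4, 2, 1),            # 8 -> 4
            nn.BatchNorm2d(conv_dim * 4), nn.LeakyReLU(0.2, True),
        )
        self.fc = nn.Linear(conv_dim * 4 * 4 * 4, 1)                   # raw logit per image

    def forward(self, x):
        return self.fc(self.net(x).flatten(1))
```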
3.5 Check for GPU Availability It is essential to check whether the model can be trained on a GPU or not due to its computational requirements. To check for this, we use a Boolean variable train_on_gpu and check whether CUDA is available or not. It is also essential to ensure that our models, model inputs, and loss function arguments are all moved to the GPU when it is available.
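A minimal sketch of this check (assuming PyTorch):

```python
import torch

train_on_gpu = torch.cuda.is_available()
device = torch.device("cuda" if train_on_gpu else "cpu")
# The models, their inputs, and the loss function arguments must all live on this device.
```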
3.6 Defining Our Loss Functions for Both Models

Generator Losses
• First create noise vectors and generate a batch of fake images. This is stored in a variable fake_image.
• Then get the discriminator's prediction of the fake images.
• Finally, calculate the generator losses using BCEWithLogits as our loss function criterion.

Discriminator Losses
• First create noise vectors and generate a batch of fake images. This is stored in fake images.
• It then gets the discriminator's prediction of the fake image and calculates the loss using BCEWithLogits as our loss function criterion.
• Then get the discriminator's prediction of the real image and calculate its loss the same way.
• Then calculate the total discriminator's loss by averaging the real and fake image losses.
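A hedged sketch of these two loss computations using BCEWithLogitsLoss; the helper names and the label convention (real = 1, fake = 0) are assumptions:

```python
import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss()

def real_loss(d_logits):
    return criterion(d_logits, torch.ones_like(d_logits))    # real images labeled 1

def fake_loss(d_logits):
    return criterion(d_logits, torch.zeros_like(d_logits))   # fake images labeled 0

def generator_loss(D, fake_images):
    # the generator wants the discriminator to label its fakes as real
    return real_loss(D(fake_images))

def discriminator_loss(D, real_images, fake_images):
    # average of the real-image loss and the fake-image loss
    return 0.5 * (real_loss(D(real_images)) + fake_loss(D(fake_images.detach())))
```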
3.7 Set Optimizers for Both Models Define Adam optimizers [5] for both our generator and discriminator functions. Use a learning rate of 0.0002 as this was found to be the ideal learning rate in the original DCGAN paper [8].
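For instance (the beta values come from common DCGAN practice and are an assumption, not a value stated in this paper):

```python
import torch.optim as optim

# Instantiate the networks from Sect. 3.4 and move them to the chosen device.
G = Generator(z_size).to(device)
D = Discriminator().to(device)

lr = 0.0002
g_optimizer = optim.Adam(G.parameters(), lr=lr, betas=(0.5, 0.999))
d_optimizer = optim.Adam(D.parameters(), lr=lr, betas=(0.5, 0.999))
```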
Fig. 6 Training of individual epochs
3.8 Training the GAN Training the GAN involves training the discriminator and generator one after the other.

• We train the discriminator by giving it real images, and then a combination of real images (CelebA dataset) and fake images (generator images).
• We train the generator based on the feedback given by the discriminator.
• No images from the CelebA dataset are used in training the generator.

We save the generated samples in a file, "training_samples.pkl", using a pickle dump operation. Then, we set the number of epochs we want to train our GAN [9] for and start running the computations. We have trained our model for 20 epochs as shown in Fig. 6; one training epoch is sketched below.
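A hedged sketch of this alternating update, reusing the G, D, loader, scale, losses, and optimizers sketched in the previous sections (the sample-saving details are illustrative):

```python
import pickle
import torch

num_epochs = 20
samples = []
fixed_z = torch.randn(16, z_size, device=device)   # fixed noise to track progress

for epoch in range(num_epochs):
    for real_images, _ in loader:
        real_images = scale(real_images).to(device)
        batch = real_images.size(0)

        # 1) Discriminator step: a real batch plus a batch of fakes
        d_optimizer.zero_grad()
        z = torch.randn(batch, z_size, device=device)
        fake_images = G(z)
        d_loss = discriminator_loss(D, real_images, fake_images)
        d_loss.backward()
        d_optimizer.step()

        # 2) Generator step: try to fool the discriminator on a fresh batch of fakes
        g_optimizer.zero_grad()
        z = torch.randn(batch, z_size, device=device)
        g_loss = generator_loss(D, G(z))
        g_loss.backward()
        g_optimizer.step()

    samples.append(G(fixed_z).detach().cpu())       # keep generated samples per epoch

with open("training_samples.pkl", "wb") as f:        # pickle dump of the samples
    pickle.dump(samples, f)
```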
3.9 Visualization of Training Losses Once we get the losses, we can plot them. We shall plot the training losses that are obtained after each and every epoch. Ideally, the goal of the generator is to ensure there is a very small difference between the two models, whereas the goal of the discriminator is to ensure the difference is large.
Fig. 7 Visualization of losses
In Fig. 7, the orange curve (generator loss) is more fluctuating as we have given noise vectors as input to the generator.
4 Result Analysis The images mentioned below are the outcomes of the model. These images are realistic face images that do not exist in the real world. We have also developed a front end where the user can get a clear picture of our work (Figs. 8 and 9).
5 Conclusion In this paper, we have automated the development of realistic face images that do not exist in the real world by using deep convolutional generative adversarial neural network (DCGAN) concepts, so the designers can use these images without any copyrights or privacy issues.
Fig. 8 Generated fake images
Fig. 9 Automatic realistic face generator
6 Future Scope In the future, we wish to extend the model to create faces whose expressions can be changed manually. We also wish to extend the model to create 3D objects and full-length videos that are artificially generated.
References
1. F. Author, S. Author, Title of a proceedings paper, in Conference 2016, LNCS, vol. 9999, ed. by F. Editor, S. Editor (Springer, Heidelberg, 2016), pp. 1–13
2. M.A. Bottou, S. Chintala, Leon, Wasserstein generative adversarial networks, in Proceedings of the 34th International Conference on Machine Learning (2017)
3. F. Author, Contribution title, in 9th International Proceedings on Proceedings (Publisher, Location, 2010), pp. 1–2
4. T. Miyato, T. Kataoka, M. Koyama, Y. Yoshida, Spectral normalization for generative adversarial networks (CoRR, 2018)
5. H. Shi, J. Dong, W. Wang, Y. Qian, X. Zhang, SSGAN: secure steganography based on generative adversarial networks, in Advances in Multimedia Information Processing—PCM 2017 (2017)
6. S. Hira, A. Bai, S. Hira, An automatic approach based on CNN architecture to detect Covid-19 disease from chest X-ray images. Appl. Intell. 51, 2864–2889 (2021)
7. A. Bai, S. Hira, An intelligent hybrid deep belief network model for predicting students employability. Soft Comput. 25, 9241–9254 (2021)
8. H. Zhang, T. Xu, H. Li, S. Zhang, X. Wang, X. Huang, D.N. Metaxas, StackGAN: text to photorealistic image synthesis with stacked generative adversarial networks, in The IEEE International Conference on Computer Vision (ICCV) (2017)
9. I.J. Goodfellow, J. Pouget-Abadie, M. Mirza, X. Bing, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial nets, in Advances in Neural Information Processing Systems 27 (Curran Associates, Inc., 2014), pp. 2672–2680
Chapter 23
Machine Learning and Neural Network Models for Customer Churn Prediction in Banking and Telecom Sectors Ketaki Patil, Shivraj Patil, Riya Danve, and Ruchira Patil
1 Introduction Customer relationship management or CRM is a rapidly emerging field that combines computer science and business management in the form of data analytics. All the customer activities with the business organization are monitored using historical or real-time data. One such business requirement is the prediction of the churn rate of any organization. Customer churning is the number of customers who leave a company's service during a given time period. This churn rate is an unignorable deciding factor when it comes to the organization's economics. If we find the root cause or clearly visible factors that affect the churn rate, efforts can be taken to work on those factors and try to maintain a profitable customer relationship. The customer churn rates by tenure for the banking and telecom sectors are represented in Figs. 1 and 2, respectively. When companies measure their customer turnover, they calculate the attrition based on some mathematical weighted metrics, often called monthly recurring revenue (MRR). Before the widespread use of analytical software, this calculation was done manually. Around the 2000s, many intelligent business analyzing techniques emerged which could analyze the factors that most lead to customers leaving the company's service based on the current data of interaction of the customer with the service. These techniques were based on various algorithms ranging from probabilities to basic classification models. With subsequent advancements in data prediction techniques, artificial neural networks came in to boost the performance of classification models. With respect to today's scenario, classification models like SVM, random forest, and their hybrid versions are highly used for churn prediction in a wide range of companies. The current developments in this domain include the use of AI tools like fuzzy logic and ANNs to improve the accuracy and performance of churn prediction models.
Fig. 1 Churning rate for bank dataset
Fig. 2 Churning rate for telco dataset
These models identify the likelihood of a customer to churn on the basis of their current and historic data. Results of a good quality churn model are highly helpful in making business decisions regarding the factors that affect the churn rate the most, thus helping the company to retain maximum customers and gain extra profits. A predictive churn model can help a range of companies like SaaS providers to identify the cause of the churn and take effective actions to reduce the churn rate. Our aim is to predict the churn rate for two organizations: the banking and telecom sector. The churn results for bank and telecom company customers are predicted using various classification models and ANNs for acquiring maximum accuracy.
2 Literature Survey Analyzing the people’s behavior and course of transactions with the company helps to understand if the customer will remain loyal to the organization in the long run. It is of utmost importance for any company or organization to avoid the churning factor. Many researchers have dedicated their time to analyze the churn rate and gain inferences through the available customer churning datasets. This section focuses on the previously done works in this field. Jain et al. [1] have focused on the churn prediction for the banking, telecom, and IT sectors. Their proposed system applies four machine learning algorithms, namely logistic regression, random forest, SVM, and XGBoost, for datasets of each domain and considers the respective highest accuracy model for future predictions. They also perform exploratory data analysis to identify churn retention strategy. Overall, the logistic regression model obtained the highest accuracy of all as 90.136%. Osowski et al. [2] have proposed a dynamic model that predicts customer behavior and possible churning using multilayer perceptron (MLP) and support vector machine (SVM). They concluded that the past 5 months’ observations can accurately predict the future churning decision of the corporate bank’s customers. Other tested classifiers like logistic regression, decision tree, and random forest obtained accuracies lower than MLP and SVM. Predicted churned customers using the MLP model were slightly inaccurate. Alkhatib et al. [3] have focused on a predictive model for cutting customer migration from banks. This model uses five machine learning classification models such as decision tree, random forest, naïve Bayes, k-nearest neighbor, and logistic regression. The evaluation of the model is done using the confusion matrix, from which it has been found out that the random forest algorithm gives the highest accuracy of 86%. Ahmad et al. [4] have focused on customer churn using big data platforms and machine learning. This model was prepared by transforming a large dataset of raw data into the required form. In this system, they used four different classification algorithms: decision tree, random forest, gradient boosted, and XGBoost. After evaluation, it was found that XGBoost gives the highest AUC value of 93.301%. Karvana et al. [5] have used data mining techniques for predicting the customer churning rate in the banking business sector. They have particularly used the CRISPDM methodology for their research work and stepwise explained the same. They concluded that their model’s results were highly dependent on the number of samples present in the dataset. The 50:50 sampled SVM obtained the highest profit of 456 billion. Hemalatha et al. [6] have proposed a system that predicts customer churning values for an Indian bank using the following models: SVM, KSVM, ANN, and a combined model of ANN and KSVM. The hybrid model built by combining ANN and K-SVM models showed a better accuracy of 97.98% for the churn prediction and even resulted in model improvement when examining the metrics with the other classifiers. Their system subsequently reaches 97% precision level.
Kumar et al. [7] have proposed a solution to customer churn prediction using deep learning techniques like ANNs to predict the churn rate and to identify the most influencing factors that lead a customer to churn. They have performed implementations on a dataset from the telecommunication industry. The model of ANN used is the RELU activation function. This model helped them to achieve more than 80% accuracy, by essentially using the technique of varying some factors like epoch, size of each train and test batch, number of neurons, and using different activation functions. Ullah et al. [8] have focused on the random forest algorithm for classifying customers of a telecom industry as likely to churn out or not. They have tried the classification using various different models, but random forest provided the maximum accuracy of 88.63%. In addition to just identifying the customers that are likely to churn, a technique of cosine similarity was used to group the churn identified customers and offers were proposed to such groups to retain these customers. This paper also identified the attributes which were most associated with churning. After having studied some works in this field, it was observed that most of the research was limited to very few methodologies being tested for predicting the churn values. Some works were limited to one single methodology and had no scope for comparison. Some proposed systems could not generate reliable inferences and had a scope of improvement in the tested models’ accuracy. The comparison of the models applied in the system was limited to few evaluation metrics. Additional features such as customer satisfaction levels and external social media data can be added to the dataset. Also, real-time data can be used to predict the future scenario a company might face and take actions accordingly. A larger dataset can be used for obtaining higher accuracy on the deep learning models. Memory utilization parameters can be worked upon to make the model more efficient.
3 Proposed System 3.1 System Architecture The churn prediction proposed system can be visually represented using Fig. 3. It is a flowchart-based visualization that describes the methodology used by the system. Both the banking and telecom sector datasets are subjected to preprocessing that consists of data cleaning, data transformation, and feature selection. In the model development phase, various machine learning classification models and feedforward neural networks model ANN are trained on the datasets and the customer churn value is predicted using the respective highest accuracy models. Further, the exploratory data analysis on both the dataset are used to analyze the customer churning and suggest customer retention techniques. This model was set up on Intel Core i5 8th Gen 64 × Processor, Windows 10 operating system with 8 GB RAM.
3.2 Methodology The proposed churn prediction system for the banking and telecom sectors can be designed in the following step-by-step process (a sketch of the preprocessing and splitting steps appears after this list):

• Input datasets: Here, the proposed system uses two datasets as input: the bank and telecom sector datasets. These datasets are then subjected to preprocessing.
• Data cleaning: Missing values of the dataset are handled by imputing them with the mean or mode and by dropping rows that have a small percentage of missing values. Duplicate values are handled by dropping the respective rows.
• Feature extraction and transformation: Feature extraction is the selection of relevant features for training the model. Encoding categorical data is a preprocessing step in which categorical data are converted to a numerical representation before fitting and evaluating the model. Feature scaling is used to standardize or normalize the highly varying scales of the features.
• Splitting dataset: Once the preprocessing phase is over, the data are split as 80% training and 20% testing values.
• Training classifiers and neural networks: The 80% of the dataset which is split as the training dataset is used to train the classifiers and neural networks. These trained models are also tuned with hyperparameters to achieve better results.
• Model comparison using evaluation metrics: The trained classifier models and neural networks are then tested using the testing dataset. The churn values predicted by these models are compared with the actual churn values, and evaluation metric values are calculated.
• Churn prediction and analysis: The tested models are compared using accuracy, ROC, AUC, precision, recall, and F1 score, and a reliable model is chosen for future predictions. Exploratory data analysis is also performed on the datasets to gain insights into customer retention strategies.
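A hedged sketch of these preprocessing and splitting steps with pandas and scikit-learn; the file name and the "Exited" target column are hypothetical placeholders:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("bank.csv")                       # hypothetical input file
df = df.drop_duplicates()
df = df.fillna(df.mode().iloc[0])                  # impute missing values with the mode

X = pd.get_dummies(df.drop(columns=["Exited"]))    # encode categorical features numerically
y = df["Exited"]                                   # churn label (1 = exited, 0 = stayed)

# 80% training / 20% testing split, followed by feature scaling
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)
```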
4 Dataset 4.1 Bank Dataset This dataset consists of all the customers in a bank. Here, the dataset describes how frequently customers of the bank exit depending on the given parameters. The dataset has columns like age, gender, country, and other important features. It describes 10,000 different observations and ranges over 12 different features.
4.2 Telecom Dataset The telecom sector faces a higher percentage of customers churning. Therefore, a telecom sector dataset is considered. It describes the services subscribed and customers’ personal information. The dataset includes 7043 telecom company customers and 21 features. The features can be categorized as follows: customers who exit the service, services subscribed by the customer, and customers account information and personal information which also include partners and dependents.
5 Algorithms 5.1 Model Building for Churn Prediction Prediction of customer churning helps companies launch new schemes and improve the existing services. Machine learning and deep learning algorithms are used for predicting whether a customer would churn or continue using the company’s services. Since one of the two numbers 0 or 1 has to be predicted whether a customer stays or not, respectively, binary class classification models are used on both datasets. Seven different classification algorithms and one neural networks model are used in this proposed module. The algorithms used are: support vector machine (SVM), logistic regression, random forest classifier (RF), naive Bayes (NB), decision tree (DT), k-nearest neighbor (KNN), XGBoost (XGB), and artificial neural networks (ANN) [9–11].
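Building on that preprocessed split, a minimal sketch of training and comparing the classifier family listed above (all hyper-parameters are illustrative defaults; the ANN would be built separately, for example with a small Keras network, and is omitted here):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

models = {
    "Logistic regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": GaussianNB(),
    "KNN": KNeighborsClassifier(),
    "Decision tree": DecisionTreeClassifier(criterion="entropy"),
    "Random forest": RandomForestClassifier(),
    "SVM": SVC(kernel="rbf"),
    "XGBoost": XGBClassifier(eval_metric="logloss"),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, accuracy_score(y_test, model.predict(X_test)))
```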
5.2 Logistic Regression The proposed system uses one of the linear classification algorithms, i.e., logistic regression for predicting the churn value of customers. Logistic regression is usually used for predicting binary class values: categorical or discrete. The algorithm uses a sigmoid function as the cost function to fit the predicted values onto a ‘S-shaped’ curve as the likelihood of the dependent variable [1].
5.3 Naïve Bayes The naïve Bayes algorithm is a supervised learning algorithm based on the Bayes theorem. It is a probabilistic classifier, which helps in creating rapid machine learning models that can make quick predictions. There are three types of naive Bayes model, namely Gaussian, multinomial, and Bernoulli. The proposed system utilizes Gaussian
NB classifier. It assumes the input data to be continuous and to follow a normal distribution. Z-score values are used to determine the probabilities of the dependent variable [12].
5.4 KNN K-nearest neighbor algorithm finds the similarity between the new data point and the available data point and classifies the new data point into the likely category. This similarity is determined on the basis of closeness of the points. The default Minkowski distance function is used in this system. Hyperparameter tuning is performed to get the optimal value of K for both the datasets [3].
5.5 Decision Tree Classifier Decision tree classifier is generally more suitable for solving classification problems rather than regression problems. Since the system performs classification, the split criterion used is entropy. The root node of the tree is identified as the feature with highest information gain. The algorithm begins from the root node of the tree and performs binary splitting of the nodes in an iterative manner until the leaf node, i.e., the target class, is reached [13, 14].
5.6 Random Forest Random forest is built on the concept of ensemble learning, which is a process of integrating numerous classifiers to solve an intricate problem and to upgrade the performance of the model. As the name suggests, random forest consists of several decision trees on multiple randomly formed subsets on the given dataset. In this system, hyperparameters tuning is performed while training the model to identify the tree’s split criterion, estimators, maximum features to be considered, and maximum depth for pre-pruning the tree [8].
5.7 XGBoost It is based on gradient boosting decision trees. In this method, decision trees are created in sequential form. The weights are assigned to all the independent variables which are fed to the tree, and the results are predicted. The variables that are predicted
wrong by the tree are fed to the second decision tree. In the end, these classifiers give us a strong and precise model [4].
5.8 SVM This algorithm finds a hyperplane that classifies the data points into separate classes. In order to separate the data points, the main aim is to find a hyperplane that has a maximum margin with respect to its support vectors. Kernel functions are used for transforming the low-dimensional data to high. This system uses the default radial basis function which works well when data are not priorly known [1].
5.9 ANN An artificial neural network is a feedforward network that makes use of sigmoidal neurons. It has three layers: the first layer called input signals, the second layer consisting of hidden layers of sigmoidal neurons followed by the third layer representing the classes of output. This technique makes use of gradient optimization, which reduces the error function through the adaption of weights [2, 7].
6 Exploratory Data Analysis 6.1 Bank EDA For the bank customers dataset, it is observed that out of 10,000 different customers 7963 customers continued using their service and 2037 exited their service. Observing Fig. 4, it can be inferred that the citizens of Germany were most likely to switch their banks. Also, the female percentage of churning is higher than that of male. People in their mid-ages of 40–60 years were most likely to churn, and it can also be concluded that the inactive members’ churning rate was higher than that of active members.
6.2 Telecom EDA For the telecom customers dataset, out of 7043 different customers, 5163 did not churn, whereas 1669 customers considered churning. Observing the count plot of features according to the churning in Fig. 5, it can be inferred that the senior citizens
Fig. 3 Proposed system architecture
were most likely to switch their services. Also, the month-to-month contract basis customers churned the highest. People who did not consider having partners and dependents also showed a higher amount of churning. People using the electronic check payment method and fiber-optic Internet service were also observed to have churned. It can also be observed from Fig. 6 that people opting for additional services like online security and tech support churned the maximum followed by those who used online backup and device protection services.
Fig. 4 Exploratory data analysis for bank dataset
Fig. 5 Exploratory data analysis for telecom dataset
Fig. 6 Number of churns in additional services
7 Result Analysis 7.1 Model Performance Various classification models were applied on the bank and telecommunication company customers datasets. The comparison of accuracies for each model is shown in Table 1. It was found that the random forest algorithm obtained the best results for the bank customers’ dataset, whereas the neural networks model gave the highest
Table 1 Accuracy (%) obtained on various algorithms for bank and telecom datasets

Algorithms            Bank dataset (%)   Telecom dataset (%)
Logistic regression   81.10              80.03
Naïve Bayes           80.35              69.23
KNN                   83.35              76.97
Decision tree         80.35              74.63
Random forest         87.05              79.32
SVM                   82.60              73.77
ANN                   84.00              81.93
XGBoost               80.35              80.24
accuracy for the telecom company dataset. It can be observed that the ensemble models and neural networks resulted in better accuracy than other classifier models. Evaluation metrics like ROC curve, area under curve (AUC), precision, recall, and F1 score were also considered for the performance comparison of the models. For the bank dataset, the possibility of discontinuation by bank customers was predicted based on features such as age, gender, geographical location, and many others. It was found that the random forest classifier based on ensemble technique gave the highest accuracy of 87.05%. For the telecommunication company dataset, the churn possibility for customers was predicted based on features such as age, gender, tenure, payment method, and many others. It was found that the ANN model gave the highest accuracy of 81.93%.
7.2 Strategy to Retain the Customers

Bank: After analyzing the factors that caused a higher churn rate, efforts can be made toward retaining customers depending on those factors. The bank can organize special schemes on loans and accounts for women, since women were found to be more likely to churn. The bank could provide offers to people in the age group of 40–60 to retain them, since the maximum number of churning customers is in this age group.

Telecom: The telecom industry has very heavy competition. Customers have many options to choose from, so the churn rate also tends to be high. The companies can provide special prices for senior citizen customers, since a high churn rate is observed there. Similarly, companies can come up with better business plans focusing on the factors that were found responsible for churning through exploratory data analysis.
8 Conclusion and Future Scope In recent years, many service providing companies have tried to give highest priority to customer satisfaction. Thus, there has been an increasing demand for software programming models that analyze customer activity. Churn prediction is one such important asset which can identify tendencies of the company’s customers to discontinue their service. In this paper, current technologies for churn prediction are studied, and based on the understanding of the research gap, prediction results are improved. Various machine learning and deep learning algorithms, namely logistic regression, naïve Bayes, KNN, decision tree, random forest, SVM, and ANN, have been implemented on two datasets: bank and telecommunication company. All the models have predicted results with satisfactory accuracies on both the datasets. However, it was observed that the random forest model using the ensemble technique and the artificial neural networks model were the most successful classifiers for both the bank and telecommunication companies’ datasets. This system also analyzed both datasets and identified the factors responsible for churning. Considering these factors, customer retaining strategies were proposed which will help the companies make intelligent and well-informed business decisions for achieving higher profits. This study can be further expanded to perform real-time churn analysis. Large amount of data can be utilized for achieving better results using the deep learning models. The customer data for companies can be collected on a big data system and the predictive analysis process can be scaled up. This will ensure automated personalization in its true sense.
References
1. H. Jain, G. Yadav, R. Manoov, Churn prediction and retention in banking, telecom and IT sectors using machine learning techniques, in Advances in Machine Learning and Computational Intelligence. Algorithms for Intelligent Systems, ed. by S. Patnaik, X.S. Yang, I. Sethi (Springer, Singapore, 2021). https://doi.org/10.1007/978-981-15-5243-4_12
2. S. Osowski, L. Sierenski, Prediction of customer status in corporate banking using neural networks, in 2020 International Joint Conference on Neural Networks (IJCNN) (2020). https://doi.org/10.1109/ijcnn48605.2020.9206693
3. K. Alkhatib, S. Abualigah, Predictive model for cutting customers migration from banks: based on machine learning classification algorithms, in 2020 11th International Conference on Information and Communication Systems (ICICS) (2020). https://doi.org/10.1109/icics49469.2020.239544
4. A.K. Ahmad, A. Jafar, K. Aljoumaa, Customer churn prediction in telecom using machine learning in big data platform. J. Big Data 6, 28 (2019). https://doi.org/10.1186/s40537-019-0191-6
5. K.G.M. Karvana, S. Yazid, A. Syalim, P. Mursanto, Customer churn analysis and prediction using data mining models in banking industry, in 2019 International Workshop on Big Data and Information Security (IWBIS) (2019). https://doi.org/10.1109/iwbis.2019.8935884
6. P. Hemalatha, G.M. Amalanathan, A hybrid classification approach for customer churn prediction using supervised learning methods: banking sector, in 2019 International Conference on Vision Towards Emerging Trends in Communication and Networking (ViTECoN) (2019), pp. 1–6. https://doi.org/10.1109/ViTECoN.2019.8899692
7. S. Kumar, M. Kumar, Predicting customer churn using artificial neural network. Springer Reference Medizin (2019), pp. 299–306. https://doi.org/10.1007/978-3-030-20257-6_25
8. I. Ullah, B. Raza, A.K. Malik, M. Imran, S. Islam, S.W. Kim, A churn prediction model using random forest: analysis of machine learning techniques for churn prediction and factor identification in telecom sector. IEEE Access (2019). https://doi.org/10.1109/access.2019.2914999
9. F. Altinisik, H.H. Yilmaz, Predicting customers intending to cancel credit card subscriptions using machine learning algorithms: a case study, in 2019 11th International Conference on Electrical and Electronics Engineering (ELECO) (2019). https://doi.org/10.23919/eleco47770.2019.8990563
10. I. Ullah, H. Hussain, I. Ali, A. Liaquat, Churn prediction in banking system using K-means, LOF, and CBLOF, in 2019 International Conference on Electrical, Communication, and Computer Engineering (ICECCE) (2019), pp. 1–6. https://doi.org/10.1109/ICECCE47252.2019.8940667
11. J. Vijaya, E. Sivasankar, S. Gayathri, Fuzzy clustering with ensemble classification techniques to improve the customer churn prediction in telecommunication sector, in Recent Developments in Machine Learning and Data Analytics (2018), pp. 261–274. https://doi.org/10.1007/978-981-13-1280-9_25
12. A. Alamsyah, N. Salma, A comparative study of employee churn prediction model, in 2018 4th International Conference on Science and Technology (ICST) (2018). https://doi.org/10.1109/icstc.2018.8528586
13. S. Desai, S.T. Patil, Boosting decision trees for prediction of market trends. J. Eng. Appl. Sci. 13, 552–556 (2018)
14. S. Desai, S.T. Patil, Efficient regression algorithms for classification of social media data
Chapter 24
Evaluation of Social Human Sentiment Analysis Using Machine Learning Algorithms Anjali Agarwal, Ajanta Das, and Roshni Rupali Das
1 Introduction Sentiment analysis is a crucial element of understanding a person's personality. Interactions between persons, or their reviews and comments, are needed to analyze the sentiment of any human being. It is highly helpful in decision making and in exploring new ventures. This paper presents a sentiment analysis based on a sample restaurant review dataset of the kind available on various Web sites and social media. Reading restaurant reviews is essential before exploring a restaurant. Customers' comments and reviews reflect the quality of the restaurant, the information needed to reach it, the price of the food, and the reputation of the restaurant's food, service, and ambiance. We can conveniently gather feedback on restaurant food or services using the Internet; thus, the advantage of the accessibility of review comments from social media is enormous. In this light, reviews or comments can be positive or negative. Customers benefit from both kinds of comments; needless to say, a restaurant cited with positive comments will get higher priority for exploration, whereas people will mostly avoid exploring a restaurant cited with more negative comments. However, negative comments are important for the owner, who may improve the services based on them. Feedback analysis is always vital in business for long-lasting profit. The scope of this paper is to analyze the sentiment of restaurant review comments. A comparative evaluation of the restaurant review dataset using six different machine learning algorithms, namely Gaussian naïve Bayes (GNB), logistic regression (LR), random forest (RF), decision tree (DT), support vector machine (SVM), and multinomial naïve Bayes (MNB), is presented in this paper.
The primary goal of this paper is to utilize restaurant evaluations to forecast social human sentiment for experimentation objectives. To study the research domain, this paper presents a literature review in Sect. 2. Section 3 proposes the detailed methodology of the sentiment analysis model. Section 4 presents the evaluation results with testing accuracy, training accuracy, and accuracy gap percentage to select the best algorithm for the training dataset of the proposed framework. Section 5 draws the conclusion of the paper.
2 Related Work A version of online and incremental random forests to perform sentiment analysis on customer reviews is suggested in paper [1]. It compares and analyzes the efficiency of factorization machine, Hoeffding tree, online random forest, incremental naive Bayes, and IRF. The performance of this incremental learning algorithm is demonstrated with 94% accuracy. A modern hybrid methodology is used for restaurant reviews in this study [2], and its output is assessed based on the restaurant review data and then graded by naïve Bayes, SVM, and genetic algorithm for the reduced data. The hybrid model reveals 92.44% of the precision of the classification. In [3], using naïve Bayes, the author tried to characterize Surabaya Restaurant customer satisfaction. Data sampling is performed using WebHarvy Software. The accuracy of naive Bayes and TextBlob is 72.06% and 69.12%, respectively. In [4], the reviews provided by the restaurant’s customers are evaluated using classification algorithms for machine learning. Comparison of classification algorithms, such as decision tree, k-nearest neighbor, naïve Bayes, random forest, supervised learning, and support vector machine, is presented in this paper. The simulation result proved that the SVM classifier for the specified dataset resulted in the highest accuracy of 94.56%. The novelty of the proposed research is that it begins the analysis of human sentiment based on a restaurant review dataset or review comments available on social networking sites. This paper presents the comparative evaluation of all the six machine learning algorithms, GNB, LR, RF, DT, SVM, and MNB, with its classification reports in detail. The classification report is generated using Python and used to select the best algorithm to train our proposed model.
3 Proposed Sentiment Analysis Tool This section discusses various components of the proposed sentiment analysis tool. The architecture of this tool is presented in [5]. In this architecture, 75% of training data and 25% of testing data are used to train the proposed model. The various phases are data collection, data processing, splitting dataset, classification followed
by classification report. Preprocessing of the dataset is carried out with the help of the Python programming language using the Natural Language Toolkit (NLTK) library [6].
3.1 Data Collection We have collected restaurant feedback given by the customers as our reference input dataset of two columns to train the model. The first column contains the reviews of customers, and the second column represents the corresponding binary values. Data source: We have collected 1000 restaurant feedback given by the customers as our reference input dataset from [7]. Labeling of data: The dataset has two sections: The first column contains the reviews of customers, and the second column represents the corresponding binary values. For example, if the feedback is in the favor of the restaurant, then it is labeled as “1” (positive review) else “0” which can be understood as negative review.
3.2 Data Preprocessing Data need to be filtered, cleaned, and preprocessed before being fed into the model. The preprocessing phase comprises removal of symbols and stop words, word tokenization, lemmatization and stemming, and finally joining all the words into a corpus. Each sub-phase is briefly described in the following:

Case folding: Case folding means converting all the words containing upper-case letters into lower-case letters.

Removal of symbols and stop words: The second stage is to remove the numbers (0–9), punctuation marks such as the comma (,), apostrophe ('), question mark (?), colon (:), and exclamation point (!), and special characters such as $, %, #, and &.

Word tokenization: Word tokenization is the process where every paragraph or sentence is broken down into smaller units called tokens, such as words.

Stop word removal: Stop word removal is a technique where the words which do not add any meaning to the sentences and are very common words are removed from the data, such as "for", "the", "above", and "has".

Stemming: Stemming is a technique where all the words are reduced to their base word, primarily known as the word stem. Thus, "gone", "going", "goes" will be converted to "go".

Preparation of corpus: A corpus (i.e., a collection of texts) is created where all the words left after preprocessing are collected. We have generated a word cloud representation using the Python programming language where all the frequent words in
the corpus are displayed to show the most common words used by the customers in their comments. Next, a bag-of-words (BOW) model is implemented to extract meaningful attributes from the textual data. The bag-of-words model records the frequency of occurrence of each word, thus reducing data redundancy. It is known as a "bag" of words because any information about the order of words in the textual data is discarded. This model takes into consideration the occurrence of known words rather than the location of the word in the data.
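As a rough illustration of this pipeline, the sketch below applies the preprocessing steps with NLTK and builds the bag-of-words matrix with scikit-learn. It assumes the reviews are already loaded into a pandas DataFrame named `df` with columns `Review` and `Liked`; those column names and the vocabulary size of 1500 are illustrative assumptions, not values reported in this paper.

```python
# Hypothetical preprocessing + bag-of-words sketch (assumes a DataFrame `df`).
import re

import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer

nltk.download("stopwords")          # one-time download of the NLTK stop-word list

stemmer = PorterStemmer()
stop_words = set(stopwords.words("english"))

def clean_review(text):
    text = re.sub(r"[^a-zA-Z]", " ", text)   # drop digits, punctuation, special symbols
    words = text.lower().split()              # case folding + simple tokenization
    words = [stemmer.stem(w) for w in words if w not in stop_words]
    return " ".join(words)

corpus = [clean_review(review) for review in df["Review"]]

# Bag-of-words: occurrence counts of known words, word order discarded
vectorizer = CountVectorizer(max_features=1500)   # vocabulary size is illustrative
X = vectorizer.fit_transform(corpus).toarray()
y = df["Liked"].values
```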
3.3 Splitting Datasets In machine learning, splitting the dataset is used to evaluate the performance of the proposed model. In this technique, the dataset is divided into two parts, commonly known as the training dataset and the testing dataset. We use the training dataset to fit the proposed model and the testing dataset to test its prediction capability. The dataset is divided randomly using the train–test split function which is available in scikit-learn [8].
3.4 Classification In machine learning terminology, classification is a supervised learning technique where an outcome is classified into two or more categories. In this paper, we have recognized two broad categories: positive and negative. The classifier can also be represented as a mathematical function that maps the input data received into the respective category. To find the best-fit algorithm for a particular dataset, we evaluate our proposed model with six different algorithms: Gaussian naïve Bayes, logistic regression, random forest, decision tree, support vector machine, and multinomial naïve Bayes.
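The comparison itself can be sketched as follows, continuing from the bag-of-words features `X`, `y` above and the 75%/25% split of Sect. 3.3, and producing the classification report described in Sect. 3.5. All hyper-parameters are left at the scikit-learn defaults, which is an assumption rather than the exact configuration used for the reported results.

```python
# Illustrative comparison of the six classifiers on the same train/test split.
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB, MultinomialNB
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)        # 75% train / 25% test

models = {
    "GNB": GaussianNB(),
    "LR": LogisticRegression(max_iter=1000),
    "RF": RandomForestClassifier(),
    "DT": DecisionTreeClassifier(),
    "SVM": SVC(),
    "MNB": MultinomialNB(),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    train_acc = accuracy_score(y_train, model.predict(X_train))
    test_acc = accuracy_score(y_test, model.predict(X_test))
    gap = train_acc - test_acc                    # accuracy gap between train and test
    print(f"{name}: train={train_acc:.3f} test={test_acc:.3f} gap={gap:.3f}")
    print(classification_report(y_test, model.predict(X_test)))
```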
3.5 Classification Report A classification report is used to measure and analyze the performance of the built model based on its predictions. The model classifies each sample into one of two classes, positive and negative, and the report helps us understand how correctly the model classifies the training and test data. A classification report displays precision, recall, F1-score, and support values for the built model. It represents the most important key metrics of the classification problem on which the model is judged. In this paper, we have generated the classification report for each algorithm using the equations given below, (1) to (6).
Precision: the proportion of samples predicted as positive that are actually positive. Recall: the proportion of actual positive samples that the model predicts correctly; the higher the value, the better the model. F1-score: a measure of the model's accuracy, computed as the harmonic mean of precision and recall; it is a better measure to use when we seek a balance between precision and recall and the class distribution is unbalanced. Support: the fraction of transactions that include items in both the {X} and {Y} parts of the rule, divided by the total number of transactions; it defines how frequently a collection of items occurs together as a percentage of all transactions. Accuracy: the proportion of all samples that the model predicts correctly; the higher the value, the better the model. Training accuracy: the accuracy the model achieves when applied to the training data. Testing accuracy: the model's accuracy when applied to testing data that is not seen by the model during training. Accuracy gap percentage: the difference between training accuracy and testing accuracy, expressed in percentage points.

Precision = TP/(TP + FP) (1)

Recall = TP/(TP + FN) (2)

F1-score = (2 × Precision × Recall)/(Precision + Recall) (3)

Support = σ(X + Y)/Total (4)

Accuracy = (TP + TN)/(TP + TN + FP + FN) (5)

Accuracy gap % = (Training Accuracy − Testing Accuracy) × 100 (6)
where TP (true positive) is the number of positive cases correctly identified as positive; TN (true negative) is the number of negative cases correctly identified as negative; FP (false positive) is the number of negative cases incorrectly identified as positive, also known as a type I error; and FN (false negative) is the number of positive cases incorrectly identified as negative, also known as a type II error.
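For completeness, Eqs. (1)–(3), (5), and (6) translate directly into the small helper below; the counts passed in the example call are purely illustrative and are not the confusion-matrix values of any classifier evaluated in this paper.

```python
# Direct translation of Eqs. (1)-(3), (5), (6) from confusion-matrix counts.
def report_from_counts(TP, TN, FP, FN, train_acc):
    precision = TP / (TP + FP)                            # Eq. (1)
    recall = TP / (TP + FN)                               # Eq. (2)
    f1 = 2 * precision * recall / (precision + recall)    # Eq. (3)
    accuracy = (TP + TN) / (TP + TN + FP + FN)            # Eq. (5), testing accuracy
    gap = (train_acc - accuracy) * 100                    # Eq. (6), in percentage points
    return precision, recall, f1, accuracy, gap

# Illustrative counts only (not measured values from this study)
print(report_from_counts(TP=110, TN=68, FP=30, FN=42, train_acc=0.889))
```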
Fig. 1 Classification report for Gaussian Naïve Bayes classifier
4 Experimental Results This section presents the evaluation of the proposed model using six different machine learning algorithms, with detailed classification reports. We have generated the classification report of all algorithms with precision, recall, F1-score, and support values. However, only the classification report of GNB is presented due to the page limit.
4.1 Gaussian Naïve Bayes (GNB) Gaussian naïve Bayes is a special type of naïve Bayes algorithm that assumes the features follow a normal (Gaussian) distribution; it is particularly utilized when the features of the data have continuous values. A GNB model is created by assuming that the data are described by a normal distribution without any covariance between features, i.e., the features are independently distributed. The GNB model can therefore be fitted by simply finding the mean and standard deviation of the data points to define the distribution. All the evaluation parameters are shown in Fig. 1. The accuracy of GNB for the training and testing dataset is 88.9% and 71.2%, respectively, with a 17.7% accuracy gap.
4.2 Logistic Regression (LR) Logistic regression is a supervised machine learning algorithm that, despite its name, is mainly used for classification. The algorithm fits a regression model to estimate the probability of a particular outcome, and its output, produced by the logistic (sigmoid) function, ranges between 0 and 1. In the classification report of LR, the accuracy for the training and testing dataset is 95% and 70%, respectively, with a 25% accuracy gap.
4.3 Random Forest (RF) Random forest is also a type of supervised machine learning algorithm that can perform both classification and regression of data. An RF consists of a combination of many individual decision trees whose predictions are combined to produce the outcome. Every internal node within a tree refers to a particular feature or attribute, and every leaf node denotes a class label. RF is an effective model because its prediction depends on the majority voting of all the decision trees. The accuracy of RF for the training and testing dataset is 99.3% and 68.4%, respectively, with a 30.9% accuracy gap.
4.4 Decision Tree (DT) A decision tree is a graphical representation or chart diagram that is used to decide a flow of action or to show likelihood. Its layout resembles that of a woody plant, generally drawn upright but occasionally lying on its side. Every branch of the DT describes a possible choice, result, response, or decision. Each node in the DT refers to a particular test on some feature, and by descending through the edges we reach the possible outcomes of the respective tests. The DT is recursive by nature: the same structure repeats for every sub-tree rooted at a new node. The accuracy of DT for the training and testing dataset is 99.3% and 65.2%, respectively, with a 34.1% accuracy gap.
4.5 Support Vector Machine (SVM) Support vector machine (SVM) is a type of supervised machine learning algorithm that can perform both classification and regression of data. SVM is primarily used to classify the data into different groups. The algorithm builds a model which creates hyperplanes to separate the data based on the patterns it recognizes. The most effective hyperplane is the one that maximizes the distance between itself and the nearest data points. The accuracy of SVM for the training and testing dataset is 95% and 69.6%, respectively, with a 25.4% accuracy gap.
4.6 Multinomial Naïve Bayes (MNB) A multinomial naive Bayes algorithm is a special type of naive Bayes algorithm where the word naïve depicts that all the features in the dataset are mutually independent of each other. The occurrence of one attribute of data does not depend on another
Fig. 2 Comparative evaluation of machine learning algorithms
attribute. It performs best with textual data and is relatively easy to use, fast, robust, and accurate. The accuracy of MNB for training and testing dataset is 95.8% and 69.6%, respectively, with a 26.2% accuracy gap.
4.7 Inference Figure 2 presents the comparative evaluation of all these algorithms. It is observed that the training accuracy for both RF and DT is 99%, which appears very efficient, but the testing accuracy achieved is still low, which increases the chances of the model over-fitting. Relative to the other algorithms, GNB achieves the best testing accuracy, with a training accuracy of 88.9%, a testing accuracy of 71.2%, and an accuracy gap percentage of 17.7%, which shows that this framework is very effective. Thus, Gaussian naïve Bayes proves to be the best among all the models for the restaurant review sentiment analysis tool.
5 Conclusion In the domain of sentiment analysis, it is a big challenge to understand general feedback or reviews from textual data and to develop an automatic sentiment tool. With a significant increase of interactions in social media, analysis of these interactions is also important for decision making. Therefore, with the availability of a restaurant review dataset of 1000 reviews, this paper presents a comparison of various machine learning algorithms based on their performance and classification reports with accuracy gap percentage. Evaluation of the dataset is carried out with training accuracy and testing accuracy. Moreover, it is observed that the Gaussian naïve Bayes model outperforms all other models for analyzing social
human sentiment. Therefore, the future direction of this research is to deploy the Gaussian naïve Bayes model in a Web application to analyze the sentiment of restaurant reviews. In the future, we want to use this sentiment analysis tool for general-purpose sentiment analysis irrespective of domain so that end users can access human sentiment analysis as a service either on a cloud platform or through a mobile app.
References 1. T. Doan, J. Kalita, Sentiment analysis of restaurant reviews on yelp with in-incremental learning, in 2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA) (IEEE, 2016), pp. 697–700 2. M. Govindarajan, Sentiment analysis of restaurant reviews using hybrid classification method. Int. J. Soft Comput. Artif. Intell. 2(1), 17–23 (2014) 3. R.A. Laksono, K.R. Sungkono, R. Sarno, C.S. Wahyuni, Sentiment analysis of restaurant customer reviews on trip advisor using Naïve Bayes, in 2019 12th International Conference on Information and Communication Technology and System (ICTS) (IEEE, 2019), pp. 49–54 4. A. Krishna, V. Akhilesh, A. Aich, C. Hegde, Sentiment analysis of restaurant reviews using machine learning techniques, in Emerging Research in Electronics, Computer Science and Technology (Springer, Singapore, 2019), pp. 687–696 5. A. Agarwal, R.R. Das, A. Das, Machine learning techniques for automated movie genre classification tool, International Conference on Recent Developments in Control, Automation and Power Engineering (RDCAPE) (2021) 6. Natural Language Toolkit: https://www.nltk.org/. Accessed on 24th Jan 2021 7. Super Data Science: www.superdatascience.com. Accessed on 24th Jan 2021 8. Machine Learning in Python: https://scikit-learn.org/stable/. Accessed on 24th Jan 2021
Chapter 25
A Background Study on Feature Extraction for 2D and 3D Object Models Xiaobu Yuan and Shivani Pachika
1 Introduction Features are distinguishing characteristics of input patterns that help in categorizing them and address the various computational tasks of a particular application. Edges, blobs, and points are examples of features that represent specific structures in an image. Their main purpose is to recognize, analyze, and track objects in a video stream so that the semantics of their actions and behavior may be described [1]. Keypoints, interest points, salient features, anchor points, and landmarks are other terms for feature points. Feature detector (also known as extractor) traditionally refers to an approach or technique for detecting feature points in an image. These features are then logically specified in unique ways based on various patterns shared with neighboring pixels; this process is referred to as feature description. According to Fraundorfer and Scaramuzza [2], point feature detectors can be classified into two classes: corner detectors and blob detectors. Blob detectors are more distinctive and well-localized in scale, but corner detectors are quick to calculate and well-localized in image position. In pattern recognition and image processing, transforming the input data into a reduced representation set of features (also named a feature vector) is called FE. It is anticipated that, if the extracted features are appropriately chosen, the feature set will capture the required information from the input data to accomplish the intended task using this reduced representation rather than the full-size input. Recognition entails determining what kind of object is present and can be performed at both a class level (all the mugs) and an instance level (a specific,
X. Yuan · S. Pachika (B) School of Computer Science, University of Windsor, Windsor, Canada X. Yuan e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. B. Reddy et al. (eds.), Proceedings of Second International Conference on Advances in Computer Engineering and Communication Systems, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-7389-4_25
single mug) [3]. Local feature description, feature matching, and multi-view geometry are critical building blocks in the development of instance-level object recognition systems. But, when it comes to category-level (class level) object recognition, the emphasis is on learning a robust higher-level representation of each object category, which necessitates the use of more machine learning techniques [4]. Object detection is the process of determining which objects are present in a given environment as observed by the vehicle’s sensors [3]. To put it another way, object detection can be divided into two simple tasks: object localization and object classification. The process of identifying an object in a given image or point cloud is known as object localization. The identified object class, such as a car, is provided by object classification [5].
2 Literature Survey The relevant techniques in 2D and 3D FE approaches are discussed in this chapter.
2.1 2D Handcrafted Techniques There are two types of image descriptor techniques: those that describe the entire scene without a selection phase and those that selectively extract the local region portions of the image that are interesting or distinctive in some way. Global or whole-image descriptors do not have a detection phase but process the whole image regardless of its content. However, global feature representation has its limitations. It cannot handle occlusions, changing viewpoints, or deformable objects [4]. Local feature descriptors begin with a detection phase that identifies the image’s most interesting regions as local features [6]. Examples of local feature descriptors are scale invariant feature transform (SIFT) [7] and speeded-up robust feature (SURF) [8]. The SIFT feature has been made resistant to scale and rotation variations [4]. Bay et al. suggested SURF [8] to increase the performance of SIFT by requiring less time for processing and so attaining improved timing efficiency. There is no superior technique for detecting keypoints in all implementations. Each detector looks for features that are invariant to occlusion, perspective, scale, rotation, and light to varying degrees. Corners in images contain important information for describing object features and are sometimes preferred above edges, ridges, and contours. In detection, the word “corner” does not refer to a real corner but rather to points in an image whose gradient direction rapidly changes. Corner detection is critical for many computer vision tasks including 3D reconstruction, image recognition, detection, stereo matching, object recognition, and object tracking. The Moravec corner detection algorithm [9] is an early approach that involves matching overlapping patches surrounding each neighboring pixel to test each pixel in the image. This method is computationally demanding. The Harris corner detector [10] outperforms
the Moravec method. The Harris method uses a covariance matrix of local directional derivatives to determine the directions of the fastest and smallest change in feature orientation; still, it is computationally intensive for real-time applications. The Shi, Tomasi, and Kanade corner detector [11] streamlines the calculation by optimizing the Harris technique, using only the minimum eigenvalue for discrimination. Features from accelerated segment test (FAST), introduced by Rosten and Drummond [12], employs a Bresenham circle with a radius of three pixels to determine whether a candidate point is a keypoint or not. The FAST corner detector's most intriguing feature is its computational efficiency, which makes it ideal for real-time video processing applications [13]. Surveys of interest point detectors and descriptors are presented in [13, 14]. The corner detection techniques proposed in the previous four decades were evaluated by Wang et al. [15].
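As a concrete illustration of the two detector families discussed above, the short OpenCV sketch below runs the Harris and FAST detectors on one grayscale image; the file name and the threshold values are placeholders rather than settings taken from any of the cited works.

```python
# Hedged OpenCV sketch: Harris corner response and FAST keypoints on one image.
import cv2

img = cv2.imread("building.jpg")                 # placeholder path
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Harris: gradient-based corner response computed over the whole image
harris = cv2.cornerHarris(gray.astype("float32"), blockSize=2, ksize=3, k=0.04)

# FAST: segment test on a Bresenham circle of 16 pixels, with non-max suppression
fast = cv2.FastFeatureDetector_create(threshold=25, nonmaxSuppression=True)
keypoints = fast.detect(gray, None)
print(len(keypoints), "FAST keypoints detected")
```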
2.2 2D Deep Learning Techniques Instead of employing handcrafted features, researchers are increasingly turning to neural networks such as convolutional neural networks (CNNs) and residual neural networks to learn feature representations automatically [4]. DL allows for better accuracy in applications such as image classification, object detection, and semantic segmentation. DL approaches such as CNNs have pushed the boundaries of what is achievable by improving prediction performance using massive data and abundant computational power. The focus here is on the state-of-the-art methods that could be used in autonomous vehicles. All cutting-edge methods rely on deep convolutional neural networks (DCNNs), which can be categorized as follows: (1) single-stage architecture-based algorithms are those that do detection in a single step, such as the Single Shot MultiBox Detector (SSD) and You Only Look Once (YOLO), and (2) two-stage architecture-based algorithms are those that do detection in two stages, such as faster region-based convolutional neural networks (RCNN). The first stage of a two-stage design is the generation of region proposals, i.e., extracting features from the given input image, and the second stage works on the extracted features [5]. Neural network models, also known as detection engines, are used as the backbone for feature extraction in each of these method types. Different features are obtained depending on the backbone used. Dense convolutional network (DenseNet), residual neural network (ResNet), and visual geometry group (VGG) are examples of feature extraction models commonly used as backbones. In comparison with other detection technologies at the time, the YOLO detector proved to be very fast: to reduce computing time, bounding boxes and probabilities are predicted for each image portion at the same time. Compared to two-stage detectors, the YOLO detector has a significant disadvantage in terms of small object localization accuracy. SSD was offered as a solution to the problem of localization accuracy in one-stage detectors. Despite this, two-stage detectors outperformed single-stage detectors in terms of accuracy [16]. Directly combining a CNN with a sliding window technique
makes it harder to precisely locate items. To solve these challenges, RCNN has been proposed as a way to increase object detection performance: a selective search strategy on the input image extracts region proposals, which are fed into a CNN that acts as a feature extractor [17]. RCNN has been shown to outperform other feature types and ensemble methods; however, classification within each candidate box struggles to identify small objects, such as distant human faces and cars, due to the lack of context and poor resolution [18]. RCNN training is costly in both time and space, and RCNN-based object detection was found to be slow during testing, because a CNN forward pass is run for each object proposal without shared computation [19]. Girshick et al. [20] developed the fast RCNN algorithm, an upgraded version of the RCNN method. Fast RCNN generates output in less time than simple RCNN, and the feature map is generated from a convolution operation which is performed only once per image. Fast RCNN's advantage over former leading techniques is that it replaces the multi-stage training pipeline with single-stage training. Ren et al. [21] refined fast RCNN and developed faster RCNN, where the input image is provided to a CNN to generate a convolutional feature map; it achieved state-of-the-art object detection accuracy with real-time detection speed [17]. It uses a region proposal network (RPN) instead of selective search. Two-stage detectors are reported to achieve higher accuracy than one-stage detectors, but one-stage detectors detect faster [16]. Moreover, deeper and more complicated models bring with them a slew of challenges and constraints. Over-fitting is a problem that can be addressed by using regularization techniques, extending the training dataset, or employing semi-supervised learning methods.
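To make the two-stage pipeline concrete, the hedged sketch below runs a pretrained Faster R-CNN from torchvision on a single image; the image path and the 0.5 confidence threshold are illustrative assumptions, not settings from the cited works.

```python
# Hedged sketch: inference with a pretrained two-stage detector (Faster R-CNN).
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Note: newer torchvision versions prefer the `weights=` argument over `pretrained=True`.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()

image = to_tensor(Image.open("street.jpg").convert("RGB"))   # placeholder path
with torch.no_grad():
    prediction = model([image])[0]        # dict with "boxes", "labels", "scores"

keep = prediction["scores"] > 0.5          # simple confidence threshold
print(prediction["boxes"][keep])
print(prediction["labels"][keep])
```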
2.3 3D Feature Extraction Techniques The detection of 3D keypoints is an important stage in object recognition, and several 3D keypoint detectors have been inspired by 2D feature engineering. The approach for extracting features should be independent of how the data are represented, and it should be invariant under 3D object transformations such as translation, rotation, and scaling. Examples of 3D feature extraction algorithms are HOG 3D, 3D SIFT, Harris 3D, etc. Extensive research on 3D descriptors has taken place with handcrafted and DL techniques [22]. Guo et al. [23] conducted a thorough analysis of the performance of traditional handcrafted 3D feature descriptors. Wu et al. [4] classified features into three categories based on the source of the sensor information from which they were extracted: 2D image features, 3D geometry local features, and RGB-D features. Huang et al. [24] grouped camera-based 3D object detection methods into proposal-based, transform-based, 3D shape-based, depth map-based, and 2D–3D geometry-based. Alternatively, each original 3D object can be represented by a collection of several rendered images, to which existing 2D feature extraction methods can be applied.
3 Proposed Methodology The overall system depicted in Fig. 1 is the work of six students. This system is made up of six different modules of a self-driving car that are all interconnected and are represented by different colored boxes. The module highlighted in the green box in Fig. 1 is the contribution of this survey work. We propose incorporating a new source of a priori information, the virtual 3D city model, to extract features. The system begins by building a virtual 3D city environment out of OpenStreetMap (OSM) data and the façade textures of Google Street View images, as illustrated in the red-colored module. This virtual city contains static objects, for example, buildings. Variable objects like trees can be added to the virtual 3D city model for realism. Furthermore, a repository containing 3D dynamic objects like cars is also provided. The real-time video (sequence of images) given as the system's input is shown in Fig. 1 as the blue-colored box. The virtual scenes are rendered as if they were a real-time drive,
Fig. 1 Flowchart for overall system
Fig. 2 Hybrid technique to perform feature extraction
and the extracted keypoint features are saved as annotations in a repository; this task is done in the green-colored module. Figure 2 explains the hybrid technique of combining deep learning and a handcrafted technique to fetch keypoints. The pink-colored module verifies and eliminates variable and static objects by matching the extracted virtual environment's keypoints with the corresponding real-time image's keypoints. The car's location in real time is confirmed by matching keypoint properties of the real-time and virtual environments, and this addresses the challenge of the self-driving car's geo-localization. Because variable and static objects are eliminated, the computation time for identifying and predicting dynamic objects such as animals or pedestrians on the road, as well as those that affect the navigation of the self-driving system, is lowered. The cyan-colored module deals with real-time dynamic object recognition, pose estimation, and speed calculation. To update dynamic objects in the virtual world, the detected object with pose information, speed, and location is required. The gray-colored module maintains the information related to dynamic objects in the virtual world.
4 Implementation and Results The proposed methodology was executed using 1 GPU, Blender 2.79, and Python 3.7.1. This approach uses the 3D virtual world constructed from OSM and the Google Street View façade textures. The area selected in this research work for the virtual and real-time environments is: https://www.google.ca/maps/@43.4635959,80.521974,3a,75y,149.48h,86.24t/data=!3m6!1e1!3m4!1sw2rHLgwcrsCoZyZq-kIqvA!2e0!7i16384!8i8192, and the sensor used is a camera as it is inexpensive and has a higher resolution (Figs. 3, 4, 5, 6 and 7).
Fig. 3 Creation of 3D virtual city
Fig. 4 Hybrid feature extraction applied on virtual building rendered image
Fig. 5 Static object elimination and retained dynamic objects after elimination
Fig. 6 Dynamic object recognition and identification
Fig. 7 Virtual city update
5 Conclusion and Future Works This paper studies the various feature extraction and object detection techniques and their benefits and drawbacks. Here, we are using handcrafted approaches in conjunction with DL to improve the performance and benefits of both because handcrafted algorithms are well-established, transparent, and optimized for performance and power-efficiency, whereas DL provides greater accuracy and versatility at the expense of significant computational resources. The one-stage detectors have been effectively used in real-time applications due to their high processing speed. Despite being fast, the lower accuracy remains a bottleneck for high precision requirements. Combining the benefits of one-stage and two-stage detectors remains a major challenge.
References 1. E. Salahat, M. Qasaimeh, Recent advances in features extraction and description algorithms: a comprehensive survey, in 2017 IEEE International Conference on Industrial Technology (ICIT) (IEEE, 2017), pp. 1059–1063 2. F. Fraundorfer, D. Scaramuzza, Visual odometry: part ii: matching, robustness, optimization, and applications. IEEE Robot. Autom. Mag. 19(2), 78–90 (2012) 3. S. Garg, N. Sünderhauf, F. Dayoub, D. Morrison, A. Cosgun, G. Carneiro et al., Semantics for Robotic Mapping, Perception and Interaction: A survey (2021). arXiv preprint arXiv:2101. 00443 4. K. Wu, Action for Perception: Active Object Recognition and Pose Estimation in Cluttered Environments (Doctoral dissertation) (2017) 5. D. Sharma, Evaluation and Analysis of Perception Systems for Autonomous Driving (2020), pp. 8 6. A. Abdelbaki, ConvNet Features for Lifelong Place Recognition and Pose Estimation in Visual SLAM (2018) 7. D.G. Lowe, Object recognition from local scale-invariant features, in Proceedings of the Seventh IEEE International Conference on Computer Vision, vol. 2, (IEEE, 1999), pp. 1150–1157
8. H. Bay, T. Tuytelaars, L. Van Gool, Surf: speeded up robust features, in European Conference on Computer Vision (Springer, Berlin, 2006), pp. 404–417 9. H.P. Moravec, Obstacle Avoidance and Navigation in the Real World by a Seeing Robot Rover. Doctoral dissertation, Stanford University (1980) 10. C. Harris, M. Stephens, A combined corner and edge detector, in Alvey Vision Conference, vol. 15, no. 50 (1988), pp. 10–5244 11. J. Shi, Good features to track, in 1994 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 1994), pp. 593–600 12. E. Rosten, T. Drummond, Machine learning for high-speed corner detection, in European Conference on Computer Vision (Springer, Berlin, 2006), pp. 430–443 13. S. Krig, Interest point detector and feature descriptor survey, in Computer Vision Metrics (Springer, Cham, 2016), pp. 187–246 14. A. Distante, C. Distante, Handbook of Image Processing and Computer Vision: Volume 2: From Image to Pattern (2020) 15. J. Wang, Z. Weichuan, A survey of corner detection methods, in Advanced Science and Industry Research Center. Proceedings of 2018 2nd International Conference on Electrical Engineering and Automation (ICEEA2018). Advanced Science and Industry Research Center (2018), p. 6 16. A. Levin, N. Vidimlic, Improving Situational Awareness in Aviation: Robust Vision-Based Detection of Hazardous Objects (2020) 17. Y. Xu, G. Yu, Y. Wang, X. Wu, Y. Ma, Car detection from low-altitude UAV imagery with the faster R-CNN. J. Adv. Transp. (2017) 18. L. Huang, Y. Yang, Y. Deng, Y. Yu, Densebox: Unifying Landmark Localization with End to End Object Detection (2015). arXiv preprint arXiv:1509.04874 19. P. Joshi, C. Sehra, Object Detection Algorithms: A Brief Overview (2018) 20. R. Girshick, Fast r-cnn, in Proceedings of the IEEE International Conference on Computer Vision (2015), pp. 1440–1448 21. S. Ren, K. He, R. Girshick, J. Sun, Faster r-cnn: towards real-time object detection with region proposal networks, in Advances in Neural Information Processing Systems (2015), pp. 91–99 22. J. Ma, X. Jiang, A. Fan, J. Jiang, J. Yan, Image matching from handcrafted to deep features: a survey. Int. J. Comput. Vis. 129(1), 23–79 (2021) 23. Y. Guo, M. Bennamoun, F. Sohel, M. Lu, J. Wan, N.M. Kwok, A comprehensive performance evaluation of 3D local feature descriptors. Int. J. Comput. Vis. 116(1), 66–89 (2016) 24. Y. Huang, Y. Chen, Autonomous Driving with Deep Learning: A Survey of State of-Art Technologies (2020). arXiv preprint arXiv:2006.06091
Chapter 26
A Feature Extraction and Heatmap Generation Approach Based on 3D Object Models and CNNs Shivani Pachika and Xiaobu Yuan
1 Introduction The latest developments in autonomous vehicles (AV) have been enabled by rapid advancements in artificial intelligence (AI). Robotic cars, self-driving and driverless cars, as well as other platforms capable of sensing, interacting, and navigating without human assistance are examples of AV. Visual sensors are typically employed to gather images, while computer vision (CV), signal processing, ML, and other techniques are utilized to acquire, process, and extract information [1]. In the real world, obtaining training sets for AI algorithms associated with self-driving comes with a heap of risks and costs. The reliance on 3D object models and their corresponding annotations, which are gathered and maintained by individual companies, hinders the progress of new algorithms. Massive volumes of human-annotated training data and other time-consuming procedures stymie the advancement of these DL efforts. Nowadays, GPS is commonly used in outdoor positioning systems because of its affordability and ease. However, satellite masking can cause GPS to lose signal. The goal of ongoing research is to integrate a virtual 3D city model and an onboard camera in the localization process (1) to provide absolute positioning information and pose estimation to correct the prediction in the absence of GPS measurements and (2) to train the system not only to detect objects but also to extract 3D object model features for matching, verification, elimination, and identification.
S. Pachika (B) · X. Yuan School of Computer Science, University of Windsor, Windsor, Canada X. Yuan e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. B. Reddy et al. (eds.), Proceedings of Second International Conference on Advances in Computer Engineering and Communication Systems, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-7389-4_26
2 Literature Survey Various implementations, including pattern recognition applications that involve classification, detection, matching, recognition, registration, and reconstruction, rely heavily on FE techniques. FE involves extracting distinctive information from an image and representing it in a low-dimensional space. The next step is feature selection, which retains only those extracted features that carry the most robust or relevant information, to reduce calculation complexity and boost detection speed [2]. Handcrafted and DL-based techniques are discussed thoroughly below. Handcrafted feature descriptors frequently rely on prior expert knowledge and are still commonly implemented in many visual applications. These detected features represent specific semantic structures in an image or the real world and are grouped as corner features, line/edge features, blob features, and morphological region features; the most commonly used, however, are corner features. Traditional corner detection can be performed using intensity-, contour curvature-, or gradient-based approaches. The gradient-based approaches are the most accurate among these three types, while the intensity-based methods are the most efficient. The contour curvature-based methods are the preferred option for processing binary or texture-less images, and the point-based descriptors are frequently coupled with the matching task [3]. Moravec, Forstner, Harris, Shi–Tomasi, and FAST are examples of corner detectors. Kaneva et al. [4] compared the performance of five feature descriptors on datasets based on rendered real and synthetic images of the Statue of Liberty. Cappelle et al. [5] estimated the vehicle's pose by matching Harris points between a virtual 3D city model and 2D images acquired from Stanislas square in Nancy, France. Using the Harris corner detection algorithm, Feng et al. [6] devised an application for feature detection and matching, but it suffers from problems such as low positioning accuracy and a huge amount of computation. The FAST corner detector is suitable for real-time video processing applications because of its high-speed computational efficiency [7]. As DL is highly capable of information extraction and representation, it has reached a satisfactory performance in feature description. End-to-end learning is a concept developed in DL, in which neural networks (NN) find the underlying patterns in classes of images and, after being trained on the given data, automatically work out the most descriptive and salient features for each specific class of object. In particular, CNNs can minimize the work of designing features and modeling objects, and the need to rely on additional sensors. There are two main DNN architectural types for object detection: region-based networks (two stage) and regression-based networks (one stage). One-stage networks directly predict class probabilities and bounding box offsets from full images with a single feedforward CNN; examples are You Only Look Once (YOLO) and the Single Shot MultiBox Detector (SSD). In two-stage frameworks, the first step consists of category-independent region proposals, followed by CNN feature extraction from these regions. In the second step, category-specific classifiers are used to determine the category labels of the proposals. Most two-stage networks produce thousands of region proposals
at test time, which comes with a high computational cost [8]. Girshick et al. [9] presented the basic RCNN architecture, which detects objects from region proposals generated by the selective search method. Over time, continuous improvements have been made to the RCNN architecture, i.e., RCNN, Fast RCNN, and Faster RCNN. Faster RCNN [10] employs a region proposal network instead of selective search and achieved state-of-the-art accuracy with real-time detection speed. Suwajanakorn et al. [11] demonstrated that a network can predict a set of 3D keypoints that is consistent across viewing angles of the same object and across object instances, given a single 2D image of a known class. Without keypoint location supervision, these keypoints and their detectors are identified and trained automatically. The drawback is that the orientation and spatial transformations of the keypoints are not calculated. Khan [12] estimated the orientation of pedestrians on a pedestrian crossing by making use of the optical flow vectors of the valid points to avoid pedestrian–vehicle collisions; the same concept is used here to calculate the orientation of vehicles. Prabhakar et al. [13] trained an RCNN with the PASCAL VOC 2012 dataset to detect and classify 20 different objects like pedestrians and vehicles. Considerable research on 3D feature descriptors has been carried out using handcrafted and DL techniques [3]. Guo et al. [14] illustrated a performance evaluation of handcrafted 3D feature descriptors. Wu [15] classified features based on the sensor information source from which they were extracted as 2D image features, 3D geometry local features, and RGB-D features. A collection of rendered images can, however, be used to represent a 3D object, and existing 2D FE algorithms can then be applied. The works in [16–18] developed DL-based object detection algorithms trained on 3D virtual models; 2D images automatically generated from these 3D virtual models are used to train the CNN model.
3 Proposed Methodology The objective of the new method outlined here is to integrate a virtual 3D city model constructed from OpenStreetMap (OSM) and the Google Street View façade textures as a new source of a priori knowledge for the detection of objects, FE, and generation of heatmaps for self-driving cars. The sensor used is a camera as it is cost-effective (Table 1).
3.1 Feature Extraction: Cars This module begins by selecting 185 3D car models from the ShapeNet repository as part of the experimentation. Next, the object in each model is normalized. After that, each model is rendered to create 240 images. Then, for each model, .tfrecord files are created and split into a training set, a test set, and a validation set. The defined global camera projection matrix, the created .tfrecords,
Table 1 Different types of 3D objects and CNNs used in this project
and the tuned parameters are fed to the dilated CNN. The number of keypoints to be detected is set to 10 in this algorithm. The orientation network receives the output of the dilated convolutional base network and infers the object's orientation. The keypoint network is constructed with the help of the orientation network to predict 3D keypoints, and the rendered images are then directed to KeypointNet as test images. The predicted 3D keypoints are normalized to 2D keypoints to calculate the orientation and spatial transformation of the keypoints. The outputs are then saved in the repository as annotation files (.txt) for feature matching and object identification (Fig. 1).
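As a hedged illustration of the ".tfrecord creation" step described above, the snippet below serializes the rendered views of one car model. The file paths, record fields, and the 240-view count per file are assumptions made for the sketch; the actual pipeline may store additional fields such as camera matrices.

```python
# Hypothetical sketch: write rendered views of one car model into a TFRecord file.
import tensorflow as tf

def image_example(path, model_id):
    image_bytes = tf.io.read_file(path).numpy()          # raw bytes of a rendered view
    feature = {
        "image": tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_bytes])),
        "model_id": tf.train.Feature(bytes_list=tf.train.BytesList(value=[model_id.encode()])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))

with tf.io.TFRecordWriter("car_model_0001.tfrecord") as writer:
    for i in range(240):                                   # 240 rendered views per model
        example = image_example(f"renders/car_0001/view_{i:03d}.png", "car_0001")
        writer.write(example.SerializeToString())
```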
3.2 Feature Extraction: Buildings This module's input is the textured 3D virtual city video stream. First, the scene is rendered to obtain the virtual views, which are then calibrated. Next, buildings are detected using Faster RCNN. If the required object is not present, the module proceeds to the next frame. Otherwise, once a building is identified, i.e., its bounding box has been established, the coordinates of that bounding box are retrieved and a cropping operation is carried out based on those coordinates. The cropped images are then preprocessed. The FAST method with non-maximal suppression is then used for extracting features; FAST employs a 16-pixel circle to determine whether or not a point p is a corner. Next, using the Euclidean distance formula, the 32 keypoints which are farthest from the center are selected as a selective set of keypoints. Finally, for feature matching and object verification, these keypoints are stored in the repository as annotations (.txt files) (Fig. 2).
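A minimal sketch of this keypoint selection step is given below, assuming the cropped building image has already been saved to disk; the file names are placeholders.

```python
# FAST with non-max suppression, then keep the 32 keypoints farthest from the centre.
import cv2
import numpy as np

crop = cv2.imread("building_crop.png", cv2.IMREAD_GRAYSCALE)   # cropped detection
fast = cv2.FastFeatureDetector_create(nonmaxSuppression=True)
keypoints = fast.detect(crop, None)                             # assumes >= 32 detections

h, w = crop.shape
center = np.array([w / 2.0, h / 2.0])
pts = np.array([kp.pt for kp in keypoints])
dists = np.linalg.norm(pts - center, axis=1)                    # Euclidean distance
selected = pts[np.argsort(dists)[-32:]]                         # 32 farthest keypoints

np.savetxt("building_keypoints.txt", selected, fmt="%.2f")      # annotation file
```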
Fig. 1 Flowchart for features extraction for cars
Fig. 2 Flowchart for feature extraction for buildings
3.3 Heatmap Generation: Buildings The real-time input is used to generate the heatmaps for buildings. The scene is first rendered in real time, and the views are calibrated. Then, Faster RCNN is used to detect buildings; if the required object is not present, the module proceeds to the next frame. Once a building has been identified, its bounding box coordinates are taken to perform cropping, and the cropped images are preprocessed. Background details are omitted to remove noise. Then, using the virtual data, the building's associated structural information is retrieved and scaled appropriately. This is followed by calling applyColorMap() on the grayscale image, and a heatmap is created by adjusting the parameters. After that, the heatmap is resized as per the virtual building's structural information. Each heatmap matrix value is set to either 0 or 1 based on the heatmap pixel value; because it contains only binary digits, the newly created heatmap is known as a binary heatmap. Finally, for object elimination, these binary heatmaps are stored in the repository (Fig. 3).
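The colour-mapping and binarization steps can be sketched with OpenCV as below; the target size and the 127 threshold are illustrative placeholders, not values reported by the authors.

```python
# Hedged sketch: pseudo-colour heatmap, resize to the virtual footprint, binarize.
import cv2
import numpy as np

gray = cv2.imread("building_crop_gray.png", cv2.IMREAD_GRAYSCALE)
heatmap = cv2.applyColorMap(gray, cv2.COLORMAP_JET)              # pseudo-colour heatmap

target_w, target_h = 320, 240          # placeholder: scaled from the virtual model's structure
heatmap = cv2.resize(heatmap, (target_w, target_h))

# Binary heatmap: 1 where the pixel intensity is high enough, 0 elsewhere
intensity = cv2.cvtColor(heatmap, cv2.COLOR_BGR2GRAY)
binary_heatmap = (intensity > 127).astype(np.uint8)
np.save("building_binary_heatmap.npy", binary_heatmap)           # stored for object elimination
```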
Fig. 3 Flowchart for heatmap generation on buildings
4 Implementation and Results The proposed methodology was executed using 1 GPU, Windows 10, Blender 2.79, and Python 3.7.1. The selected area in this research work for virtual and real time is: https://www.google.ca/maps/@43.4635959,80.521974,3a,75y,149.48h,86.24t/data=!3m6!1e1!3m4!1sw2rHLgwcrsCoZyZq-kIqvA!2e0!7i16384!8i8192 (Figs. 4, 5 and 6).
Fig. 4 Feature extraction for car
Fig. 5 Feature extraction for building
Fig. 6 Heatmap generation on building
4.1 Flowchart for Dependency Modules The following Fig. 7 illustrates the study of six students. The contributions of this research are shown in purple color.
Fig. 7 Flowchart of feature extraction module along with its connecting components
4.2 Advantages and Disadvantages of the Proposed Methodology The system attained a high mean average precision (mAP) score of 0.765 while training trees and buildings using Faster RCNN and an accuracy of 99.9% while training cars using the dilated CNN. Feature matching and verification of real-time objects are performed using the feature points obtained from the virtual models. The buildings and trees are removed after applying the masking obtained through heatmap generation. The object elimination algorithm's running time is 0.548 s, while the dynamic object detection algorithm took 17 s on the original image and 15.8 s on the masked image. This methodology improved the efficiency, cost, accuracy, and computation time of dynamic object detection with no false object detections. The confidence of car object recognition improved to mAP = 91, whereas Tangruamsub et al. [19] reported only mAP = 86; the pose estimation improved as well. One of the drawbacks of OSM data is its limited availability and quality. On computer-generated images, corners that are aligned perfectly to the x- and y-axes are not detected by the FAST algorithm [20]. The Faster RCNN architecture usually fails to detect very small objects in images and may miss some positive samples when applied to videos. For cars with similar front and rear views, the orientation network fails to predict orientation correctly [11]. An additional ML approach is required to train the detection of real-time variable and static objects that are missing from the virtual environment.
5 Conclusion and Future Works The main goal of this proposed methodology is to develop novel CV tasks for self-driving vehicles by directly utilizing open-source graphic model datasets (a new source of a priori information) as virtual representations of real-world objects (the virtual 3D city). Our framework detects objects with high accuracy, extracts object features from the real-time and virtual worlds, and also constructs heatmaps on the real-time variable and static objects. The KeypointNet can handle an arbitrary number of keypoints, and tuning hyper-parameters on high-performance GPU systems can improve the inference time and mAP scores. Two-stage detectors are reported to achieve higher accuracy than one-stage detectors, but one-stage detectors detect faster. Overall, we pursued the research path of training cars to see using the virtual world. Finally, these findings may serve as a springboard for future studies on unsupervised learning from both virtual and real data.
References 1. T.A.Q. Tawiah, A review of algorithms and techniques for image-based recognition and inference in mobile robotic systems. Int. J. Adv. Rob. Syst. 17(6), 1729881420972278 (2020) 2. R.C. Joshi, M. Joshi, A.G. Singh, S. Mathur, Object detection, classification and tracking methods for video surveillance: a review, in 2018 4th International Conference on Computing Communication and Automation (ICCCA) (IEEE, 2018), pp. 1–7 3. J. Ma, X. Jiang, A. Fan, J. Jiang, J. Yan, Image matching from handcrafted to deep features: a survey. Int. J. Comput. Vis. 129(1), 23–79 (2021) 4. B. Kaneva, A. Torralba, W.T. Freeman, Evaluation of image features using a photorealistic virtual world, in 2011 International Conference on Computer Vision (IEEE, 2011, November), pp. 2282–2289 5. C. Cappelle, M.E. El Najjar, F. Charpillet, D. Pomorski, Virtual 3D city model for navigation in urban areas. J. Intell. Rob. Syst. 66(3), 377–399 (2012) 6. J. Feng, C. Ai, Z. An, Z. Zhou, Y. Shi, A feature detection and matching algorithm based on Harris algorithm, in 2019 International Conference on Communications, Information System and Computer Engineering (CISCE) (IEEE, 2019), pp. 616–621 7. S. Krig, Interest point detector and feature descriptor survey, in Computer Vision Metrics (Springer, Cham, 2016), pp. 187–246 8. R. Kalliomäki, Real-Time Object Detection for Autonomous Vehicles Using Deep Learning (2019) 9. R. Girshick, J. Donahue, T. Darrell, J. Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2014), pp. 580–587 10. S. Ren, K. He, R. Girshick, J. Sun, Faster r-cnn: towards real time object detection with region proposal networks, in Advances in Neural Information Processing Systems (2015), pp. 91–99 11. S. Suwajanakorn, N. Snavely, J.J. Tompson, M. Norouzi, Discovery of latent 3d keypoints via end-to-end geometric reasoning, in Advances in Neural Information Processing Systems (2018), pp. 2059–2070 12. S.D. Khan, Estimating speeds and directions of pedestrians in real-time videos: a solution to road-safety problem, in CEUR Workshop Proceedings (2014), p. 1122 13. G. Prabhakar, B. Kailath, S. Natarajan, R. Kumar, Obstacle detection and classification using deep learning for tracking in high-speed autonomous driving, in 2017 IEEE Region 10 Symposium (TENSYMP), (IEEE, 2017), pp. 1–6 14. Y. Guo, M. Bennamoun, F. Sohel, M. Lu, J. Wan, N.M. Kwok, A comprehensive performance evaluation of 3D local feature descriptors. Int. J. Comput. Vis. 116(1), 66–89 (2016) 15. K. Wu, Action for Perception: Active Object Recognition and Pose Estimation in Cluttered Environments. Doctoral dissertation (2017) 16. M. Aubry, B.C. Russell, Understanding deep features with computer-generated imagery, in Proceedings of the IEEE International Conference on Computer Vision (2015), pp. 2875–2883 17. H. Su, C.R. Qi, Y. Li, L.J. Guibas, Render for cnn: viewpoint estimation in images using cnns trained with rendered 3d model views, in Proceedings of the IEEE International Conference on Computer Vision (2015), pp. 2686–2694 18. K. Židek, P. Lazorík, J. Piteˇl, A. Hošovský, An automated training of deep learning networks by 3D virtual models for object recognition. Symmetry 11(4), 496 (2019) 19. S. Tangruamsub, K. Takada, O. Hasegawa, 3d object recognition using a voting algorithm in a real-world environment, in 2011 IEEE Workshop on Applications of Computer Vision (WACV) (IEEE, 2011), pp. 153–158 20. 
https://www.mathworks.com/help/visionhdl/ug/fast-corner-detection.html
Chapter 27
Evaluation of Tools Used for 3D Reconstruction of 2D Medical Images Srinikhil Durisetti, Darsani Alapati, Sai Keerthi Vadnala, Keerthana Kotha, G. Ramesh Chandra, and Sathya Govindarajan
1 Introduction Image reconstruction is the process of using raw data to create an image. It is done with a combination of complex computer algorithms, mathematical equations, and physics. Image reconstruction is the construction of a 2D or 3D image from dispersed or partial data, such as radiation measurements found in medical imaging studies. It is important to use a mathematical strategy to create a readable, usable image or to refine an image from a scanner such as a CT system. For instance, image reconstruction can help to produce a 3D body image from an ordered series of camera images. Image reconstruction in CT is a mathematical process that creates tomographic images using X-ray projection data gathered from many viewpoints around the patient. The implementation of these tools examines present methods for constructing a 3D object from 2D medical image slices in the Digital Imaging and Communications in Medicine (DICOM) format. These slices are derived from medical measurement devices such as X-ray computed tomography (X-ray CT), magnetic resonance imaging (MRI), and others. Reconstruction of medical images can be done using different types of modalities that are useful for constructing models. Medical imaging is divided into several types, with the following being the most common. The computed axial tomography (CAT) scan, also known as X-ray computed tomography (CT), is a spiral tomography technology that generates a 2D image of the structures in a portion of the body. In a CT scan, a beam of X-rays spins around the object, penetrating it from different angles, and is picked up by sensitive radiation detectors. The information received is then analysed by a computer, S. Durisetti · D. Alapati · S. K. Vadnala · K. Kotha · G. Ramesh Chandra (B) Department of Computer Science Engineering, VNRVJIET, Secunderabad, Telangana, India e-mail: [email protected] S. Govindarajan Developer at FT Publications, Inc, New York, NY, USA © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. B. Reddy et al. (eds.), Proceedings of Second International Conference on Advances in Computer Engineering and Communication Systems, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-7389-4_27
which creates a detailed image of the structure and its contents using mathematical principles. These sets of projections are reconstructed into a single image using the Radon transform and its inverse. PET, also known as positron emission tomography, is used along with computed tomography (PET-CT) and magnetic resonance imaging (PET-MRI). MRI is commonly used to create tomographic images of body cross-sections [1, 2]. In order to implement the reconstruction of 2D medical images into 3D images, a ray tracing algorithm can be utilized. Ray tracing is a 3D computer graphics technique used to visualize a 3D object in a 2D view. This technique generates images by tracing the path of light through the pixels of the image plane [3]. In this technique, the 3D object situated in a world view is visualized on a 2D screen known as a window view, based on the position of the camera. Initially, the window view size is defined. Then, from each cell of the window view array, a light ray is sent onto the object in the world view and reflected back; this is done based on the camera position. The intensity values of the 3D object in the world view with respect to the camera position are captured in the window view array from the reflected light ray, as stated by Vasiou et al. [3, 4]. This 2D window view is then displayed on the screen, so that a 3D array can be visualized. Ray tracing implementation is done on the GPU with a simplified abstract stream programming model usually written in C++, as discussed by Yushkevich et al. [5]. This paper focuses on evaluating various 3D reconstruction tools. For a better understanding and evaluation of the reconstruction tools, several parameters are taken into consideration. Parameters such as data import, data export, metadata, 2D viewing, and 3D viewing play a vital role in understanding how each tool handles the reconstruction of 2D medical images. Initially, we must import all of the 2D images, which come from a range of devices and file formats. The volume is then rendered, and 2D slices are set using a 3D coordinate system in a visualization methodology. There are a variety of methods that can be used; these methods differ depending on the activities being attempted, such as importing, visualizing, processing, and analysing data. Tools such as MITK, 3D Slicer, InVesalius, RadiAnt, Real3d VolViCon, Volume Viewer, and ITK-SNAP, each with its own algorithms, are evaluated based on the parameters considered, which gives an insight into each tool.
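To make the reconstruction idea tangible, the following scikit-image sketch projects a synthetic phantom slice into a sinogram with the Radon transform and reconstructs it by filtered back-projection. The phantom and the 180-angle geometry stand in for real X-ray projection data and are not taken from any scanner or tool discussed in this paper.

```python
# Illustrative tomographic reconstruction from projections with scikit-image.
import numpy as np
from skimage.data import shepp_logan_phantom
from skimage.transform import radon, iradon

image = shepp_logan_phantom()                        # synthetic 2D slice
theta = np.linspace(0.0, 180.0, 180, endpoint=False)
sinogram = radon(image, theta=theta)                 # forward projection (Radon transform)
# Filtered back-projection (inverse Radon transform); keyword may be `filter` in old versions
reconstruction = iradon(sinogram, theta=theta, filter_name="ramp")
print("mean reconstruction error:", np.abs(reconstruction - image).mean())
```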
2 Motivation 3D medical image processing is the order of the day, with many hospitals using CT/MRI/PET/ultrasonic imaging, etc. One of the major tasks in 3D image processing is 3D image reconstruction. Many 3D reconstruction tools are available off the shelf. The major issue with the existing 3D reconstruction tools is that they do not allow users to implement their own 3D image processing algorithms. In this regard, this paper studies a few well-known 3D reconstruction tools, compares them, and discusses their advantages and disadvantages.
3 Tools Available for Reconstruction of 3D Medical Image from a Set of 2D Images This paper discusses a few 3D medical image reconstruction tools such as MITK, InVesalius, 3D Slicer, Volume Viewer, ITK-SNAP, Real3d VolViCon, and RadiAnt. These tools utilize their own algorithms for reconstructing a 3D model from a set of 2D images. MITK The Medical Imaging Interaction Toolkit (MITK) is a free and adaptable open-source software project for developing medical image processing applications. It can be used as a C++ toolkit or application framework to construct applications. MITK seeks to assist in the creation of cutting-edge, highly interactive medical imaging software. Research institutes will benefit from the high level of integration of ITK and VTK, enhanced with data management, advanced visualization, and interaction functionality in a single framework that is backed by a diverse group of academics and developers. Medical professionals will benefit from MITK and MITK applications if they use the basic functions for research initiatives. InVesalius InVesalius is an open-source software that may be used to reconstruct computed tomography and magnetic resonance images. This programme is mostly used for quick prototyping, training, forensics, and medical applications. The major characteristics of InVesalius are the ability to import DICOM images, analyse files, and export files, and it has continued to be developed since its first release. InVesalius also supports 3D printing and some other tools. It mainly focuses on advanced 3D visualization: among the many tools that have 3D projection capability, InVesalius is one that uses the ray casting technique, and it also offers projection modes like maximum intensity projection (MIP), minimum intensity projection (MinIP), and contour MIDA. InVesalius also includes segmentation tools that may be manual or semi-automatic, based on thresholding, region growing, and watershed techniques, as mentioned by Amorim et al. [6]. 3D Slicer 3D Slicer is a software tool, developed by the Slicer community, to display and analyse medical data sets based on VTK. Virtual reality and desktop visualization are both offered, and capabilities such as segmentation and registration are included. 3D Slicer is a research software platform that allows researchers to create and test new methods before distributing them to medical professionals. Many capabilities are available and can be extended in languages like C++ and Python. 3D Slicer has a built-in Python console, and it can function as a Jupyter Notebook with 3D capabilities. 3D Slicer runs on recent versions of different operating systems such as Linux, Windows, and Mac. Slicer is also a platform for product development: it enables businesses to create a prototype and make the product available to all users. Virtual
machines and Docker containers are also supported by 3D Slicer. This 3D Slicer program is released under a BSD-style open-source licence that conforms to the open-source initiative and does not impose any limits on software usage, as discussed by Pieper et al. [7]. Volume Viewer Volume Viewer is a MATLAB application which helps in visualizing 3D volume data and 3D labelled volume data. 3D volume data can be viewed as a volume or as plane slices. The opacity/intensity of 3D data can be manipulated using a graph-like volume rendering component. The volume can be projected as a maximum intensity projection or an isosurface. Along with these functionalities, Volume Viewer also allows spatial referencing by enabling users to manually set dimensional values or to automatically upscale the axes to the largest dimension. The application also provides various default alphamaps like linear, MRI, CT-bone, MRI-MIP and CT-MIP for the opacity of the volume being visualized, which makes it easier to adjust the volume intensity for a particular type of data. It also provides default colormaps for effective visualization of the 3D volume with respect to the type of data and the objective. ITK-SNAP ITK-SNAP is a piece of software that allows you to segment structures in 3D medical images. ITK-SNAP is a cross-platform utility that is free and open source. ITK-SNAP includes manual delineation and picture navigation, as well as semi-automatic segmentation utilizing active contour approaches. It contains a variety of useful add-ons in addition to these basic functionalities. It is used to compute the volumes of the segmented structures, as well as the statistics of the image intensity for each structure, and also provides multi-session support, which is very useful when working with multiple scans from the same MRI imaging session. The following are some of the most significant benefits of ITK-SNAP: it makes use of a cursor for smooth 3D navigation; manual segmentation is performed in three orthogonal planes at the same time; based on Qt, it provides a contemporary graphical user interface; and it supports a variety of 3D image formats including NIfTI and DICOM. The user interface received the majority of the development work, which emphasizes interaction and ease of use. Real3d VolViCon Real3d VolViCon reconstructs 3D volume (voxel) and mesh (surface) models from a single-volume file or a series of 2D (i.e. DICOM) data. It is a sophisticated program that can reconstruct computed tomography (CT), magnetic resonance (MR), ultrasound, and X-ray images. This tool enables the export of 3D surfaces or volumes as triangular mesh files, which may subsequently be used to print 3D physical models. It also provides high-quality visualization, a unique solution to extract 3D surfaces from the volume data. It also provides accurate volume and mesh visualization, and linear and angular measurements.
It supports the import of volume data (*.vti, *.mhd, *.vol, *.raw), DICOM (with various JPEG and RLE encodings) and 3D mesh models. It supports the rendering of multiple meshes, volumes and DICOMs as well. VTI, MHD and DICOM slices can be exported as BMP, PNG, JPEG or TIF files. It requires no prior knowledge. RadiAnt RadiAnt is a DICOM viewer tool used for medical images, and it provides several functionalities. It has a very good capacity for reading huge image series: it can load several images at once, which can be toggled between, and since it is compiled in 64-bit mode, it is able to load huge sets of data. The software can show studies that have been obtained from different imaging modalities. RadiAnt includes basic image manipulation and measurement tools. One or more studies can be viewed in the same window or in different windows, which can be used for comparison purposes. It can make use of multiprocessor and multicore systems, as well as large amounts of RAM. The RadiAnt MPR tool is also used for orthogonal plane reconstruction, as discussed by Badshah et al. [8].
4 Proposed Work The tools discussed in this paper for reconstruction and visualization of 3D medical images do not allow the implementation of any new 3D image processing algorithms as per the user's requirements, other than the functionalities defined in the respective software. Hence, there is a need to implement a new tool for 3D medical image reconstruction, visualization and processing. The following are the methodologies for reconstruction, visualization and processing of 3D medical images. Step 1: 3D Image Reconstruction A 3D image is reconstructed from an ordered stack of 2D images obtained using various modalities such as CT, MRI, PET and ground penetrating radar (GPR). The 2D images are stored in the form of 2D matrices, and the results of these 2D images are stacked into a 3D cube. The 3D cube is actually represented in the form of a 3D matrix as shown in Fig. 1. The reconstructed 3D medical image is represented in a 3D matrix, where each cell in it is called a voxel. The voxels hold the intensity value of the 3D object at that particular cell location. The 3D object has to be visualized on the 2D monitor using various computer graphics visualization techniques, which is discussed in step 2. Step 2: 3D Image Visualization A 3D medical image is viewed as ℤ × ℤ × ℤ in Euclidean space, where ℤ is the set of integers. To visualize the said 3D images, there are no 3D display technologies in abundant use as of now. Therefore, there is a need to display the 3D image on a 2D monitor, which results in various 3D image visualization techniques. Visualization of 3D images onto 2D is called volume visualization. The volume visualization
Fig. 1 Stacking of 2D images for reconstructing a 3D medical image
techniques are categorized into volume rendering (VR), surface rendering (SR) and multiplanar reformation (MPR), which were detailed by Qi Zhang et al. [9]. MPR is also called slice-based rendering. This paper uses volume rendering for visualization of 3D images. There are many existing volume rendering techniques such as ray casting, splatting, shear warping, texture mapping and maximum intensity projection, which were detailed by Kaufman and Mueller [10]. This paper focuses on visualizing 3D objects onto the 2D surface using the ray casting technique proposed by Kruger and Westermann [11]. This technique uses the texturing concept, which is generally used in game environment creation and rendering using the GPU. The technique accelerates GPU rendering using empty space skipping and early ray termination. The technique proposed by Kruger and Westermann is implemented using XNA Game Studio, which is a wrapper for the DirectX graphics framework to create games or game engines. To implement the visualization technique, the Microsoft .NET environment is used. The programming languages used for rendering the 3D medical images are C# and High-Level Shading Language (HLSL). Figure 2 shows a rendering of 3D medical images onto a 2D monitor using the ray casting technique. Step 3: 3D Image Processing In 2D image processing, the images are processed using a 2D mask. In a similar way, a 3D image is processed using a 3D mask such as 3 × 3 × 3 or 5 × 5 × 5. So, 2D image processing algorithms, with slight modification, may be applied for 3D image processing as well. In general, a 3D digital image is an array of voxels (volume pixels) placed in a 3D grid in Euclidean space. Processing of a 3D image happens by convolving the original 3D image with a 3D filter/mask. By choosing different filters, different operations can be performed on 3D images. Figure 3a shows an original 3D medical image, Fig. 3b shows a 7-neighbourhood 3D filter, and Fig. 3c shows the edge-detected version of the 3D medical image obtained by applying the 3D filter in Fig. 3b. By using the methodology discussed in steps 1 to 3, an exclusive software named "3D Logical Image Processing System (3DLIPS)" was built using XNA Game Studio, DirectX, HLSL and C#.NET. This software is built to implement
Fig. 2 Sagittal (a) and frontal (b) view of the 3D medical images
Fig. 3 Process of edge detection of 3D medical image using 3D filter
various 3D image processing algorithms such as 3D point detection, 3D edge detection, 3D surface detection, 3D directional textures, 3D thinning and 3D skeletonization. The said 3D image processing algorithms cannot be implemented in the tools such as 3D Slicer, MITK, InVesalius, RadiAnt, Real3d VolViCon, ITK-SNAP and Volume Viewer. Figure 4 shows the screenshot of “3D Logical Image Processing System (3DLIPS)”.
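As a rough illustration of steps 1 and 3 (the 3DLIPS software itself is written in C# and HLSL, so this is only a sketch, not the paper's implementation), the following Python snippet stacks ordered 2D slices into a 3D matrix and convolves it with a 3 × 3 × 3 mask. The file path and the particular 7-neighbourhood filter values are assumptions for demonstration, not the exact mask of Fig. 3b.

```python
import glob
import numpy as np
from scipy import ndimage
from PIL import Image

# Step 1: stack ordered 2D slices (e.g. exported CT/MRI slices) into a 3D matrix.
# "slices/*.png" is a placeholder path; any ordered stack of equally sized images works.
files = sorted(glob.glob("slices/*.png"))
volume = np.stack(
    [np.asarray(Image.open(f).convert("L"), dtype=np.float32) for f in files],
    axis=0,
)  # shape: (depth, height, width), each cell is a voxel intensity

# Step 3: process the volume with a 3 x 3 x 3 mask.  Here an assumed
# 7-neighbourhood Laplacian-like filter is used for edge detection.
mask = np.zeros((3, 3, 3), dtype=np.float32)
mask[1, 1, 1] = 6.0
for dz, dy, dx in [(0, 1, 1), (2, 1, 1), (1, 0, 1), (1, 2, 1), (1, 1, 0), (1, 1, 2)]:
    mask[dz, dy, dx] = -1.0

edges = ndimage.convolve(volume, mask, mode="constant", cval=0.0)
print(volume.shape, float(edges.min()), float(edges.max()))
```

Choosing a different 3D mask in the same convolution gives other operations (point detection, smoothing, etc.), which mirrors the filter-based processing described above.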
5 Results The various 3D reconstruction tools such as 3D Slicer, MITK, InVesalius, RadiAnt, Real3d VolViCon, ITK-SNAP and Volume Viewer, which are discussed in Sect. 3, are compared. The various evaluation parameters for comparing the tools are: data import, support, data export, metadata, 2D viewer and 3D viewer. Figure 5 shows
Fig. 4 Screenshot of “3D logical image processing system (3DLIPS)”
the results of the evaluation of the various tools listed above. The data import is evaluated based on the facility of importing images, importing a set of images, loading a series of images and directory support. Similarly, data export is evaluated based on support for exporting a series of images or a single image. These tools are also evaluated based on community support and documentation support. The metadata capability of the tools is also evaluated. The 2D viewing capability of the tools is evaluated from the perspective of measurement, windowing, information and annotations. In a similar way, the 3D viewing capabilities are evaluated from the perspective of slice scrolling, maximum intensity projection and orthogonal slices. The proposed 3DLIPS system is developed in the framework of cellular logic array processing (CLAP), which was proposed by Rajan [12–18] for processing 2D images; CLAP was extended to 3D images [19] by a Ph.D. student of Rajan. The various functionalities proposed using CLAP for processing 3D images are: 3D point detection, 3D edge detection, 3D surface detection, 3D directional textures, 3D thinning and 3D skeletonization, which were detailed in [19]. For comparing the performance of CLAP against other processing methodologies such as convolution and mathematical morphology, only the 3D edge detection algorithm is chosen. The various 3D edge detection algorithms are compared in the framework of mathematical morphology (MM) and CLAP. The algorithms considered for comparison purposes in the MM framework are: erosion-based edge detection (EBED), dilation-based edge
Fig. 5 Results of evaluation of 3D reconstructing tools
detection (DBED), opening-based edge detection (OBED) and morphological filter-based edge detection (MFED). These algorithms are compared against cellular logic array processing-based edge detection (CLAP-BED) [19, 20]. The data set considered for comparing the said algorithms is Cerebrix, which is detailed in [19, 20]. Table 1 shows the comparison of MM-based and CLAP-based 3D edge detection algorithms. The same comparison is shown in graph form in Fig. 6.
Table 1 Comparison of 3D edge detection algorithms for Cerebrix data

Algorithm | Number of voxels before processing | Number of voxels after processing | Processing time (in s)
EBED | 1,205,998 | 327,815 |
But there are many possible ways of drawing valid boundaries separating the two classes, and the best one among them needs to be selected for a given classification problem. Unoptimized decision boundaries could result in greater misclassification on new data. A classifier that is equally far from the points of both classes is preferable. In SVM, the classifier is chosen to maximize the separation between the points and the decision surface. SVM ignores only a few outliers in the data set while drawing the hyperplane. The point closest to the hyperplane can be found, and its distance from the hyperplane can be maximized; this distance is called the margin. The aim is to choose the separating hyperplane that maximizes the margin, i.e. the separating hyperplane must be at a maximum distance from the feature vectors of both classes. Let us consider the hyperplane $a^{t}X_{i}+b=1$, passing through the points of class 1 that are nearest to the separating hyperplane $a^{t}X_{i}+b=0$. These points are called support vectors. Support vectors are the data points that the separating hyperplane pushes up against, or the points that are nearest to the opposite class. The algorithm implies that only these support vectors are important, whereas the remaining training examples can be ignored. There are no feature vectors between the separating hyperplane and the two edge hyperplanes. Similarly, consider $a^{t}X_{i}+b=-1$ passing through the support vectors of class 2. All the points classified correctly are given by:
$$a^{t}X_{i}+b \ge 1 \ \text{for}\ y_{i}=+1 \quad \text{and} \quad a^{t}X_{i}+b \le -1 \ \text{for}\ y_{i}=-1. \tag{4}$$
The unified classification rule thus becomes $y_{i}(a^{t}X_{i}+b) \ge 1$. For support vectors, $y_{i}(a^{t}X_{i}+b)=1$. Therefore, the distance between the two hyperplanes $a^{t}X_{i}+b=1$ and $a^{t}X_{i}+b=-1$, the margin, is $\frac{2}{\|a\|}$. The aim is to maximize the margin. This constrained optimization problem can be framed as:

$$\min \frac{\|a\|^{2}}{2} \quad \text{subject to } y_{i}(a^{t}X_{i}+b) \ge 1, \ \text{where } i=1,2,3,\ldots,n \tag{5}$$
This problem can be converted to unconstrained optimization using the Lagrange method.

$$L = \frac{\|a\|^{2}}{2} + \sum_{\forall i}\alpha_{i}\left(1 - y_{i}(a^{t}X_{i}+b)\right) \tag{6}$$
where $\alpha_{i}$, $i=1,2,3,\ldots,n$ are Lagrange multipliers. Solving gives:

$$a = \sum_{\forall i}\alpha_{i}y_{i}X_{i} \quad \text{and} \quad \sum_{\forall i}\alpha_{i}y_{i} = 0$$
The Lagrangian is thus written as:

$$L = \sum_{i}\alpha_{i} - \frac{1}{2}\sum_{i}\sum_{j}\alpha_{i}\alpha_{j}y_{i}y_{j}\langle X_{i}, X_{j}\rangle \quad \text{and} \quad \sum_{\forall i}\alpha_{i}y_{i} = 0 \tag{7}$$
The values of a and b need to be learned from the training data to arrive at an optimal separating hyperplane. As we train with new data, the hyperplane shifts itself optimally.
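As a minimal illustration of this learning step (not part of the paper's own implementation), a linear SVM can be fitted with scikit-learn; its learned coefficients correspond to a and b above. The toy data below is assumed purely for demonstration.

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data (assumed for illustration only).
X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.5], [6.0, 1.0], [7.0, 2.0], [8.0, 0.5]])
y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel="linear", C=1e6)   # a large C approximates the hard-margin problem
clf.fit(X, y)

a, b = clf.coef_[0], clf.intercept_[0]   # the learned a and b of a^t x + b
margin = 2.0 / np.linalg.norm(a)         # margin = 2 / ||a||
print("a =", a, "b =", b, "margin =", margin)
print("support vectors:", clf.support_vectors_)
```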
There are several cases where data points of various classes might be linearly inseparable and are randomly distributed. The data points in n-dimensional space are mapped to a higher dimension, where they can be linearly separable. Here, the kernel trick is used similar to the above-discussed optimization problem. The general format of a kernel function is

$$K(X_{i}, X_{j}) = \langle\Phi(X_{i}), \Phi(X_{j})\rangle. \tag{8}$$
The Lagrangian is

$$L = \sum_{i}\alpha_{i} - \frac{1}{2}\sum_{i}\sum_{j}\alpha_{i}\alpha_{j}y_{i}y_{j}\langle\Phi(X_{i}), \Phi(X_{j})\rangle.$$

Replace the inner product with the kernel function to get:

$$L = \sum_{i}\alpha_{i} - \frac{1}{2}\sum_{i}\sum_{j}\alpha_{i}\alpha_{j}y_{i}y_{j}K(X_{i}, X_{j}) \quad \text{and} \quad \sum_{\forall i}\alpha_{i}y_{i} = 0 \tag{9}$$
Our final aim is to formulate a kernel function and compare its performance with standard kernel functions like linear, polynomial and RBF. Every kernel function is expected to be continuous, symmetric and have a semidefinite Gram matrix. Consider the Taylor series of the function $\operatorname{sech} x$:

$$\operatorname{sech} x = 1 - \frac{x^{2}}{2} + \frac{5x^{4}}{24} - \frac{61x^{6}}{720} + \cdots = \sum_{n=0}^{\infty}\frac{E_{2n}x^{2n}}{(2n)!} \tag{10}$$

where $E_{n}$, $n = 0, 1, 2, \ldots, \infty$ are Euler numbers (defined by special Euler polynomials), and $E_{1}=1$, $E_{2}=5$, $E_{3}=61$, $E_{4}=1385, \ldots$ The expanded form of $\operatorname{sech}(x_{i}^{t}x_{j})$ is

$$\operatorname{sech}(x_{i}^{t}x_{j}) = \sum_{n=0}^{\infty}\frac{E_{2n}(x_{i}^{t}x_{j})^{2n}}{(2n)!}$$
Fig. 2 Decision boundary with (RBF)
$$\sum_{k=0}^{\infty}\frac{(x_{i}^{t}x_{j})^{k}}{k!} = \sum_{k=0}^{\infty}\ \sum_{\sum_{l} n_{l}=k}\left(\frac{x_{i1}^{n_{1}}\cdots x_{ip}^{n_{p}}}{\sqrt{n_{1}!\cdots n_{p}!}}\right)\left(\frac{x_{j1}^{n_{1}}\cdots x_{jp}^{n_{p}}}{\sqrt{n_{1}!\cdots n_{p}!}}\right)$$
This makes it evident that the considered function can thus be split into components, similarly indicating its application as a kernel function. The above function satisfies all the criteria, and thus we frame our final kernel function as $K(X_{i}, X_{j}) = \operatorname{sech}(\gamma X_{i}^{t}X_{j} + \alpha)$, where $\gamma$ and $\alpha$ are free parameters and can be varied by the user.
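A kernel of this form can be supplied to scikit-learn's SVC as a callable; the sketch below is one possible way to plug in the proposed SHPK and is not the authors' implementation. The default gamma and alpha values follow the hyperparameter settings reported later (e.g. Alpha = 5, Gamma = −1), and the training variables are assumed to be prepared elsewhere.

```python
import numpy as np
from sklearn.svm import SVC

def shpk(gamma=-1.0, alpha=5.0):
    """Secant hyperbolic kernel K(Xi, Xj) = sech(gamma * Xi^t Xj + alpha)."""
    def kernel(A, B):
        # A: (n_samples_A, n_features), B: (n_samples_B, n_features)
        return 1.0 / np.cosh(gamma * (A @ B.T) + alpha)   # sech z = 1 / cosh z
    return kernel

# Usage sketch: X_train, y_train and X_test are assumed to be prepared elsewhere.
# clf = SVC(kernel=shpk(gamma=-1.0, alpha=5.0))
# clf.fit(X_train, y_train)
# y_pred = clf.predict(X_test)
```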
4 Experimental Results 4.1 Random Data Data points have been generated randomly and were classified using RBF and SHPK to determine the separating hyperplane. The generated points are not linearly separable, and thus we introduce a kernel function. The scatter diagrams of RBF and SHPK can be visualized with Figs. 2 and 3, respectively.
4.2 Data Sets Real-time and synthetic data sets were collected from the UCI Repository [12–14]. Experiments were conducted on four medical data sets, namely diabetes, heart, hepatitis and ILPD (Fig. 4; Tables 1, 2, and 3).
Fig. 3 Decision boundary with (SHPK)
Fig. 4 Misclassification comparison bar graph
Table 1 Data sets

Data set | Number of instances | Number of attributes | Description
Diabetes | 768 | 9 | It is a classification problem: patient is having diabetes or not
Heart | 303 | 13 | Based on some test reports patient is having risk or safe
ILPD | 583 | 19 | Liver function is examined by various results and identifying patient is in risk or safe
Hepatitis | 142 | 19 | Based on physical and laboratory reports predicting whether patient is having hepatitis or not
4.3 Observations Comparison with respect to misclassification: (i) For the ILPD data set, the proposed kernel produces results similar to the linear, polynomial and RBF kernels.
Table 2 Misclassification comparison table

Data set | Linear | Polynomial | RBF | Proposed (SHPK) | Hyperparameters
ILPD | 0.2918 | 0.2918 | 0.2918 | 0.2918 | Alpha = 5, Gamma = −1
Heart | 0.2757 | 0.2140 | 0.2428 | 0.2510 | Alpha = 5, Gamma = −1
Hepatitis | 0.3036 | 0.2589 | 0.2589 | 0.2768 | Alpha = 5, Gamma = −1
Diabetes | 0.2374 | 0.3073 | 0.2472 | 0.2932 | Alpha = 6, Gamma = −1
Table 3 Execution time of the model

Data set | Linear | Polynomial | RBF | MyKernel (SHPK)
ILPD | 17.900552 | 340.381429 | 1.769296 | 2.939561
Heart | 3.019197 | 162.260163 | 1.561810 | 4.472207
Hepatitis | 4.640411 | 4.640411 | 1.649977 | 2.977480
Diabetes | 66.646952 | 461.1348 | 3.256405 | 18.876729
(ii) For the heart data set, SHPK is better than the linear kernel; the polynomial kernel produces the best results among all three. (iii) For the hepatitis data set, SHPK is better than the linear kernel, whereas the RBF and polynomial kernels are well suited. (iv) For the diabetes data set, SHPK is better than the polynomial kernel, while the best kernel is linear.
Comparison of execution time (in msec): (i) The proposed kernel is faster than linear and polynomial for the ILPD data set. (ii) For the heart data set, SHPK is faster than polynomial. (iii) SHPK is faster than linear and polynomial for the hepatitis data set. (iv) The proposed kernel is faster than linear and polynomial kernels for the diabetes data set.
5 Conclusion We have designed a kernel function to improve the classification of nonlinearly separable data sets in the current paper. The proposed novel kernel function has improved the accuracy of the classification and reduced the execution time. The SVM algorithm with SHPK has been implemented on various data sets collected from the UCI repository, and the results proved the efficiency of the proposed secant hyperplane kernel.
References 1. T. Evgeniou, M. Pontil, Support vector machines: theory and applications, in Conference: Machine Learning and Its Applications, Advanced Lectures, January 2001 2. E. Goel, E. Abhilasha, Random forest: a review. Int. J. Adv. Res. Comput. Sci. Softw. Eng. Res. 7(1) (2017), ISSN: 22778 3. V. Sucharita, S. Jyothi, P.V. Rao, Comparison of machine learning algorithms for classification of penaeid prawn species. Paper presented at the 3rd international conference on computing for sustainable global development (2016) 4. N. Cristianini, J. Shawe-Taylor, in An Introduction to Support Vector Machines and Other Kernel-based Learning Methods (Cambridge University Press, 2000) 5. V. Vapnik, in The Nature of Statistical Learning Theory (Springer, N.Y., 1995), ISBN: 0-38794559-8 6. V. Vapnik, S. Golowich, A. Smola, Support vector method for function approximation regression estimation and signal processing, in Advances in Neural Information Processing Systems ed. by M. Mozer, M. Jordan, T. Petsche (MIT Press, Cambridge, MA, 1997), pp. 281–287 7. V. Vapnik, Estimation of dependencies based on empirical data, in Empirical Inference Science: Afterword of 2006 (Springer, 2006) 8. D.K. Srivastava, L. Bhambhu, Data classification using support vector machine. J. Theor. Appl. Inf. Technol. 12(1) 9. Y. Tian, Y. Shi, X. Liu, Recent advances on support vector machines research technological and economic development of economy, March (2012) 10. G. Kaur, A. Chhabra, Improved J48 classification algorithm for the prediction of diabetes. Int. J. Comput. Appl. 98(22), 13–17 (2014) 11. J.Ð. Novakovic, Experimental study of using the K-nearest neighbour classifier with filter method, in Conference: Computer Science and Technology, June 2016 12. M. Horný, in Bayesian Networks, Technical Report No. April 5, 18, 2014 13. K. Bache, M. Lichman, UCI machine learning repository (2013) 14. J. Brownlee, Standard machine learning datasets to practice, June 28, 2016, in Weka Machine Learning. Last updated on December 11, 2019
Chapter 32
Analysis and Prediction of Air Pollutant Using Machine Learning Chalumuru Suresh, B. V. Kiranmayee, and Balannolla Sneha
1 Introduction Pollutants discharged into the air, such as chemicals, particles, biological molecules, and other substances that are hazardous to people's health as well as the environment, are referred to as air pollutants. The combustion of fossil fuels, vehicles, industrialization, and mining operations are the primary sources of air pollution. The two types of pollutants present in air are primary (or anthropogenic) pollutants and secondary pollutants. Primary pollutants cause air pollution directly, whereas secondary pollutants are generated when primary pollutants are mixed together and react in the atmosphere. The most prevalent air pollutants are ozone (O3), nitrogen dioxide (NO2), carbon monoxide (CO), sulfur dioxide (SO2), and particulate matter (PM10 and PM2.5). Mild allergies, such as infections of the throat, nose, and eyes, as well as more significant disorders, such as heart disease, bronchitis, pneumonia, lung illness, and asthma, are all caused by pollution. It is recognized as the world's leading risk factor for human health. According to a report released earlier this year by the World Health Organization (WHO), air pollution now kills almost 7 million people per year throughout the world. India is one of the most polluted countries in the world, especially in metropolitan areas where industrialization and an increase in the number of vehicles result in the discharge of several toxins. According to the Intergovernmental Panel on Climate Change (IPCC), nearly all pollutants present in the air are, directly or indirectly, responsible for health problems. Air pollution is a huge threat to our world, in addition to its health implications.
The main contributors to the greenhouse effect, carbon dioxide emissions, come from diverse sources such as automobiles and industry. Air pollution also depletes the ozone layer, which shields the Earth from ultraviolet sun rays, and causes acid rain, which harms plants, rivers, soils, and wildlife. As a result, air pollution is one of our most pressing concerns.
2 Related Work In recent years, a number of machine learning algorithms for air pollution prediction have been presented. Some of the works in this area are researched and presented. Niharika et al. [1] have conducted a study on various air quality prediction techniques, focusing primarily on a comprehensive review of existing air quality prediction techniques through soft computing. They presented the main causes of air pollution and the factors which may be responsible for minimizing it. A hybrid soft computing technique based on ANN with fuzzy logic was further developed to predict air pollution for a particular city. Wang et al. [2] have proposed a method to analyze the correlation between particulate matter PM2.5 data and meteorological variables and data was collected in Nagasaki for a year span of time. The spatial distribution for PM2.5 has been obtained, and it shows that PM2.5 affects the western part more severely. Using data processing, the daily distribution and seasonal distribution were analyzed, the association between PM2.5 mass concentrations and meteorological variables was determined using the Spearman analysis and the linear analysis. The results show temperature is negatively correlated to concentration of PM2.5 while precipitation is positively correlated to concentration of PM2.5. Shaban et al. [3] have applied three machine learning techniques. They focused on one-step and multistep forecasting models for nitrogen dioxide, ozone, and sulfur dioxide at ground level. Support vector machine (SVM), ANN, and M5P model trees are the machine learning algorithms used. Univariate and multivariate modeling are the two types of models employed. The findings imply that in multivariate modeling, using more characteristics with the M5P algorithm yields superior forecasting results than SVM and ANN. Li et al. [4] implemented a deep learning technique for predicting air quality which inherently takes into account both spatial and temporal correlations. They proposed a deep learning technique to decrease the time series analysis error rate. They compared a neural network model with the models autoregression moving average (ARMA) and support vector regression (SVR). The algorithm’s processing time is not discussed, despite the greater precision. Ghoneim et al. [5] proposed a method for estimating concentrations of O3 in smart cites that was based on a feed-forward neural network which is used in a deep learning technique. The model is used to learn features of the ozone level and is trained using a grid search technique. The learning algorithms of the approaches, support vector machines (SVM), and ANN, were compared. The findings reveal that
deep neural network learning algorithms perform well when it comes to precisely calculating the value of air pollution. They considered only one pollutant and used a linear method to solve the problem. No mention is made of how real-time data would be kept up to date. The findings of the study could be utilized to improve the accuracy of pollution predictions at the ozone level. Asgari et al. [6] have used logistic regression to determine whether or not a sample of data is contaminated. Using the autoregression technique, future PM2.5 concentration values are predicted based on previous PM2.5 readings. Knowing the PM2.5 level in real time allows us to keep it below the hazardous limit. This approach forecasts PM2.5 levels and determines air quality at a specific location based on a dataset of daily atmospheric conditions. It assists both regular people and meteorologists in detecting and forecasting pollution levels, as well as taking appropriate measures to combat it. However, it is a poor time series classifier, and the data size used is limited. Patra et al. [7] used Apache Spark on a Hadoop cluster to investigate urban pollution. They analyzed data from Tehran using Apache Spark as the underlying framework to form a cluster and compared the prediction accuracy of the Naive Bayes algorithm and logistic regression. For identifying unknown air quality classes, they discovered that Naive Bayes predicted data more accurately than logistic regression, and they reported good results in terms of processing time for Apache Spark. Aditya et al. [8] implemented time series forecasting of pollutant concentrations in the air, assessed using artificial intelligence techniques. The algorithms used are ANN and SVM, as well as the conventional ARIMA model. Input units, hidden processing units, initial weights, and biases are used to describe the neural network, and the performance of the algorithms is analyzed. The results were assessed with RMSE. The ANN technique is shown to be the best configuration among the proposed combinations for both CO and NO2 predictions. Raturi et al. [9] presented an MLP feed-forward ANN model to improve prediction accuracy. They used the ANN multilayer perceptron model for forecasting; based on the trained model and newly provided values, the air quality index is predicted for the next year, month, or day. Parameters like ozone, NO2, PM, and SO2 are taken into consideration. Finally, the end result shows that the ANN model has better accuracy. Zhang et al. [10], using the LightGBM model to forecast PM2.5 concentrations, have presented an improved method for predicting air quality. They developed a way to use forecasting data as one of the data sources for the model and to solve the challenge of processing high-dimensional large-scale data to improve forecast accuracy. The experimental results reveal that the suggested method outperforms previous systems, demonstrating the benefit of combining forecasting data and constructing a high-dimensional statistical analysis. Soh et al. [11] intend to anticipate air quality for up to 48 h using a mixture of different neural networks. To extract spatial-temporal correlations, they used a convolutional neural network, ANN, and long short-term memory. The suggested predictive model considers a variety of meteorological data from the past several hours. The model incorporates trends from numerous sites, which are determined through
correlations between surrounding locations and, in the temporal domain, related places. All of the baselines and comparison models studied were outperformed by the proposed methods.
3 Proposed System Dealing with air pollution is a serious environmental issue in metropolitan areas, with particulate matter being the most harmful air pollutant present, affecting all living beings more than other pollutants. In order to overcome these problems, existing researchers have used different learning techniques to predict pollution. In the current work, pollution analysis and PM2.5 concentration prediction are carried out using machine learning approaches, and a comparative assessment of various strategies is given to establish the optimal model for accurately predicting air quality. The proposed work consists of data collection, data preprocessing, exploratory data analytics, and performance analysis.
4 System Architecture Architecture defines the design of the computerized system in order to fulfill certain requirements (see Fig. 1).
4.1 Dataset The UCI Machine Learning repository [12] was used to obtain data for this experiment. This data collection contains hourly data on air pollutants collected from 12 national air quality monitoring locations. The time period runs from March 1, 2013 to February 28, 2017. The dataset contains 35,064 records with 17 parameters. The parameters included are date, temperature, pressure, rain, meteorological variables, and the pollutants PM2.5, PM10, SO2, NO2 and CO recorded from a specific city. The dataset parameters are year, month, day, hour, PM2.5 (µg/m3), PM10 (µg/m3), SO2 (µg/m3), NO2 (µg/m3), O3 (µg/m3), dew point (°C), temperature (°C), pressure (hPa), WD: wind direction, WSPM: wind speed (m/s), and RAIN: hourly precipitation (mm).
Fig. 1 System architecture
4.2 Preprocessing Data The data preprocessing phase covers all activities that take the initial raw data to the final dataset. Data preparation tasks are likely to be repeated multiple times and not in any particular order. Data cleaning: Data is never perfect or clean; missing values or even outliers are always present. NA denotes the missing values inside the dataset. The missing numerical values in the dataset (PM2.5, PM10, SO2, NO2, CO, TEMP, DEWP, PRES) are replaced by the mean of the respective features, and WD is replaced with the most frequently appearing value. Values below zero for the pollutants PM2.5, PM10, NO2 and O3, TEMP values between −94 and 62, and DEWP values between −92 and 37 are considered as outliers. Correlation matrix: Relationships between the features are visualized for a better understanding of the data. This study mainly examines the behavior of the PM2.5 concentration in the air and its relationship with the other features present in the dataset. Figure 2 displays the correlation matrix of the variables. The value of r lies between +1 and −1. Negative values represent a negative correlation between the parameters, and positive values represent a positive correlation between them. Because we are primarily interested in the behavior of the PM2.5 concentration in the air and its interaction with the other variables in the dataset, the following inferences are drawn from the matrix, as shown in Table 1.
Fig. 2 Heatmap display correlation matrix of variables
Table 1 Inference from the correlation matrix

Positive correlation: PM10, SO2, NO2, CO, DEWP
Negative correlation: O3, TEMP, PRES, RAIN, WSPM
In addition, the high correlation between other features reveals dependence on one another. Dependent features can be predicted using other features and can subsequently be removed from modeling to reduce the complexity of the model and prediction costs.
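A hedged sketch of this cleaning and correlation step with pandas is given below; the file name is a placeholder and the column names are assumed to match the parameters listed in Sect. 4.1.

```python
import pandas as pd

df = pd.read_csv("air_quality.csv")        # placeholder file name

# Replace missing numerical values with the column mean, and the categorical
# wind direction (WD) with its most frequently appearing value.
numeric_cols = ["PM2.5", "PM10", "SO2", "NO2", "CO", "O3", "TEMP", "DEWP", "PRES"]
for col in numeric_cols:
    df[col] = df[col].fillna(df[col].mean())
df["WD"] = df["WD"].fillna(df["WD"].mode()[0])

# Pearson correlation matrix (r in [-1, +1]) over the numeric columns,
# used to spot features strongly related to PM2.5.
corr = df[numeric_cols].corr()
print(corr["PM2.5"].sort_values(ascending=False))
```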
4.3 Data Transformation and Splitting Data The dataset is divided into two subsets in this phase: training and test sets. Due to the size of the dataset, an 80–20 split is chosen to avoid overfitting or underfitting. The more training data is used, the better the potential model performs, and testing on held-out data gives a better picture of model performance and generalization capability. The dataset has been split into predictor variables and the outcome variable, which is the
observed value of PM2.5 from the input features, and the dataset has been separated into training and testing data using the train test split function in Scikit learn.
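Continuing the preprocessing sketch above, an 80–20 split with train_test_split might look like the following; the feature and target names are assumptions.

```python
from sklearn.model_selection import train_test_split

# X holds the predictor variables, y the observed PM2.5 values (assumed names).
X = df.drop(columns=["PM2.5"])
y = df["PM2.5"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)   # 80% training, 20% testing
```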
4.4 Modeling Models are trained throughout this stage to see which one gives the most accurate air pollution predictions. After preprocessing and splitting the collected data into two subsets, model training proceeds. Two machine learning models, the random forest regressor and the ANN multilayer perceptron, are used to predict the air pollutant level; the experiments were conducted in Jupyter Notebook using the UCI dataset to estimate pollution levels. The MAE, R-squared, and RMSE were utilized as assessment criteria to compare the regression models. The model with the lower error rate is considered the best-fit model for predicting the air pollutant level.
5 Experimental Results 5.1 Random Forest Regressor (RFR) The first model used for predicting the PM2.5 concentration is RFR. The necessary features from the dataset are extracted and fed into the model. It is imported from Scikit-learn to build random forest regression. By running the predict function on the test input, the model’s predicted values are generated. The graph (see Fig. 3) shows the distance between predicted and measured values of the PM2.5 levels for the RFR.
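A minimal scikit-learn version of this step, continuing the split sketched above, could look like the following; the number of trees and the random seed are assumed values, not the authors' exact settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

rfr = RandomForestRegressor(n_estimators=100, random_state=42)
rfr.fit(X_train, y_train)
y_pred = rfr.predict(X_test)

print("R2  :", r2_score(y_test, y_pred))
print("RMSE:", np.sqrt(mean_squared_error(y_test, y_pred)))
print("MAE :", mean_absolute_error(y_test, y_pred))
```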
5.2 ANN Multilayer Perceptron The second model used for predicting the PM2.5 concentration is the ANN multilayer perceptron. The MLP regressor is imported from Scikit-learn for the implementation. The MLPRegressor class implements a multilayer perceptron with no activation function in the output layer and trains via backpropagation. The MLP consists of multiple hidden layers of the same size, and every hidden layer uses a ReLU activation function to avoid the dying ReLU problem and to speed up the training. The MLP is trained with the Adam optimization method for weight optimization, as it works well on relatively large datasets. A batch size is set to speed up the training and increase generalization. Most of the parameters are included in the hyperparameter search.
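The corresponding MLP regressor, configured roughly as described (equal-sized hidden layers, ReLU activations, Adam optimizer, mini-batches), might be set up as in the sketch below; the layer sizes, batch size and iteration count are assumptions.

```python
from sklearn.neural_network import MLPRegressor

mlp = MLPRegressor(
    hidden_layer_sizes=(64, 64, 64),  # same size for every hidden layer (assumed widths)
    activation="relu",
    solver="adam",
    batch_size=200,                   # assumed mini-batch size
    max_iter=300,
    random_state=42,
)
mlp.fit(X_train, y_train)
print("Test R2:", mlp.score(X_test, y_test))
```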
Fig. 3 Scatter plot predicted vs measured values using RFR
See Fig. 4. The X-axis in the scatter figure below represents the predicted PM2.5 values, while the Y-axis shows the measured PM2.5 levels. The positive and linear association between the predicted and measured PM2.5 values can be seen in the graph; an outlier point is observed that has measured values much larger than the others. Fig. 4 Scatterplot for neural network predicted versus measured
Table 2 A comparison of performance

Metric | Random forest regressor | ANN multilayer perceptron
R-squared | 91.3 | 83.7
Root mean square error | 23.1 | 32.9
Mean absolute error | 15.4 | 21.2
This plot makes clear that the highest value the model can predict is 600. That means the random forest cannot predict values for new examples that are higher than those seen in the training data.
5.3 Comparison of ML Methods Here we present the results obtained from the two machine learning algorithms used in this work. The models examined in this research are compared in terms of performance based on the evaluation criteria (see Table 2), and their performance indicators are displayed in the form of bar charts as shown in Fig. 5. The X-axis represents the models used, the ANN multilayer perceptron and the random forest regressor. The values of the performance measures RMSE, MAE, and R-squared are represented on the Y-axis.
Fig. 5 Performance comparison of RFR and NN MLPR model
In comparison to the ANN multilayer perceptron, the random forest regressor has a 23.1% RMSE and a 15.4% MAE. When
compared to MLP, the random forest regressor is the best methodology for predicting pollution for datasets of various sizes and characteristics.
6 Conclusion In this work, the problem of air pollution prediction was analyzed and compared using two learning techniques. The techniques used are the random forest regressor and the ANN multilayer perceptron regressor. Through this work, the PM2.5 levels were predicted using these two techniques and the performance was successfully evaluated based on the error rate. The results show that the RMSE of the random forest regressor is 23.1% and its MAE is 15.4%, whereas the RMSE of the ANN multilayer perceptron is 32.9% and its MAE is 21.2%. The random forest regressor has the lowest error rate compared to the ANN multilayer perceptron regressor. Hence, the random forest regressor is the better technique for predicting the air pollution data than the ANN multilayer perceptron regressor. Future work may involve investigating the other factors affecting air pollution. Further research can be done on geographic information systems to organize geographic data, to choose and use specific locations to assess air quality, and to construct a real-time API that automatically predicts future values.
References 1. V.M. Niharika, P.S. Rao, A survey on air quality forecasting techniques. Int. J. Comput. Sci. Inf. Technol. 5(1), 103–107 (2014) 2. J. Wang, S. Ogawa, Effects of meteorological conditions on PM2.5 concentrations in Nagasaki, Japan. Int. J. Environ. Res. Public Health 12(8), 9089–9101 (2015) 3. K.B. Shaban, A. Kadri, E. Rezk, Urban air pollution monitoring system with forecasting models. IEEE Sens. J. 16(8), 2598–2606 (2016) 4. X. Li, L. Peng, Y. Hu, J. Shao, T. Chi, Deep learning architecture for air quality predictions. Environ. Sci. Pollut. Res. 23(22), 22408–22417 5. O.A. Ghoneim, H. Doreswamy, B.R. Manjunatha, Forecasting of ozone concentration in smart city using deep learning, in Proceedings of International Conference on Advances in Computing, Communications and Informatics (ICACCI) (2017), pp. 1320–1326 6. M. Asgari, M. Farnaghi, Z. Ghaemi, Predictive mapping of urban air pollution using apache spark on a hadoop cluster, in Proceedings of International Conference on Cloud Big Data Computing (2017), pp. 89–93 7. S.R. Patra, Time series forecasting of air pollutant concentration levels using machine learning. Adv. Comput. Sci. Inf. Technol. 4(5) (2017). p-ISSN: 2393-9907; e-ISSN: 2393-9915 8. C.R. Aditya, C.R. Deshmukh, D.K. Nayana, Detection and prediction of air pollution using machine learning models. IJETT 59(4) (2018) 9. R. Raturi, J.R. Prasad, Recognition of future air quality index using regression and artificial neural network. IRJET 5 (2018). e-ISSN: 2395-0056, p-ISSN: 2395-0072 10. Y. Zhang, Y. Wang, M. Gao, Q. Ma, J. Zhao, R. Zhang, G. Wang, L. Huang, A predictive data feature exploration-based air quality prediction approach. IEEE. https://doi.org/10.1109/ACCESS.2019.2897754
11. P.W. Soh, J.W. Chang, J.W. Huang, Adaptive deep learning-based air quality prediction model using the most relevant spatial-temporal relations. IEEE. https://doi.org/10.1109/ACCESS.2018.2849820 12. A.K. Bhavana, C. Suresh, B.V. Kiranmayee, K.S. Kumar, Prediction of epidemic outbreaks in specific region using machine learning. Int. J. Inno. Technol. Exp. Eng. (IJITEE) 9(4) (2020) 13. S. Vasundhara, B.V. Kiranmayee, C. Suresh, Machine learning approach for breast cancer prediction. Int. J. Recent Technol. Eng. (IJRTE) 8(1) (2019) 14. Dataset URL, https://archive.ics.uci.edu/ml/datasets/Air+Quality
Chapter 33
Real-Time Face Mask Detection Using Machine Learning Algorithm Bhagavathula Pushyami, C. N. Sujatha, Bonthala Sanjana, and Narra Karthik
1 Introduction Our country, which has a population of around 1.3 billion, is facing a rough patch after the coronavirus (COVID-19) outbreak. To reduce the spread of COVID-19, the CDC suggests that all people wear a mask to prevent serious escalation of the virus, leaving out a few exceptions. Masks are simple barriers that help keep respiratory droplets from reaching others, and many studies show that masks curb the droplets from your mouth and nose when worn on your face. Initially during the pandemic, it was not mandatory to wear masks in all countries. But now, with the increased strength of this virus, it has been mandatory in our country since mid-July 2020. In this paper, we describe a face mask detection model whose training is done using a classic deep learning algorithm (CNN) and whose detection and classification use computer vision (cascade classifiers and OpenCV) technologies. Deep learning algorithms effectively mimic the human brain structure. The sequence of ideas allows the computer to learn concepts easily by constructing layers above each other; therefore, we call this method of AI deep learning. Neural networks recognize the correlation between certain relevant features and possible outcomes, forming a link between the feature values and what those features represent. Neural networks serve as function approximators, mapping the input into multiple visual features, and, to achieve common intelligence, can be integrated with other AI methods to perform complex tasks. The model is implemented through computer vision. The major library that we used is OpenCV, which is open-source machine learning and computer vision software. It is used to run an infinite loop for the webcam, and the cascade classifier is also defined using OpenCV. Keras is a neural network library used to build the sequential model with TensorFlow; it is an open-source library for a number of tasks in ML.
2 Literature Survey Batagelj et al. focus on face mask detection in real-time monitoring problems. They introduced a new dataset and conducted an experimental study to examine recognition techniques that detect whether a mask is properly worn, and the use of existing models that detect face masks. The detection procedure includes a generative component that is a stable model, RetinaFace. Moreover, the recognition performance was not found to be largely affected by the selection of model architecture. Because only the presence of face masks is detected in the images and not their placement on the face, existing face mask detection models do not have a large value for real-life applications. They also integrated the model in real time and are trying to increase the dataset variation [1, 2]. Vinitha et al. present a model that is an assimilation of classical machine learning and deep learning techniques using OpenCV, TensorFlow, and Keras with a sequential model. The second phase is the application of the face mask detector: the face mask classifier model is first loaded from disk. They used deep transfer feature extraction and integrated it with three machine learning algorithms, comparing these algorithms to achieve high accuracy and spend less time on the training and acquisition processes. The training is done using a deep learning model with MobileNetV2, and classification is done using the models in OpenCV [3–5]. Das et al. proposed a method that includes a cascade classifier and a pretrained CNN. It consists of two 2D convolution layers connected to dense neuron layers. The proposed model has model creation and implementation tasks to perform. The dataset images are converted from RGB to grayscale and then resized to 100 × 100 so that proper feature extraction happens. Each image is then normalized and converted to a four-dimensional array; this process is done for all the images in the dataset. The CNN model is then built by adding a convolution layer with 200 filters and a second convolution layer with 100 filters. A flatten layer is inserted into the network classifier. A dense layer of 64 neurons is added, and after the addition of the final dense layer, two outputs for the two categories are created. The result and analysis consist of accuracy and loss graphs of the testing and training sets along with the total epochs applied [6, 7]. The prime focus of researchers is to offer suggestions in the current pandemic situation. The proposed method improves generalization. A deep learning method and a quantization technique are employed to recognize masked faces. In the preprocessing and cropping filter steps, the images in the dataset are cropped around the face so that the detection stage can be covered and localization can be done, where around 68 features can be extracted. Filters are applied and translation takes place in order to extract the non-mask region. A VGG-16 CNN descriptor is used to extract deep features. There are 13 convolution layers, 5 max pooling layers, and 3 dense layers, which amounts to 16 weight layers, with 10-fold cross-validation. Feature maps are taken from the last convolution layer, which is then followed by the quantization stage. To obtain the similarity between extracted features and code words, an RBF
kernel is applied. The radial basis function layer contains two sublayers, an RBF layer and a quantization layer, which collect RBF neurons and produce a histogram of global quantized feature vectors. This vector is then used for classification. A multilayer perceptron classifier takes in the global histogram and classifies each test instance to get an identity. A cross-validation strategy is also applied here. The MLP uses a sigmoid function, associated weights, and test occurrences as input values and maps the results to an output [8]. The paper by Singh et al. gives information about "Face mask detection using YOLOv3 and faster R-CNN models: COVID-19 environment." The most advanced face mask detection approaches, YOLOv3 and faster R-CNN, are developed using deep learning. Faster R-CNN is useful as it is precise for object detection, but the YOLO algorithm is preferred for real-world use as it is a single-shot detector. The authors trained the model using CNN architectures like ResNet. The weights of the pretrained model are given to YOLOv3 and then to the faster R-CNN architecture. According to the experimental results, faster R-CNN was slightly better [9]. The Kanag Suba Raja Subramaniam paper gives information about "Face Recognition using Haar Cascade Classifier for Criminal Identification." The proposed technique deploys two kinds of images: input images and live stream images. They both undergo four basic procedures, which are face acquisition; preprocessing; face detection (Haar cascade classifier) with AdaBoost, a boosting algorithm that combines weak classifiers while reducing the training and generalization errors; and feature extraction (linear binary pattern algorithm), which computes LBP values that are stored in the database while processing an input image. Finally, the values in the database are compared with the values processed during live streaming, and human faces are detected as known or unknown [10].
3 Proposed Method In this section, the building of the model and the face mask classification and detection methods used are described. In the first stage, the model is built: the data is modified and augmented to cover more possibilities and then trained and tested. In the second stage, the complete focus is on face detection and classification of whether a person has a mask on or not. The steps followed are described below.
3.1 Training Model Figure 1 shows the block diagram of the first part of face mask detection, that is, training on the dataset and creating a model. A brief description of the process shown in the block diagram follows. Dataset visualization: The total count of images in our datasets is visualized. There are two classes of images, labeled "with_mask" and "without_mask." Different
Fig. 1 Block diagram of training model
features and variations in the images of the datasets help improve the precision and accuracy of the output. Data augmentation: In this step, the dataset is increased by augmenting the original images; flipping, rotating and inclining each of the images helps to cover all the possible variations that a single picture can bring to our dataset. Hence, after data augmentation, the dataset is almost doubled. Data splitting: The data is split into training and validation data. The hyperparameters of the model become efficient with validation data, as it estimates the accuracy and prediction error for these sets of data. Building model: A sequential convolutional neural network model is developed with seven layers. It is applied to extract the features and build the model to start the training. The following steps are followed to develop the model for pretraining:
• The neural network is initialized using Sequential.
• Convolution2D: It extracts features from the training images. The spatial relationship of pixels is built by learning the image features using a 3 × 3 kernel, i.e. small squares of input data. This uses 100 filters each time. It is followed by a rectified linear unit (ReLU) activation function, which introduces nonlinearity and reduces gradient loss.
• Max pooling: The images are then downsampled over the defined pool size (2 × 2) using MaxPooling2D.
• The pooled feature map is converted to a single feature layer using Flatten, and this is then passed to the fully connected layer.
• Dropout refers to the discharge of units (hidden and visible) from the neural network to eliminate overfitting.
• The fully connected layer is then added to the neural network using the Dense function in Keras.
• The CNN is compiled using the compile function, which requires three factors: the optimizer, the loss function, and the performance metrics. The optimizer algorithms alter the neural network attributes; the modification of weights and learning rates reduces the losses. The Adam optimizer is used, as it can handle sparse gradients and noisy problems.
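A hedged Keras sketch of such a sequential network is given below. It follows the layer description above (two convolution blocks with 100 filters and 3 × 3 kernels, 2 × 2 max pooling, dropout, a dense layer and a two-way output compiled with Adam); the input image size, dropout rate, dense layer width and loss function are assumptions rather than the authors' exact choices.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dropout, Dense

model = Sequential([
    Conv2D(100, (3, 3), activation="relu", input_shape=(150, 150, 3)),  # input size assumed
    MaxPooling2D(pool_size=(2, 2)),
    Conv2D(100, (3, 3), activation="relu"),
    MaxPooling2D(pool_size=(2, 2)),
    Flatten(),
    Dropout(0.5),                      # dropout rate assumed
    Dense(64, activation="relu"),      # dense layer width assumed
    Dense(2, activation="softmax"),    # two classes: with_mask / without_mask
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",   # loss choice assumed
              metrics=["accuracy"])
model.summary()
```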
Pretraining model: In order to prevent overfitting, preprocessing of the images is done; this is known as image augmentation. The Keras utility used for this purpose is ImageDataGenerator. In the end, the model has to fit the training dataset, and its accuracy and loss performance are tested. The fit_generator function is called on the classifier object to achieve this. This function takes four arguments: the training data, the validation data, the number of epochs, and the callback function. Checkpoints are used to store the intermediate internal state of the model.
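Continuing the model sketch above, the augmentation and training step might look like the following; the directory layout, augmentation parameters and epoch count are assumptions, and recent Keras versions use fit() in place of fit_generator().

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import ModelCheckpoint

# Augment the training images (flips, rotations, zooms) and rescale pixel values.
train_gen = ImageDataGenerator(rescale=1.0 / 255, rotation_range=20,
                               zoom_range=0.2, horizontal_flip=True)
val_gen = ImageDataGenerator(rescale=1.0 / 255)

train_data = train_gen.flow_from_directory("dataset/train", target_size=(150, 150),
                                           batch_size=32, class_mode="categorical")
val_data = val_gen.flow_from_directory("dataset/val", target_size=(150, 150),
                                       batch_size=32, class_mode="categorical")

# Checkpoint callback stores the intermediate state of the model each epoch.
checkpoint = ModelCheckpoint("model-{epoch:03d}.h5", monitor="val_loss",
                             save_best_only=True, verbose=1)

history = model.fit(train_data, epochs=10, validation_data=val_data,
                    callbacks=[checkpoint])
```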
3.2 Testing Model The performance of the fully trained model can be checked by giving it unknown data samples. The testing data is converted into batches and then fed to the testing model, which performs the detection of images, as shown in Fig. 2 above. After the training, a model is generated and then implemented in PyCharm. The face is detected in each frame (still image) of the entire video. The probabilities of the results are labeled as ["0" as "without_mask" and "1" as "with_mask"]. The Haar cascade classifier is an algorithm that identifies frontal faces in an image or video and extracts their characteristics. Several pretrained Haar cascade models are present in OpenCV, saved in the form of XML files. After loading the classifier, the webcam is started using a simple OpenCV one-liner: video_capture = cv2.VideoCapture(0).
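A minimal sketch of this detection loop with OpenCV is shown below; the checkpoint file name, label mapping and face preprocessing size are assumptions consistent with the description above.

```python
import cv2
import numpy as np
from tensorflow.keras.models import load_model

model = load_model("model-010.h5")                     # assumed checkpoint name
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
labels = {0: "without_mask", 1: "with_mask"}           # assumed class ordering

video_capture = cv2.VideoCapture(0)
while True:
    ok, frame = video_capture.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.3, 5):
        # Crop the face, resize to the model's input size, and predict the class.
        # Note: frame is BGR; a colour conversion may be needed depending on training data.
        face = cv2.resize(frame[y:y + h, x:x + w], (150, 150)) / 255.0
        pred = model.predict(np.expand_dims(face, axis=0))[0]
        cv2.putText(frame, labels[int(np.argmax(pred))], (x, y - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("Face mask detection", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
video_capture.release()
cv2.destroyAllWindows()
```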
4 Results In this section, two different sets are chosen for training—1316 images and 7699 images. In the two datasets, 80% of the training dataset is used for training and remaining 20% is used as validation purpose. For testing, datasets of 186 images and 194 images are used for two different training datasets, respectively. Model with each dataset is trained with different number of epochs. By comparing different combinations of the dataset and epochs, the maximum accuracy and minimum loss score are estimated.
Fig. 2 Block diagram of testing model
4.1 Dataset with 2000 Images Initially the dataset of 2000 images is trained which includes all different sizes, shapes, and angles of faces in Google Colab Notebook. In this dataset, 1315 images is trained and 186 images for testing. The horizontal axis of the graph represents the number of epochs and vertical axis the accuracy of the dataset in Fig. 3a. The time needed to run 10 epochs for this particular dataset is 1652 s with each epoch approximately at 165 s. The accuracy model for the training and validation data moves in a range 0.8–0.97, which is quite inconsistent from first epoch to tenth epoch. In Fig. 3b which is a plot for training and validation loss versus epoch number, the loss value decreases. Meanwhile, the loss value of the model for the training and validation dataset is seen in the range of 0.05–0.40. As a result, the test score of the above model is 0.09 and the final accuracy of the generated model is 96.4%. The batch count of epochs is increased to check the accuracy of the testing realtime input. It is observed that the time needed to run 20 epochs is 4078 s with an average of 195 s for each epoch. The accuracy of the test data with increasing epoch has the same range as that of 10 epochs which is 0.84 to 0.98 observed in Fig. 4a. The range of the loss value of the same model is seen in between 0.05 and 0.35, as shown in Fig. 4b. As a result, the test score of the above model is 0.116 and the test
Fig. 3 a Accuracy of the training and validation data and b training and validation loss with 10 epochs
Fig. 4 a Training and validation data accuracy and b training and validation data loss with 20 epochs
Fig. 5 a Training and validation accuracy and b training and validation loss of 50 epochs
Table 1 Comparison of training the data at different epochs

Number of epochs | Execution time (in seconds) | Accuracy (%) | Loss
10 | 1652 | 96.3 | 0.097
20 | 4078 | 95 | 0.115
50 | 8150 | 96.2 | 0.261
accuracy of the generated model is 95.2%. This shows a slight variation in the accuracy and loss values, with a small decrease in performance. The same dataset is then trained with 50 epochs, which took 8150 s with each epoch at about 163 s; the runtime is thus roughly double that of the previous run with 20 epochs. In Fig. 5a, the accuracy curve of the training dataset stays close to that of the validation dataset, with the accuracy in the range 0.8–1.0. Similarly, the training and validation loss values vary closely in Fig. 5b, and with 50 epochs the loss range widens to 0.0–0.4. As a result, the test score for 50 epochs increases to 0.26 and the test accuracy changes to 96.2%. From Table 1, it is observed that there is a slight decrease in accuracy and a gradual increase in loss as the number of epochs changes from 10 to 50. Every dataset has a threshold beyond which increasing the number of epochs may decrease the validation accuracy. This problem is attributed to "underfitting" of the training data: repeatedly training on a small amount of data results in high bias. It can be avoided by extracting more features from the dataset.
4.2 Dataset with 7699 Images Another dataset, of 7699 images, is trained and its accuracy tested. It includes 3724 "with mask" images and 3828 "without mask" images. From this dataset, 7505 images are used for training and 194 images for testing. The optimal number of epochs
Fig. 6 a Training and validation accuracy and b training and validation loss for 10 epochs with dataset of 7000 images
to train most of the datasets is 10 epochs. The time needed to run 10 epochs for this dataset is 8773 s. In Fig. 6a, a narrow range of accuracy values for the training and validation data is seen, between 0.86 and 0.94. The training accuracy increases gradually from 1 to 10 epochs, whereas the validation accuracy is not consistent. In Fig. 6b, the loss value lies in the range 0.15–0.35; the validation data shows slightly better results than the training data, with a loss difference of 0.025. The resultant loss of the testing model is 0.097 and the testing accuracy is 96.3%. The time needed to run 25 epochs for this dataset is 30,387 s. In Fig. 7a, the accuracy curve of the training dataset stays close to that of the validation dataset, with the accuracy in the range 0.86–0.98. Similarly, the training and validation loss values vary closely, as observed in Fig. 7b, with the loss in the range 0.05–0.35. As a result, the test loss for 25 epochs decreases to 0.036 and the test accuracy rises to 97.9%. From Table 2, it is observed that as the number of epochs increases, the accuracy also increases and the loss value decreases. This shows that the dataset has been trained efficiently, although the runtime increases. Two datasets are thus used to check the loss and accuracy of the training, and Table 3 compares the various parameters of the two datasets. The training
Fig. 7 a Training and validation accuracy and b training and validation loss for 25 epochs of 7000 dataset
Table 2 Comparison of training the data at different epochs for 7699 images

Number of epochs | Execution time of data (in seconds) | Accuracy (%) | Loss
10               | 8773                                | 96.4         | 0.097
25               | 30,387                              | 97.9         | 0.036
Table 3 Overall comparison

                                     | 2000-image dataset | 7699-image dataset
Datasets used for training           | 1315               | 7505
Datasets used for testing            | 186                | 194
Accuracy by increasing epochs        | Decreases          | Increases
Loss value by increasing epochs      | Increases          | Decreases
Error which can affect the accuracy  | Underfitting       | No error
of 2000 images yielded accuracies of 96.3%, 95%, and 96.2% and losses of 0.097, 0.115, and 0.261 with 10, 20, and 50 epochs, respectively. This variation in accuracies and losses is due to "underfitting", which indicates an undersized dataset, and can be avoided by increasing the number of features. When the dataset is increased to 7699 images, training yields accuracies of 96.4% and 97.9% and losses of 0.097 and 0.036 with 10 and 25 epochs, respectively. An increased dataset size and number of epochs result in smoother training and validation accuracy–loss curves. Therefore, the final model is built by training on 7699 images for 25 epochs, which results in the maximum testing accuracy of 97.94%. The simulated results are shown in Fig. 8, which clearly displays faces with and without a mask and also detects partially covered faces. The model accurately detects a face without a mask even from a long distance, and it can detect a face tilted up to an angle of 45° in either direction.
5 Conclusion As the COVID-19 virus continues to spread in our vicinity, wearing masks is mandatory, and to enforce this a real-time face mask detection model is created in this work. There are two major phases: face classification and face mask detection. The backbone of the face mask classification is a convolutional neural network with convolution, max pooling, dense, and flatten layers; the Adam optimizer is used for better training results. Two datasets are used to check the losses and accuracies of the training. Training on the 2000-image dataset yielded accuracies of 96.3%, 95%, and 96.2% and losses of 0.097, 0.115, and 0.261 with 10, 20, and 50 epochs, respectively. This variation in accuracies and losses is due to "underfitting", which indicates an undersized dataset. Hence, 7699 images are used, which yielded accuracies of 96.4% and 97.9% and losses of 0.097 and 0.036 with 10 and 25 epochs, respectively. Increased
Fig. 8 Real-time detection a person is detected wearing a mask completely (Face is bounded with green box), b person is detected without wearing a mask (Face is bounded with a red box), c person is detected wearing a mask incompletely (Face is bounded with red box), and d multiple people are detected wearing a mask (Each face is bound with the respective box)
dataset size and number of epochs resulted in smoother training and validation accuracy–loss versus epoch curves. Therefore, the model is trained on 7699 images for 25 epochs, which results in a testing accuracy of 97.94%. Authorities can employ this model to ensure that proper safety measures are followed by citizens in crowded areas such as airports, railway stations, and marketplaces. In the future, IoT can be combined with the deep learning CNN so that the model sends an alert when a person without a mask is detected. A Raspberry Pi can also be integrated with this model, and the data can be stored on a server and utilized whenever needed. The training of this model can also be improved with emerging techniques to further increase the accuracy.
References 1. B. Batagelj, P. Peer, V. Štruc, S. Dobrišek, How to correctly detect face-masks for COVID-19 from visual information?. Appl. Sci. 11(5) (2021) 2. C. Gupta, N.S. Gill, Coronamask: a face mask detector for real-time data. Int. J. Adv. Trends Comput. Sci. Eng. 9(4), 5624–5630 (2020) 3. S. Pooja, S. Preeti, Face mask detection using AI, in Predictive and Preventive Measures for Covid-19 Pandemic (2021) 4. C.M. Basha, B.N. Pravallika, E.B. Shankar, An efficient face mask detector with PyTorch and deep learning. EAI Endorsed Trans. Pervas. Health Technol.7(25) (Koneru Lakshmaiah Educational Foundation, Guntur, Andhra Pradesh, India, 2020) 5. V. Vinitha, V. Velantina, Covid-19 facemask detection with deep learning and computer vision. Int. Res. J. Eng. Technol. (IRJET) 7, 1–6 (2020) 6. A. Das, M.W. Ansari, R. Basak, Covid-19 face mask detection using TensorFlow, Keras and OpenCV, in IEEE 17th India Council International Conference (INDICON), December (2020) 7. T. Subhamastan Rao, S. Anjali Devi, P. Dileep, M. Sitha Ram, A novel approach to detect face mask to control Covid using deep learning. Eur. J. Mol. Clin. Med. 7(6), 658–668 (2020) 8. C. Jagadeeswari, M.U. Theja, Performance evaluation of intelligent face mask detection system with various deep learning classifiers. Int. J. Adv. Sci. Technol. 29(11), 3074–3078 (2020) 9. S. Singh, U. Ahuja, M. Kumar, K. Kumar, M. Sachdeva, Face mask detection using YOLOv3 and faster R-CNN models: COVID-19 environment. Multimedia Tools Appl., 19753–19768 (2020) 10. Y. Kortli, M. Jridi, A. Al Falou, M. Atri, Face recognition systems: a survey. Biometric Syst. 20(2), 1–10 (2020)
Chapter 34
Predicting the Potentially Hazardous Asteroid to Earth Using Machine Learning Kaveti Upender, Tammali Sai Krishna, N. Pothanna, and P. V. Siva Kumar
1 Introduction Space contains countless objects formed from various sources, such as remnants from the formation of planets and debris resulting from planetary collisions. These objects include asteroids, comets, meteors, etc. They have random trajectories and may interfere with the orbits of the major planets, including the Earth. One class of objects with the potential to be harmful to Earth is asteroids. Asteroids are too small to be counted as planets and are leftovers from the formation of the solar system. The asteroids that can approach close to Earth are termed near-Earth asteroids (NEAs). Not all NEAs cross Earth's orbit. There are four near-Earth orbit classes, categorized based on their distance from the Sun: Aten, Apollo, Amor, and IEOs. The Aten (ATE) and IEO orbital classes have a semi-major axis less than 1 a.u. and an orbital period less than 1 year, as shown in Figs. 1 and 2, respectively. Apollo (APO) has a semi-major axis greater than 1 a.u. and an orbital period longer than 1 year, as shown in Fig. 3. Amor (AMO) is the only near-Earth orbit
Fig. 1 Aten and Earth orbit
Fig. 2 IEO and Earth orbit
that lies between Earth and Mars without crossing Earth's orbit and has a semi-major axis greater than 1 a.u. [1], as shown in Fig. 4. In these figures, the lime-colored orbit shows Earth's orbit and the cyan-colored orbit shows the near-Earth orbit.
Fig. 3 Apollo and Earth orbit
Fig. 4 Amor and Earth orbit
2 Literature Review Victor Basu developed gradient boosting, XGBoost, random forest, AdaBoost, and multilayer perceptron regressors to predict the diameter of asteroids. To train the algorithms, the author used a dataset officially maintained by the NASA Jet Propulsion Laboratory, consisting of 786,226 rows and 22 columns. To evaluate the regression models, the author used mean absolute error, mean squared error, median
absolute error, explained variance score, and R2 score. After experimentation, the author concluded that the multilayer perceptron is the best algorithm to tackle these types of problems [2]. Anish Si trained various machine learning classification models to classify asteroids as hazardous or non-hazardous. For training, the author collected data from Kaggle consisting of 4688 rows and 40 columns, selected 15 features from the dataset, and trained eight machine learning models. After training, it was concluded that 'random forest with tree number 15 is the most optimal model according to accuracy and training time' [3]. Mako et al. developed a perceptron-type neural network to classify near-Earth asteroids (NEAs) into Apoheles, Aten, Apollo, Amor, and others. In their study, they showed that 'in the classification of the NEAs is more convenient to use parameters like semi-major axis and the focal distance instead of semi-minor axis and eccentricity'. They concluded that, with parameters like the semi-major axis and the focal distance, the separatrices between the presented classical groups are all linear [4–9].
3 Methodology In traditional programming, users give input to a developed program and the program returns an output based on hand-coded logic. This approach is time consuming in real-world situations, so machine learning is adopted instead. In machine learning, stored data is given to a learning algorithm, which in turn generates a model that captures how the input features are related to the output feature; with the generated model, the required result can be predicted. Machine learning is divided into supervised learning, unsupervised learning, and reinforcement learning. In supervised learning, the data is pre-categorized, meaning every input sample in the dataset has a correct output label, and the algorithm learns from the dataset by mapping a function from the independent features to the dependent feature. In this work, supervised learning is used to achieve the required result.
3.1 Dataset The training and testing datasets were created by taking the data from the NASA Jet Propulsion Laboratory Small-Body Database Search Engine on April 27, 2021. All samples are stored in the form of a csv file, and the dataset consists of 25,708 rows × 9 columns. The columns of the dataset have the information related to object fields, orbital and model parameters of the asteroids as shown in Table 1.
Table 1 Data description

Column name | Parameters                    | Column description
full name   | Objects field                 | Object full name/designation
H           | Objects field                 | Brightness
pha         | Objects field                 | Potentially hazardous asteroids (PHA) flag (Y/N)
e           | Orbital and model parameters  | Eccentricity
a           | Orbital and model parameters  | Semi-major axis (a.u.)
q           | Orbital and model parameters  | Perihelion distance (a.u.)
i           | Orbital and model parameters  | Inclination; angle with respect to x–y ecliptic plane
moid        | Orbital and model parameters  | Earth minimum orbit intersection distance (a.u.)
class       | Orbital and model parameters  | Orbit classification
3.2 Data Analysis and Preprocessing The dataset has 9 variables and 25,708 observations; among the 9 variables, 3 are categorical and 6 are numerical. There are 7 missing values in the feature 'H', 1 missing value in 'moid', and 1 missing value in 'pha'. Analysis of the dataset shows that 'pha' is the target variable because it indicates whether a particular asteroid is potentially hazardous (Y) or non-hazardous (N). From Fig. 5, it is observed that most asteroids lie in the brightness range 20.712–25.3 and the eccentricity range 0.3105–0.568. Features 'H', 'e', and 'q' are left-skewed, whereas 'i', 'moid', and 'a' are right-skewed. In Fig. 6, bar plots of the features 'pha' and 'class' show their category frequencies in the dataset. The majority of the asteroids belong to the APO and AMO orbits. The dataset is also imbalanced, because in the target 'pha' the value (N) occurs about 10.88 times more often than the value (Y). In the data preprocessing phase, the rows with missing values are deleted, since they make up only about 0.03% of the dataset, which is negligible. The feature 'full name' is dropped because it only gives the object's full name/designation and cannot establish any relationship with the target variable. Categorical features are encoded into numerical features because machine learning algorithms learn from numerical data; after encoding, classes '0' and '1' represent (N) and (Y), respectively. The preprocessed dataset is divided into training and testing sets in a 7:3 ratio.
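A minimal preprocessing sketch of the steps just described; the file name is an assumption, and stratifying the 7:3 split is an added safeguard not stated in the text.

```python
# Illustrative preprocessing: drop missing rows, drop the identifier, encode, split 7:3.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("asteroids.csv")                         # assumed file; 25,708 rows x 9 columns

df = df.dropna()                                          # drop the ~0.03% rows with missing values
df = df.drop(columns=["full name"])                       # identifier only, not predictive

df["pha"] = df["pha"].map({"N": 0, "Y": 1})               # target: 0 = non-hazardous, 1 = hazardous
df["class"] = df["class"].astype("category").cat.codes    # encode orbit class numerically

X = df.drop(columns=["pha"])
y = df["pha"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)     # 7:3 train/test split
```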
Fig. 5 a Distribution of ‘H’ in the dataset, b distribution of ‘e’ in the dataset, c distribution of ‘a’ in the dataset, d distribution of ‘q’ in the dataset, e distribution of ‘i’ in the dataset, f distribution of ‘moid’ in the dataset
Fig. 6 Distribution of categorical variables in the dataset
3.3 Experimentation Classification algorithms are used for predicting the target variable because it takes the discrete values (Y) and (N). In the training process, ensemble learning algorithms are used as they can achieve good performance even with an imbalanced dataset. Decision trees are used as the base learners of the ensembles because they split the data by making a series of yes/no decisions; in a decision tree, the independent features of the dataset form the internal nodes used to classify the dependent feature. During the training phase, the training dataset is passed to a default-parameterized decision tree classifier and to ensemble learning algorithms, namely the random forest, bagging, gradient boosting, XGBoost, and CatBoost classifiers. The main aim of these algorithms is to produce an optimal model with low bias and variance: the random forest and bagging classifiers focus on reducing the variance of the base model, whereas the gradient boosting, XGBoost, and CatBoost classifiers focus on reducing its bias. After training, a robust and flexible model is selected for predicting the potentially hazardous asteroids. Selection of the best model is done by evaluating the recall, precision, F1 score, and accuracy of the models. Recall tells us, out of all potentially hazardous asteroids, how many are correctly predicted as potentially hazardous. Precision tells us, out of all predictions of potentially hazardous asteroids, how many are correct. The F1 score summarizes the overall performance of a model as the harmonic mean of recall and precision, and accuracy tells how often the model correctly predicts the target class. Thereafter, with the aim of reducing false negatives, the hyperparameters of the selected model are tuned using the Grid Search CV technique.
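The following sketch illustrates this experimental setup with scikit-learn, XGBoost, and CatBoost using default parameters; it assumes the train/test split from the preprocessing step and is not the authors' exact code.

```python
# Train the default-parameter classifiers and compare recall, precision, F1 and accuracy.
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (RandomForestClassifier, BaggingClassifier,
                              GradientBoostingClassifier)
from xgboost import XGBClassifier
from catboost import CatBoostClassifier
from sklearn.metrics import classification_report, accuracy_score

models = {
    "Decision tree": DecisionTreeClassifier(),
    "Random forest": RandomForestClassifier(),
    "Bagging": BaggingClassifier(),
    "Gradient boosting": GradientBoostingClassifier(),
    "XGBoost": XGBClassifier(),
    "CatBoost": CatBoostClassifier(verbose=0),
}

for name, clf in models.items():
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print(name, "accuracy: %.2f%%" % (100 * accuracy_score(y_test, y_pred)))
    print(classification_report(y_test, y_pred, digits=2))   # per-class recall/precision/F1
```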
Fig. 7 Block diagram for predicting the potentially hazardous asteroids to Earth
4 Results and Discussions After the training phase, the default-parameterized decision tree, random forest, bagging, gradient boosting, XGBoost, and CatBoost classifiers are tested on the testing data. Table 2 records the recall, precision, F1 score, and accuracy of these classifiers. From Table 2, it can be concluded that the XGBoost classifier performs better than the other algorithms, as it has a higher F1 score and accuracy score.

Table 2 Experimentation results with default parameterized algorithms

Experimented classification algorithms | Recall (class 0) | Recall (class 1) | Precision (class 0) | Precision (class 1) | F1 score (class 0) | F1 score (class 1) | Accuracy (%)
XGBoost classifier                     | 1.00 | 0.97 | 1.00 | 0.98 | 1.00 | 0.98 | 99.61
Bagging classifier                     | 1.00 | 0.97 | 1.00 | 0.98 | 1.00 | 0.97 | 99.57
Gradient boosting                      | 1.00 | 0.97 | 1.00 | 0.97 | 1.00 | 0.97 | 99.53
Random forest                          | 1.00 | 0.98 | 1.00 | 0.97 | 1.00 | 0.97 | 99.53
CatBoost                               | 1.00 | 0.96 | 1.00 | 0.98 | 1.00 | 0.97 | 99.46
Decision tree                          | 1.00 | 0.96 | 1.00 | 0.97 | 1.00 | 0.97 | 99.42
Table 3 Hyperparameters of the tuned XGBoost model

Hyperparameter    | Tuned value
learning_rate     | 0.22
max_depth         | 6
min_child_weight  | 7
gamma             | 0.0015
colsample_bytree  | 0.5
Fig. 8 Confusion matrix of the developed model
Table 4 Metrics of the tuned XGBoost model

Recall (class 0) | Recall (class 1) | Precision (class 0) | Precision (class 1) | F1 score (class 0) | F1 score (class 1) | Accuracy (%)
1.00             | 0.98             | 1.00                | 0.98                | 1.00               | 0.98               | 99.65
After hyperparameter tuning using Grid Search CV, the best parameters are obtained; they are listed in Table 3. To measure the performance of the tuned XGBoost classifier, a confusion matrix is visualized in Fig. 8, from which metrics such as recall, precision, F1 score, and accuracy are obtained (Table 4). In Fig. 9, the true positive rate is plotted against the false positive rate, both ranging from 0 to 1; this graph shows the performance of the developed model at all classification thresholds. The area under the XGBoost classifier curve is 0.99, which means the model has a 99% chance of distinguishing between potentially hazardous and non-hazardous asteroids. The developed XGBoost model is a tree-based classifier, and the trees are complex to analyze directly as they are nonlinear in nature. To gain intuition about the developed model, a simplified analysis is plotted in Fig. 10 with the help of a SHAP representation showing the combined effect of all the features. In the summary plot, the central margin is a partition line marked with a zero label on the x-axis. On the y-axis, the features are plotted in descending order of importance, i.e., the most important features are at the top, and the color coding represents low or high values of each feature. If a point lies to the right of the partition line, the corresponding feature value pushes the prediction toward the hazardous class. From Fig. 10, it can be noted that lower values of the features 'H' and 'moid' give a higher chance of an asteroid being potentially hazardous to Earth.
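A sketch of how the tuning and analysis steps above can be reproduced with scikit-learn, XGBoost, and the shap package; the parameter grid values are placeholders chosen around the tuned values in Table 3, and scoring on recall reflects the stated goal of reducing false negatives.

```python
# Illustrative Grid Search CV tuning, ROC-AUC evaluation and SHAP summary plot.
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import confusion_matrix, roc_auc_score
from xgboost import XGBClassifier
import shap

param_grid = {
    "learning_rate": [0.1, 0.22, 0.3],
    "max_depth": [4, 6, 8],
    "min_child_weight": [5, 7, 9],
    "gamma": [0.0015, 0.01],
    "colsample_bytree": [0.5, 0.8],
}
search = GridSearchCV(XGBClassifier(), param_grid, scoring="recall", cv=5)
search.fit(X_train, y_train)
best_model = search.best_estimator_

print("Best parameters:", search.best_params_)
print("Confusion matrix:\n", confusion_matrix(y_test, best_model.predict(X_test)))
y_prob = best_model.predict_proba(X_test)[:, 1]
print("ROC AUC: %.2f" % roc_auc_score(y_test, y_prob))     # ~0.99 reported above

# SHAP summary plot: contribution of each feature toward the hazardous class.
explainer = shap.TreeExplainer(best_model)
shap.summary_plot(explainer.shap_values(X_test), X_test)
```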
Fig. 9 ROC curve of the developed model
Fig. 10 Summary plot of the developed model
Similarly, with a high value of the feature 'e', there is a higher chance of an asteroid being potentially hazardous to Earth.
5 Conclusion The main objective of this work is achieved after several experiments with machine learning and ensemble learning algorithms. The XGBoost classifier performed better than the other algorithms with default parameters. After hyperparameter tuning, the developed XGBoost classifier gave F1 scores of 1.00 and 0.98 on non-hazardous and potentially hazardous asteroids, respectively. The model achieved an accuracy of 99.65% on the test dataset and was able to distinguish between potentially hazardous and non-hazardous asteroids with a 99% predictive ability. In
later stages, this work can be expanded to predict the approximate position of an asteroid in its orbit at a specific time in the future. This prediction of position helps to assess the possibility of an asteroid approaching very close to the Earth. In celestial mechanics, the solar system becomes an 'n-body' problem. By designing numerical algorithms that consider the masses, initial velocities, and positions of asteroids and planets, numerical solutions can be formulated that give the subsequent motion of these asteroids, thus predicting their approximate positions.
References 1. L. Mcfadden, R. Binzel, Chapter 14: Near-earth objects. AAS/Division for extreme solar systems abstracts (2007), pp. 283–300. https://doi.org/10.1016/B978-012088589-3/50018-9 2. V. Basu, Prediction of asteroid diameter with the help of multi-layer perceptron regressor (2019). Int. J. Adv. Electron. Comput. Sci., ISSN(p): 2394-2835 3. S. Anish, Hazardous asteroids classification through various machine learning techniques (2020). Int. Res. J. Eng. Technol. 7(3) 4. Z. Mako, F. Szenkovits, G.-M. Edit-Maria, I. Szücs-Csillik, Classification of near earth asteroids with artificial neural network. Studia Univ. Babes-Bolyai. 50, 85–92 (2005) 5. J. Hefele, F. Bortolussi, S. Zwart, Identifying earth-impacting asteroids using an artificial neural network. Astron. Astrophys. 634 (2020).https://doi.org/10.1051/0004-6361/201935983 6. v. Pasko, Prediction of orbital parameters for undiscovered potentially hazardous asteroids using machine learning (2018). https://doi.org/10.1007/978-3-319-69956-1_3 7. R. Binzel, D. Lupishko, M. Di Martino, R.J. Whitheley, G.J. Hahn, Physical properties of near-earth objects (2002) 8. H. Shang, X. Wu, D. Qiao, H. Xiangyu (2018) Parameter estimation for optimal asteroid transfer trajectories using supervised machine learning. Aerosp. Sci. Technol. 79. https://doi.org/10. 1016/j.ast.2018.06.002 9. B. Jia, K.D. Pham, E. Blasch, Z. Wang, D. Shen, G. Chen, Space object classification using deep neural networks (2018), pp. 1–8. https://doi.org/10.1109/AERO.2018.8396567
Chapter 35
Generating Automatic Ground Truth by Integrating Various Saliency Techniques Ramesh Cheripelli
and A. N. K. Prasannanjaneyulu
1 Introduction The human visual system does not attend to all the objects/regions in a scene with the same focus; our attention goes only to prominent or salient objects. Therefore, salient objects in images can be identified easily with the help of a saliency map [1], which is one of the best techniques for this purpose. For some images, it is not possible to prepare the ground truth manually; in such cases the results based on individual saliency models [1, 2] vary considerably from one model to another. To overcome these issues and to obtain a composite saliency map, in this paper we combine several strategies and add some extra steps. In this way, we integrate the individual properties of each map element and produce a saliency map that is much closer to the ground truth. The human brain intelligently filters important visual information, which reaches the brain at a deeper level, and a computer vision system should be motivated by the human visual system to attend to a scene in a similar way.
1.1 Architecture of Saliency In any saliency model architecture, low-level features such as color, intensity, and orientation are first extracted from the given image [3]. Based on these features, individual feature maps for color, intensity, and orientation are generated. After linearly combining all these maps, a final map known as the saliency map is generated. A neural network known as winner-take-all (WTA) chooses the most salient point on the map, and the focus of attention is then shifted away from it
Fig. 1 Architecture of saliency models
through the mechanism of inhibition of return (IOR), so that attention is not diverted back immediately to a previously visited area, thereby permitting the selection of the next most salient location (Fig. 1).
Requirements: we have established the following prerequisites for the saliency detector:
• Build a comprehensive saliency map.
• Obtain a master map that covers the complete salient regions obtained through the different state-of-the-art methods.
• Consistently explore all leading (salient) regions.
• Build accurate borders for the salient objects.
• Prepare different ground truth data using reverse engineering.
• Reduce cost and time through reusability.
1.2 Ground Truth Data Generation Various models have been proposed for characterizing salient locations in a digital image, and it is necessary to benchmark these models on a reference dataset. This process involves two major steps, namely generating the ground truth salient locations of an image and analyzing the performance of a model by comparing its response against this ground truth. In several experiments, eye trackers are used for obtaining the salient points of an image; however, this technology is costly and is yet to be perfected. An alternative approach is based on the hand–eye coordination of a person and does not require any specialized instruments other than the usual computing resources. The data collected from individual volunteers is compiled using image segmentation, followed by analysis of each segment to determine its participation in the ground truth. MSRA10K provides pixel-accurate salient object labeling for 10,000 images from the MSRA dataset. The MSRA Salient Object Database originally provides salient object annotation in terms of bounding boxes and is widely used in the salient object detection and segmentation community (Fig. 2).
Fig. 2 Architecture of saliency models
1.3 Scope of Work The scope of this work is to enhance the vision capability of a computer system based on the human visual system. The primary idea of this proposal is to produce a generic saliency map of a scene by combining the following state-of-the-art methods:
1. Context-aware saliency [4]
2. Graph-based visual saliency [5]
3. Itti-Koch saliency model [2].
Bringing all these state-of-the-art techniques together under the same umbrella gives a new dimension to the computer vision system while attending to a scene. Reusability reduces time and cost, and the ground truth is generated automatically.
2 Related Work Existing computational saliency models are briefly presented in this section. They include use of features, fusion of saliency computed across various features and
evaluation of saliency models. We also briefly describe the edge detection and image segmentation (EDISON) tool [6], which we use to create the segmented input, and the mean shift segmentation algorithm, which is used internally by the EDISON tool to generate the segmented image.
2.1 Context-Aware Saliency This algorithm considers the context, which conveys both the background parts and the objects that are most important. Saliency is significant for understanding human attention as well as for specific applications such as auto-focusing, and it is helpful for several high-level tasks such as object recognition and segmentation. The algorithm is based on four psychological principles of human visual attention: local low-level considerations, global considerations, visual organization rules, and high-level factors.
Detection of Context-Aware Saliency According to the principle of local low-level considerations, areas with distinctive patterns or colors obtain high saliency, whereas homogeneous or blurred areas obtain low saliency. In agreement with the global considerations, frequently occurring features should be suppressed. According to the visual organization rules, the salient pixels should be grouped together and not spread all over the image. Based on these principles, we define a single-scale local–global saliency.
Single-Scale Local–Global Saliency There are essentially two challenges in defining this saliency. The first is how to define distinctiveness both locally and globally; the second is how to incorporate positional information. According to the principles of local low-level and global considerations, a pixel is salient if its appearance is distinctive. We should not look at an isolated pixel but at its surrounding patch, which provides an immediate context; for now, we consider a single patch of scale r at every pixel. A pixel i is thus considered salient if the appearance of the patch p_i centered at pixel i is distinctive with respect to all other image patches. Specifically, let d_color(p_i, p_j) be the Euclidean distance between the vectorized patches p_i and p_j in CIE L*a*b color space [7], normalized to the range [0, 1]. Pixel i is considered salient when d_color(p_i, p_j) is high. According to the visual organization rules [4, 7], the positional distance between patches is also an important factor. Background patches are likely to have many similar patches all over the image, in contrast to salient patches, which tend to be grouped together. This implies that a patch p_i is salient when the patches similar to it are nearby, and it is less salient when the resembling patches are far away. Let d_position(p_i, p_j) be the Euclidean distance between the positions of patches p_i and p_j, normalized by the larger image dimension. Based on the observations above, we define a dissimilarity measure between a pair of patches as
d(p_i, p_j) = d_color(p_i, p_j) / (1 + c · d_position(p_i, p_j)),
where c = 3 in our
implementation. This dissimilarity measure is proportional to the difference in appearance and inversely proportional to the positional distance. Pixel i is considered salient when it is highly dissimilar to all other image patches, i.e., when d(p_i, p_j) is high. In practice, it suffices to consider the K most similar patches {q_k}, with K = 64 in our experiments, and the single-scale saliency value of pixel i at scale r is defined as
S_i^r = 1 − exp{ −(1/K) · Σ_{k=1}^{K} d(p_i^r, q_k^r) }.
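Purely as an illustration of the two formulas above (not the authors' implementation), a small numpy sketch; it assumes patches have already been extracted and vectorized in CIE L*a*b, with patch centers normalized by the larger image dimension.

```python
# Single-scale local-global saliency sketch for a modest number of patches.
import numpy as np

def single_scale_saliency(patches, positions, K=64, c=3.0):
    """patches: (N, D) vectorized Lab patches; positions: (N, 2) normalized centers."""
    d_color = np.linalg.norm(patches[:, None, :] - patches[None, :, :], axis=2)
    d_color = d_color / (d_color.max() + 1e-12)           # normalize color distances to [0, 1]
    d_pos = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=2)
    d = d_color / (1.0 + c * d_pos)                       # dissimilarity d(p_i, p_j)

    saliency = np.empty(len(patches))
    for i in range(len(patches)):
        others = np.delete(d[i], i)                       # ignore the patch itself
        k_most_similar = np.sort(others)[:K]              # K smallest dissimilarities
        saliency[i] = 1.0 - np.exp(-k_most_similar.mean())   # S_i^r
    return saliency
```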
3 Graph-Based Visual Saliency (GBVS) GBVS is a two-step process: in the first step, activation maps are created on certain feature channels; after that, conspicuity is highlighted by normalizing the channels and combining the maps. It builds a weighted directed complete graph for each feature (color, intensity, and orientation), where every pixel of the input image is treated as a basic location or node (vertex). The edge weight between two nodes is proportional to their feature dissimilarity and positional proximity. The resulting state transition probabilities of a Markov chain model how the gaze changes from one pixel to another, and the saliency map is given by the equilibrium distribution of this Markov chain. The equilibrium distribution is calculated by creating a uniform vector and repeatedly multiplying it by the Markov matrix, which yields the principal eigenvector of the matrix (Fig. 3). The GBVS model is organized into three phases:
Activation: an activation map is created with the help of the feature vectors.
Extraction: feature vectors are extracted over the image plane.
Combination/Normalization: the activation map(s) are normalized and combined into a single map.
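As a rough illustration of this equilibrium-distribution idea (not the original GBVS code), the sketch below builds a transition matrix over a small feature map and runs the power iteration described above; the weighting function and sigma are assumptions, and GBVS itself works on strongly downsampled maps, so the number of nodes stays small.

```python
# Equilibrium distribution of a Markov chain over feature-map nodes via power iteration.
import numpy as np

def gbvs_activation(feature_map, sigma=0.15, iters=100):
    h, w = feature_map.shape
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([ys.ravel() / max(h, w), xs.ravel() / max(h, w)], axis=1)
    f = feature_map.ravel()

    # Edge weight: feature difference attenuated by spatial distance.
    diff = np.abs(f[:, None] - f[None, :])
    dist2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=2)
    W = diff * np.exp(-dist2 / (2 * sigma ** 2))

    # Column-normalize to obtain the Markov (stochastic) transition matrix.
    P = W / (W.sum(axis=0, keepdims=True) + 1e-12)

    # Power iteration: repeatedly multiply a uniform vector to reach equilibrium.
    v = np.full(h * w, 1.0 / (h * w))
    for _ in range(iters):
        v = P @ v
        v /= v.sum()
    return v.reshape(h, w)                  # activation (saliency) map
```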
3.1 Itti-Koch Saliency Model Based on the feature-integration theory of Koch and Ullman and a bottom-up mechanism, a model was developed through which we can identify the portions in
Fig. 3 Saliency model
Fig. 4 Architecture of Itti-Koch saliency model
a scene that require attention; with this model, several features can be extracted in parallel. From these features, conspicuity maps are produced for the image or scene, highlighting locations that differ significantly from their surroundings. The maps are then combined, according to the relative saliency of the locations, into a single map encoding saliency. Several salient locations are available in the saliency map, and the most salient one is chosen using the winner-take-all (WTA) mechanism: the image is first focused on the most salient locations, and the single most salient location is then selected by WTA. This framework is the backbone of various other models. The model is built on mapping the selected location into a central representation, with proximity preference and similarity preference (Fig. 4).
4 Proposed Method To accumulate all features in a single saliency map, we merge the features of several state-of-the-art methods. The flow chart of the proposed solution, in which the output of one saliency model is passed as input to another saliency model to generate the feature map, is depicted in Fig. 5, and the flow chart of the superimposed proposed method is depicted in Fig. 6. In this experiment, various ordered combinations of three different state-of-the-art saliency models have been attempted. The possible combinations of any two methods at a time are:
Fig. 5 Proposed method
Fig. 6 Superimposed method
1. Context-Aware followed by Graph-Based Saliency and Vice Versa
2. Context-Aware followed by Itti-Koch Saliency and Vice Versa
3. Graph-Based Saliency followed by Itti-Koch and Vice Versa.
4.1 Context-Aware Followed by Graph-Based Saliency Model and Vice Versa We first generate the feature map using the context-aware saliency model [2]. The feature map generated by the context-aware model is then passed as input to the graph-based saliency model [5] to generate the final feature map, which is validated against the ground truth image of the input image (Fig. 7). In the vice versa part, we first apply the graph-based saliency model to generate the feature map. Then, on this obtained map, the context-aware saliency
Fig. 7 Feature map using method of context-aware followed by graph-based
Fig. 8 Feature map using method of graph-based followed by context-aware
model is applied. The entire procedure is similar to the first part. The result obtained after applying the graph-based saliency model followed by the context-aware saliency model is depicted in Fig. 8.
4.2 Context-Aware Followed by Itti-Koch Saliency and Vice Versa Similarly, as in the previous section, we generate the features using the context-aware model followed by the Itti-Koch saliency model and vice versa, as shown in Figs. 9 and 10, respectively.
Fig. 9 Feature map using method of context-aware followed by Itti-Koch
Fig. 10 Feature map using method of Itti-Koch followed by context-aware
4.3 Graph-Based Saliency Followed by Itti-Koch and Vice Versa Similarly, as in the previous sections, we generate the features using the graph-based [5] followed by the Itti-Koch [2] saliency model and vice versa, as shown in Figs. 11 and 12, respectively.
4.4 Generating the Resultant Saliency Map and Validating with Ground Truth All the corresponding feature maps obtained through the context-aware and graph-based models need to be of the same size so that pixel-by-pixel mapping is possible in the further steps. To accumulate the overall features, the feature maps obtained
Fig. 11 Feature map using method of graph-based followed by Itti-Koch
Fig. 12 Feature map using method of Itti-Koch followed by graph-based
Fig. 13 Left: input, First middle: ground truth, Second middle: Itti-Koch saliency map, Right: context-aware saliency map
through the context-aware [2] and graph-based [5] methods are superimposed, as sketched below. In the first part of the implementation, the obtained output was not up to the mark due to loss of features. Therefore, in order to preserve the features obtained through the state-of-the-art methods used, we superimpose all these methods. The results after superimposing the techniques are shown in Figs. 13, 14, 15 and 16.
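As an illustration of this superimposition step (the target size and min-max normalization are assumptions; any saliency implementation can supply the maps), a short OpenCV/numpy sketch:

```python
# Resize every feature map to a common size, normalize to [0, 1] and average them.
import cv2
import numpy as np

def superimpose(maps, size=(400, 300)):
    """maps: list of 2-D saliency maps (e.g., Itti-Koch, context-aware, GBVS)."""
    combined = np.zeros((size[1], size[0]), dtype=np.float32)
    for m in maps:
        m = cv2.resize(m.astype(np.float32), size)               # pixel-to-pixel alignment
        m = cv2.normalize(m, None, 0.0, 1.0, cv2.NORM_MINMAX)    # rescale to [0, 1]
        combined += m
    return combined / len(maps)

# composite = superimpose([itti_map, context_aware_map, gbvs_map])
# cv2.imwrite("composite_saliency.png", (composite * 255).astype(np.uint8))
```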
Fig. 14 Left: GBVS map, Middle: superimposing Itti-Koch with context-aware, Right: superimposing Itti-Koch with context-aware and GBVS
Fig. 15 Left: Input image, First middle: ground truth, Second middle: Itti-Koch saliency map, Right: context-aware saliency map
Fig. 16 Left: GBVS map, Middle: superimposing Itti-Koch with context-aware, Right: superimposing Itti-Koch with context-aware and GBVS
5 Conclusion In this work, various integration strategies over several state-of-the-art saliency techniques have been applied for generating the ground truth automatically. In the first part, different possible combinations of any two methods (one followed by another) were attempted, but the resultant output did not agree well with the real ground truth. Therefore, in the next part of this work, all the state-of-the-art saliency techniques were superimposed; the final output was then validated against the real ground truth image and found to agree well with the original. While generating the segmented images from the input images, it is hard to set the exact parameter values, such as the spatial and color ranges and the minimum region size, because these values may differ from image to image for appropriate segmentation. This means that for each and every image, they change
while creating the segmented input, so a lot of manual effort was involved. For complex images, there is still a need to improve the feature map generation techniques so that the proposed saliency map comes even closer to the ground truth image.
References 1. R. Pal, R. Srivastava, S.K. Singh, K.K. Shukla, Computational models of visual attention: a survey, in Recent Advances in Computer Vision and Image Processing: Methodologies and Applications (2013), pp. 54–76 2. L. Itti, C. Koch, Computational modelling of visual attention. Nat. Rev. Neurosci. 2(3), 194–203 (2001) 3. H. Li, J. Chen, H. Lu, Z. Chi, CNN for saliency detection with low-level feature integration. Neurocomputing 226, 212–220 (2017) 4. S. Goferman, L. Zelnik-Manor, A. Tal, Context-aware saliency detection. IEEE Trans. Pattern Anal. Mach. Intell. 34(10), 1915–1926 (2012) 5. J. Harel, C. Koch, P. Perona, Graph-based visual saliency, in Proceedings of Annual Conference on Neural Information Processing Systems (NIPS) (2006), pp. 545–552 6. http://coewww.rutgers.edu/riul/research/code/EDISON/ 7. K.N. Plataniotis, A.N. Venetsanopoulos, Color Image Processing and Applications (Springer Science & Business Media, 2013) 8. S. Frintrop, E. Rome, H.I. Christensen, Computational visual attention systems and their cognitive foundations: a survey. ACM Trans. Appl. Percept. (TAP) 7(1), 6 (2010) 9. C. Koch, S. Ullman, Shifts in selective visual attention: towards the underlying neural circuitry, in Matters of Intelligence (Springer Netherlands, 1987), pp. 115–141 10. http://mmcheng.net/msra10k/ 11. C. Ramesh, K. Venugopal Rao, D. Vasumathi. Evaluation of key management scheme based on identity, in 6th IEEE International Advanced Computing Conference (IACC 2016), pp. 27–28 12. C. Koch, S. Ullman, Selecting One Among the Many: A Simple Network Implementing Shifts in Selective Visual Attention (No. AI-M-770). Massachusetts Institute of Tech Cambridge Artificial Intelligence Lab (1984) 13. C. Ramesh, K. Venugopal Rao, D. Vasumathi. Identity-based crypto system based on tate pairing. Glob. J. Comput. Sci. Technol. (2016) 14. D. Comanicu, P. Meer: Mean shift: a robust approach toward feature space analysis. IEEE Trans. Pattern Anal. Mach. Intell. 24, 603–619 (2002). C. Christoudias, B. Georgescu, P. Meer, Synergism in low-level vision, in 16th International Conference on Pattern Recognition, vol. IV (Quebec City, Canada, 2002), pp. 150–155 15. C. Ramesh, K. Venugopal Rao, D. Vasumathi, Comparative analysis of applications of identitybased cryptosystem in IOT. Electron. Gov. Int. 13, 314–323 (2017) (ISSNonline:1740-7508, ISSNprint:1740-) 16. C. Ma, Z. Miao, X.P. Zhang, M. Li, A saliency prior context model for real-time object tracking. IEEE Trans. Multimed. 19, 2415–2424 (2017) 17. H. Lee, D. Kim, Salient region-based online object tracking, in Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV) (Lake Tahoe, NV, USA, 2018), pp. 1170–1177 18. K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhudinov, R. Zemel, Y. Bengio, Show, attend and tell: neural image caption generation with visual attention, in Proceedings of the 32nd International Conference on Machine Learning (Lille, France, 2015), pp. 2048–2057 19. C. Qin, G. Zhang, Y. Zhou, W. Tao, Z. Cao, Integration of the saliency-based seed extraction and random walks for image segmentation. Neurocomputing 129, 378–391 (2014)
20. H. Fu, D. Xu, S. Lin, Object-based multiple foreground segmentation in RGBD video. IEEE Trans. Image Process. 26, 1418–1427 (2017) 21. M. Donoser, M. Urschler, M. Hirzer, H. Bischof, Saliency driven total variation segmentation, in Proceedings of the 2009 IEEE 12th International Conference on Computer Vision (Kyoto, Japan, 2009), pp. 817–824 22. A. Borji, M.M. Cheng, Q. Hou, H. Jiang, J. Li, Salient object detection: a survey. Comput. Vis. Media 5, 117–150 (2019) 23. L. Itti, C. Koch, E. Niebur, A model of saliency-based visual attention for rapid scene analysis. IEEE Trans. Pattern Anal. Mach. Intell. 20, 1254–1259 (1998) 24. X. Hou, L. Zhang, Saliency detection: a spectral residual approach, in Proceedings of the 2007 IEEE Conference on Computer Vision and Pattern Recognition (Minneapolis, MN, USA, 2007), pp. 1–8 25. J. Long, E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (Boston, MA, USA, 2015), pp. 3431–3440 26. L. Wang, L. Wang, H. Lu, P. Zhang, X. Ruan, Salient object detection with recurrent fully convolutional networks. IEEE Trans. Pattern Anal. Mach. Intell. 41, 1734–1746 (2018) 27. S. Chen, X. Tan, B. Wang, H. Lu, X. Hu, Y. Fu, Reverse attention-based residual network for salient object detection. IEEE Trans. Image Process. 29, 3763–3776 (2020) 28. J. Zhang, T. Zhang, Y. Dai, M. Harandi, R. Hartley, Deep unsupervised saliency detection: a multiple noisy labeling perspective, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (Salt Lake City, UT, USA, 2018), pp. 9029–9038 29. W. Wang, J. Shen, Y. Yu, K.L. Ma, Stereoscopic thumbnail creation via efficient stereo saliency detection. IEEE Trans. Vis. Comput. Graph. 23, 2014–2027 (2016) 30. S. Goferman, A. Tal, L. Zelnik-Manor, Puzzle-like collage, in Computer Graphics Forum; vol. 29 (Wiley Online Library, Oxford, UK, 2010), pp. 459–468
Chapter 36
GATEZEE—An Automated Gate Pass Management System Vaddeboyina Sri Manvith, Shiva Madhunala, and B. V. Kiranmayee
1 Introduction Gate pass management has become a prime activity with the spike in the number of school/college enrolments. In the traditional method of requesting an out pass, one needs to fill in all the details manually and wait for the higher authorities to accept the request and grant permission. This method becomes onerous and time consuming when there is a surge in the number of requests, since every gate pass request needs to be verified manually before permission is granted, and it may also lead to errors. Hence, an automated system is essential for effective management of in- and out-of-gate-related activities. The gate pass management system is a software application that contains the details of all the students and employees in an organization. It adopts paperless out pass generation, in contrast to the traditional paper-based approach; this not only reduces the usage of paper but also helps expedite the process of requesting an out pass right from the browser. The application helps in tracking the students and employees of an organization at any point in time and follows a systematic flow from student to faculty/HOD, finally resulting in the generation of an electronic gate pass. The current applications contain limited dashboards, allow only one admin to monitor the requests, and employ no standard authentication mechanism; hence, there are delays in granting permission and in effective gate pass management. Our work therefore involves building a structured and user-friendly application with interactive dashboards and appropriate visualizations, efficient data flow between students and their
assigned faculty (mentors), a prompt notification mechanism using e-mail communication, using text processing and sentiment analysis techniques, and employing an effective authentication mechanism at the security gate. This system could overcome the problems faced by students and faculty in requesting and granting the gate pass.
2 Related Work Archana et al. [1] have proposed a simple paperless approach for visitor gate pass management system in 2019, which reduces the manual effort and facilitates accepting/rejecting the requests through a single host. As there is a single host, it becomes cumbersome when there are more number of requests to be verified and granted permission. They have not mentioned the details regarding the authentication technique. Kaushik et al. [2] have worked on building an automated leave management system through which a student can upload a supporting document and wait for approval. This system mainly focuses on the attendance management of a student. They have not mentioned the details regarding the faculty leave management and the authentication mechanism. Harish et al. developed an application titled— ‘Visitor Gate pass Management System’ [3] which helps in efficient maintenance of visitors in an organization. They have also mentioned the details of visitor classification, broad classification of visitors, and their types of visit. Lengure et al. [4] have worked on building an application using model–view–controller (MVC) pattern. They have included login, clerk, admin, and security guard modules. They have not discussed anything regarding authentication mechanisms like QR code, RFID, or OTP. Venkatesa Perumal et al. [5] proposed an online gate pass application form for hostel students for one day/long leave. This system has a flow from student–warden–HOD–hostel manager–security guard. This system provides notifications through e-mails. This system has limited dashboards. Gowtham et al. [6] have worked on automation of visitor gate pass management system. This system follows SDLC approach and involves authentication of visitors through visitor data entered by employee and photographs. This also involves e-mail notifications, bar codes, and departmentwise summary report generation.
2.1 Existing System The existing system provides a paperless approach for gate pass management system or visitor management system or leave management system. These applications involve flow between: student–faculty–security (or) faculty–HOD–security (or) visitor–employee–security (or) student–warden–HOD–security. These applications contain limited dashboards, use a single admin (HOD/principal/faculty), and do not use any standard authentication mechanism like OTPs or RFID or QR code.
3 Proposed Framework The proposed framework:
• Provides a proper and structured flow: student–mentor/HOD–security.
• Is more secure and more reliable, with a specific set of services and functionalities for five users: 1. student, 2. admin, 3. faculty/mentor, 4. HOD, 5. security.
• Employs a standard authentication mechanism using QR codes, which can be validated easily by the security guard. The QR code is generated using the Node.js 'qrcode' package.
• Facilitates sending requests from students (mentees) to their respective faculty (mentors) and from faculty to their respective HOD, obviating the use of a single admin.
• Provides positive, negative, and neutral scores along with a compound score, based on the reason mentioned in the request form, using the VADER sentiment analysis algorithm, and prioritizes the request based on these scores.
• Provides an integrated phone call communication mechanism with the parent/guardian to verify the genuineness of the student's request and avoid fake or intentional requests.
• Provides a wide range of interactive dashboards which are user-friendly, detailed, and easy to understand.
• Provides an e-mail notification mechanism.
• Provides a mechanism for confirmation from the parent through the integrated phone call facility.
Flowchart of our proposed system is depicted in Fig. 1. Figure 2 shows the login screen, and Fig. 3 shows the interactive student dashboard. Figure 4 shows all the out passes the student/faculty has requested earlier, and Fig. 5 shows the student profile with edit details functionality. Figure 6 shows the out pass request form, and Fig. 7 shows the out pass request screen as viewed by faculty/mentor/HOD.
Fig. 1 Flowchart of our proposed system
Fig. 2 Login screen
Fig. 3 Student dashboard
Fig. 4 View all out passes
Fig. 5 Student profile
Fig. 6 Out pass request form
Fig. 7 Out pass request dialog screen
Fig. 8 Mentees recent out passes
Fig. 9 Security dashboard
Figure 8 shows the recent out passes of mentees. Figure 9 shows the security dashboard, and Fig. 10 shows a sample QR code.
3.1 Modules The proposed framework consists of four main modules. Each module has a specific set of functionalities, as listed below.
Student module: • Edit details • View previous out passes • Request new out pass • Update out pass • View mentor details and HOD details
Faculty module: • View details • Edit details • View previous out passes • Request new out pass • Update out pass • View HOD details
Security module: • View details • Edit details • Authenticate out pass
Common/global module: • Forgot password • Reset password • QR code generator • Update out pass
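The framework's QR code generator uses the Node.js 'qrcode' package mentioned earlier; purely for illustration, and to keep the code examples in one language, an equivalent sketch with the Python 'qrcode' library is shown below. The out-pass fields are hypothetical.

```python
# Encode the out-pass details as a QR image later scanned and validated at the gate.
import json
import qrcode

outpass = {"roll_no": "XXXXXXXXXX", "name": "Student Name",        # hypothetical fields
           "valid_till": "2021-08-30 18:00", "approved_by": "Mentor"}

img = qrcode.make(json.dumps(outpass))   # build the QR image from the pass payload
img.save("outpass_qr.png")
```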
3.2 Sentiment Analysis Sentiment analysis refers to the act of computing and classifying a given sentence into negative, neutral, or positive.
3.3 VADER Sentiment Analysis 'VADER' stands for 'Valence Aware Dictionary and sEntiment Reasoner'. It is a rule-based, lexicon-driven sentiment analysis tool that is attuned to sentiments expressed in social media [7–14]. It also determines how negative or positive a given sentiment is. It calculates a 'compound score', a 'metric that calculates the sum of all the lexicon ratings which have been normalized between "−1" (most extreme negative) and "+1" (most extreme positive)'. The normalization used is shown in Eq. 1, and the score range of each type of sentiment is listed in Table 1.
x = x / √(x² + α)    (1)
where x: sum of valence scores of constituent words and α = normalization constant (default value is 15).
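As an illustration of how these scores are obtained in practice, the vaderSentiment Python package exposes a polarity_scores method that returns the pos/neg/neu and compound values used to prioritize requests; the threshold in the sketch below follows Table 1.

```python
# Score a request reason with VADER and flag urgent (negative) requests.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
reason = "Suraj's grandfather is no more"        # reason given in the out-pass form
scores = analyzer.polarity_scores(reason)
print(scores)                                    # {'neg': ..., 'neu': ..., 'pos': ..., 'compound': ...}

if scores["compound"] <= -0.05:                  # negative sentiment -> treat as high priority
    priority = "high"
```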
Fig. 10 QR code
Table 1 VADER sentiment analysis

Sentiment           | Compound score range
Positive sentiment  | Compound score ≥ 0.05
Negative sentiment  | Compound score ≤ −0.05
Neutral sentiment   | −0.05 < compound score < 0.05
4 Testing and Results We have tested our Web framework with database, functional, compatibility, performance, usability, and security testing, and it passed all the test cases. The results of the Web framework are shown in Figs. 11, 12, and 13.
Fig. 11 Out pass request form
Fig. 12 Out pass request dialog screen
Fig. 13 Framework results
4.1 VADER Sentiment Analysis We have tested the VADER sentiment algorithm on some sentences. The results of the various scores obtained are tabulated in Table 2.
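As an illustration of this testing step, a minimal Python sketch using the vaderSentiment package of [14] is given below; the thresholds follow Table 1 (the deployed framework itself uses the equivalent Node.js package [13]).

```python
# Minimal sketch of the VADER scoring step; thresholds follow Table 1.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

def score_reason(reason: str) -> dict:
    # polarity_scores returns pos, neg, neu and the normalized compound score
    scores = analyzer.polarity_scores(reason)
    compound = scores["compound"]
    if compound >= 0.05:
        label = "positive"
    elif compound <= -0.05:
        label = "negative"
    else:
        label = "neutral"
    return {**scores, "label": label}

# Example reasons taken from Table 2
for reason in ["I have to attend my sister's wedding",
               "Suraj's grandfather died",
               "I am not feeling well"]:
    print(reason, score_reason(reason))
```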
5 Conclusion Our project simplifies the manual task of getting permissions for candidates from an organization to take leave/holiday. From the student’s/user’s perspective, it will be easier to put in a request for leave and receive a gate pass or permission instead of going to the respective department and acquiring manual written slips. From the management’s perspective, it will be easier to find the student’s details, the reason for the application and whether to give permission. Usage of QR codes makes it more secure to allow students out of the organization at the gate. GATEZEE facilitates a smooth end-to-end flow for the above requirements.
Table 2 VADER sentiment algorithm scores
S. No. | Sentence | Positive score | Negative score | Neutral score | Compound score
1 | My friend met with an accident | 0.311 | 0.301 | 0.388 | 0.0258
2 | I have to attend my sister’s wedding | 0 | 0 | 1 | 0
3 | Suraj’s grandfather is no more | 0 | 0.355 | 0.645 | −0.296
4 | I want to go out for a casual outing | 0.383 | 0 | 0.617 | 0.2732
5 | I need to go for a health check-up | 0 | 0 | 1 | 0
6 | Suraj’s grandfather died | 0 | 0.643 | 0.357 | −0.5574
7 | Raju’s mother is in hospital | 0 | 0 | 1 | 0
8 | I am not feeling well | 0 | 0.614 | 0.386 | −0.2924
9 | I have a doctor appointment | 0 | 0 | 1 | 0
10 | Farhan is ill. He needs me to accompany him to the hospital | 0 | 0.203 | 0.797 | −0.4215
6 Future Scope Our framework has inspired potential future work including using advanced sentiment analysis methods like lexicon and machine learning approaches, along with natural language processing (NLP) methods for better text sentiment score. Acknowledgements We would like to thank the Department of Computer Science Engineering, VNR Vignana Jyothi Institute of Engineering and Technology, Hyderabad for helping us carry out the work and supporting us all the time.
References 1. S.B. Archana, N. Buwa, V. Jadhav, P. Ganate, S. Dagale,Gate-pass management system. Int. J. Sci. Eng. Dev. Res. (www.ijsdr.org), 4(2), 246–248 (2019), ISSN: 2455-2631. Available:
http://www.ijsdr.org/papers/IJSDR1902040.pdf 2. V.K. Kaushik, A. Gupta, A. Kumar, A. Prasad, Student leave management system. Int. J. Adv. Res. Inno. Ideas Educ. 3(2017), 124–131 (2017) 3. H. Rapartiwar, P. Shivratri, O. Sonakul, A. Bhugul, Visitor gate pass management system. Int. J. Comput. Sci. Mobile Comput. 6(2) 4. C. Lengure, L. Kakde, M. Bargat, S. Jambhulkar, A. Palandurkar, H. Wade, E-gatepass system. Int. Res. J. Eng. Technol. 5(3) 5. S. Venkatesa Perumal, B.I. Juvanna, S. Rajan, Online gate pass application form for hostel students. Int. J. Pure Appl. Math. 119(18), 1657–1664 (2018) 6. I. Gowtham, T. Sathishkumar, S. Lakshmiprasad, P. Arumugam, G. Prabhakara Rao, Automation of visitor gate pass management system, in 2019 2nd International Conference on Intelligent Computing, Instrumentation and Control Technologies (ICICICT) 7. C. Hutto, E. Gilbert, VADER: a parsimonious rule-based model for sentiment analysis of social media text, in Proceedings of the International AAAI Conference on Web and Social Media, 8(1) (2014). Retrieved from https://ojs.aaai.org/index.php/ICWSM/article/view/14550 8. A. Borg, M. Boldt, Using VADER sentiment and SVM for predicting customer response sentiment. Expert Syst. Appl. 162, 113746 (2020).https://doi.org/10.1016/j.eswa.2020.113746 9. v. Bonta, N.K.N. Janardhan, A comprehensive study on lexicon based approaches for sentiment analysis, 1–6 (2019) 10. D. Bose, P.S. Aithal, S. Roy, Survey of Twitter viewpoint on application of drugs by VADER sentiment analysis among distinct countries (March 15, 2021). Int. J. Manage. Technol. Soc. Sci. (IJMTS) 6(1), 110–127 (2021). ISSN: 2581-6012, Available at SSRN: https://ssrn.com/ abstract=3805424 11. https://www.geeksforgeeks.org/python-sentiment-analysis-using-vader/ 12. https://dev.to/sheikh_ishaan/sentiment-analysis-using-node-js-4kfb 13. https://www.npmjs.com/package/vader-sentiment 14. https://github.com/cjhutto/vaderSentiment
Chapter 37
Obesity Prediction Based on Daily Lifestyle Habits and Other Factors Using Different Machine Learning Algorithms Chalumuru Suresh, B. V. Kiranmayee, Milar Jahnavi, Roshan Pampari, Sai Raghu Ambadipudi, and Sai Srinivasa Preetham Hemadri
1 Introduction Obesity is regarded as a buildup of fat in the body that can cause serious medical issues. Obesity is not just a cosmetic concern. It is a health condition that increases the risk of developing several illnesses such as high blood pressure, cancer, heart disease, diabetes, and so on. Body mass index (BMI) is generally considered as the major factor to describe whether a person is suffering with obesity or not. According to the BMI value, there are 4 distinct weighing statuses. An individual is regarded as underweight if his or her BMI is beneath 18.5. A person’s BMI is normal if it falls in midway of 18.5 and 24.9 (both included). If a person’s BMI is in midway of 25.0 and 29.9 (both included), they are overweight, and if their BMI is 30.0 or higher, they are obese. The body mass index value of an individual is generally calculated by using height in meters and weight in kilograms. Obesity is not determined just using BMI value, but there are many other factors which are useful to determine obesity in a person. The factors can include unhealthy diet, inactivity, age, gender, lack of sleep, family inheritance, drinking/smoking habits, basal metabolic rate (BMR), resting metabolic rate (RMR), body fat percentage (BFP), protein recommended dietary allowances (RDA). In this study, machine.
C. Suresh · B. V. Kiranmayee · M. Jahnavi (B) · R. Pampari · S. R. Ambadipudi · S. S. P. Hemadri Department of CSE, VNR VJIET, Hyderabad, Telangana, India C. Suresh e-mail: [email protected] B. V. Kiranmayee e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. B. Reddy et al. (eds.), Proceedings of Second International Conference on Advances in Computer Engineering and Communication Systems, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-7389-4_39
2 Literature Survey Jindal et al. [1] proposed a model which uses body mass index, basal metabolic rate, resting metabolic rate, body fat percentage, protein recovery dietary allowance, and it predicts whether the person is obese or not. It makes use of machine learning techniques to obtain reliable obesity estimates in a range of scenarios. Sun et al. [2] included data collection and processing, statistical analysis, and model fitting. It used descriptive statistics, cross-category analysis, model-based analysis, and model-based prediction for developing the obesity prediction model. It developed a predictive model which evaluates children’s chances of adhering to one of four BMI categories five years later using currently measured determinants [sex, area of residence, age, body height, and body mass index (BMI)] and specifies the elevated risk group for possible implications using currently measured determinants. Al Kibria et al. [3] have used dataset consisting of both rural and urban areas. Race, age, gender, relationship status, education level, active contraceptive use, ethnicity, and spirituality were all reported by participants. BMI is also calculated. Based on all these factors, the overweight, underweight, and obesity are predicted. Singh and Tawfik [4] studied the risk of overweight and obesity in youngsters and found that it can be predicted early. They employed a machine learning algorithm to compute an individual’s BMI at three, five, seven, and eleven years of age. The machine learning model will predict if that individual is normal or at risk at the age of 14. Luhar et al. [5] have forecasted the risk of overweight and obesity levels of an Indian individual of aged 20–69 years using age and sex to 2040. They have also used information about where they live like they live in rural or urban areas. Cervantes and Palacio [6] applied computational intelligence to estimate the obesity levels in students of age between 18 and 25 years. They have used decision trees, random forest, support vector machine, and k-means algorithm for the estimation. Thamrin et al. [7] applied logistic regression, classification and regression trees (CART), and Naïve Bayes to estimate the obesity in adults. Logistic regression was utilized as a machine learning model based on the accuracy ratings. It also addressed the problem of data imbalance in predicting obesity status. Molina et al. [8] employed data mining techniques to create an obesity prediction model. Logistic model tree, random forest, multi-layer perceptron, and support vector machine are among the classification methods employed. The logistic model tree was selected for prediction since it offers the highest precision.
3 Proposed System Generally, most of the people across the world are facing issues with obesity. Even though it is the major concern for health, people were not interested to know their
Fig. 1 System architecture for obesity prediction
obesity levels. In recent studies, there were some solutions for this obesity problem which were calculated based on BMI, which depends only on the weight and height of an individual. But there are many diseases that occur due to obesity which were not predicted earlier. Based on the previous solutions, the implemented solution predicts the obesity levels together with the associated diseases and the remedies that need to be taken. In this paper, there are 22 factors that determine the obesity levels, which include unhealthy diet, inactivity, age, gender, lack of sleep, family inheritance, drinking/smoking habits, basal metabolic rate (BMR), resting metabolic rate (RMR), body fat percentage (BFP), and protein recommended dietary allowances (RDA), and these give better results. A solution is implemented in order to overcome this problem, and it includes two phases: (1) designing a ML model and (2) filling the form (Fig. 1).
3.1 Phase 1 (Designing a ML Model) Step 1: Importing Dataset. The data contains the values of columns including age, gender, height, weight, obesity of parents, smoking habits, consumption of vegetables, water intake, physical activity, consumption of alcohol that are used to predict obesity, and is stored in a csv file. And the data must be imported into a data frame for further computations. Pandas, which is a Python library, provides a function read_csv; using this function, the data in the csv can be imported into the data frame. The dataset is derived from an article, and it consists of 2111 records with 22 fields.
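A minimal sketch of this step is shown below; the file name "obesity.csv" is an illustrative assumption consistent with the description above, and "NObeyesdad" is the target variable introduced in the next step.

```python
# Sketch of Step 1: loading the 2111-record, 22-field dataset into a DataFrame.
import pandas as pd

df = pd.read_csv("obesity.csv")           # assumed file name
print(df.shape)                           # expected: (2111, 22)
print(df["NObeyesdad"].value_counts())    # distribution of the target variable
```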
Fig. 2 T-SNE plot, 2-D representation of the dataset using T-SNE
Visualization of data using T-SNE plot. Figure 2 represents a t-distributed stochastic neighbor embedding (T-SNE) which is used to convert multi-dimensional data to two-dimensional (2-D) data. Here, “NObeyesdad” is the target variable. Step 2: Data preprocessing. The data that is previously imported may contain null values, strings, outliers. So, the data must be properly preprocessed before applying any machine learning model. Label encoding. Generally, most of the data will be in strings and numbers. But machine can only understand numerical values, so the strings in the data must be converted into numerical values instead of totally removing them. To make this possible, label encoding is used. For example, if there is a gender column, then usually data is stored as M, F. This function converts all the M’s to 0 and all the F’s to 1 or vice versa and similarly for all other columns having character values. Handling null values (missing values). After converting all the column values to the numerical, check for any missing values in the data. If there are very few missing values, then delete the rows having null values and proceed with further process. In the other case, they must be filled with either mean, median or mode of their respective columns, and median is most preferred. Outlier detection and removal. A observation in a random sampling that departs substantially from other values is called an outlier. In certain ways, this concept defers to the analyst in determining what constitutes aberrant behavior. Here, boxplots are used to detect outliers. Figure 3 shows the outlier ends for each and every column. Boxplot. Boxplot is used to display how the data is distributed graphically. It looks like a box attached with two strings at both the ends. At the center of the box, the median of the data is represented (Q2), the left apex of the box is the 25th percentile (Q1), and the right apex of the box is the 75th percentile (Q3). (Q3 − Q1) is called the interquartile range or IQR. Then, the left extreme is calculated as Q1 − 1.5 * IQR. The right extreme is calculated as Q3 + 1.5 * IQR. Any point in the data to which boxplot is drawn staying in the range of left extreme and right extreme is picked and used for further processing. And the other points that are less than left extreme and
Fig. 3 Outlier values of dataset
greater than right extreme are considered as outliers and are removed from the rest of the data (Fig. 4). Boxplots from the data which shows outliers are present in the dataset. Figure 5 represents a boxplot which visualizes how the data is distributed for the column “BMI.” Here, BMI column has no outliers. Figure 6 represents a boxplot which visualizes how the data is distributed for the column “NCP.” Here, NCP column has many outliers. Target variable separation. The target variable (prediction value column) column must be separated from the rest of the columns to fit the data into a machine learning model. Figure 7 represents a correlation matrix which shows correlation coefficients between each column. Each box in the table shows how two columns are correlated. The above correlation matrix summarizes data. Fig. 4 Boxplot
Fig. 5 Outlier boxplot for BMI
Fig. 6 Outlier boxplot for NCP
Fig. 7 Correlation matrix
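A minimal sketch of the preprocessing described in Step 2 (label encoding, median imputation, and IQR-based outlier removal) is given below; it assumes the DataFrame df loaded in Step 1 and the target column "NObeyesdad".

```python
# Sketch of Step 2: label encoding, missing-value handling and IQR-based
# outlier removal, following the boxplot rule described above.
from sklearn.preprocessing import LabelEncoder

# Encode string columns (e.g., gender M/F) as numeric labels
for col in df.select_dtypes(include="object").columns:
    df[col] = LabelEncoder().fit_transform(df[col].astype(str))

# Fill any remaining missing values with the column median (preferred above)
df = df.fillna(df.median(numeric_only=True))

# Keep only rows inside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] for every feature column
for col in df.columns.drop("NObeyesdad"):
    q1, q3 = df[col].quantile([0.25, 0.75])
    iqr = q3 - q1
    df = df[(df[col] >= q1 - 1.5 * iqr) & (df[col] <= q3 + 1.5 * iqr)]
```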
Step 3: Splitting data and Applying Machine Learning model. Splitting data into training and testing data. The data must be partitioned into train data (70% of the total data) and test data (30% of the whole data) to test the accuracy of the applied machine learning model. The machine learning model is then applied on the train data, and the accuracy of the ML model is tested using the test data. Applying ML model. Random forest classifier is one of the best supervised machine learning algorithms. It splits the data into multiple subsets and applies the decision tree algorithm individually to each subset. All the results obtained from those decision tree algorithms are collected, and the mean (or) mode is calculated depending on the type of target variable. Now, this ML model is put into the back end of the Web site using Django.
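The splitting and model-fitting step can be sketched as follows, continuing from the preprocessed DataFrame; the 70/30 proportions and the random forest classifier follow the description above.

```python
# Sketch of Step 3: 70/30 split and a random forest classifier.
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

X = df.drop(columns=["NObeyesdad"])   # feature columns
y = df["NObeyesdad"]                  # target (obesity level)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42)

model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))
```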
Fig. 8 Decision tree plot based on dataset
3.2 Phase 2 (Filling the Form) This step is about filling the form which consists of factors that lead to obesity of a person. The factors are age, gender, height, weight, obesity of parents, smoking habits, consumption of vegetables, water intake, physical activity, consumption of alcohol. After filling the form, click on the submit button where it displays the type of obesity level of that person. The result includes obesity level and risks that may occur due to obesity, and if the cursor is hovered over each risk, it will display remedies for that risk. Figure 8 represents a decision tree which is a tree like structure, in which any particular internal node denotes a test on a column, a branch between two internal nodes represents a result of the test, and every leaf node represents a class value.
3.3 Comparative Study On the basis of their accuracy scores, a comparative analysis of numerous machine learning algorithms such as random forest, support vector machine (SVM), decision tree, and K-nearest neighbor (KNN) has been conducted. Table 1 shows a detailed comparison of several machine learning algorithms focusing on their accuracy scores. From the table, the conclusion is that random forest is more efficient than other algorithms based on the accuracy ratings. Figure 9 shows a bar graph which plots various machine learning algorithms on X-axis and their accuracy values on Y-axis. Random Forest. Random forest uses decision tree algorithm. It is a parallel process unlike decision tree which is a step process. Here, it splits the whole dataset
Table 1 Comparative study of various machine learning algorithms
ML algorithms | Accuracy score (%)
K-nearest neighbor | 78.97
Support vector machine (SVM) | 96.21
Decision tree | 96.96
Random forest | 98.48
Fig. 9 Comparative study of various machine learning algorithms based on their accuracy scores
into multiple data subsets in which the number of rows of each data subset must be same but not necessarily the number of columns is same. For each data subset, random forest algorithm applies decision tree algorithm individually and extracts the results from each decision tree. It calculates the final output by mode of all the outputs from the decision tree. It is called bootstrapping because the data is split and then the result is calculated by aggregation. Here, the root node selection in the random forest and decision tree algorithms is done with the parameter Gini index. Greater the Gini index value, the higher the level of the attribute in the random forest and decision tree.
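A sketch of how such a comparison can be reproduced with scikit-learn is shown below, reusing the train/test split from Sect. 3.1; the exact scores obtained depend on the preprocessing choices and random seed.

```python
# Sketch of the comparative study: the four classifiers of Table 1 evaluated
# on the same train/test split.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

classifiers = {
    "K-nearest neighbor": KNeighborsClassifier(),
    "Support vector machine (SVM)": SVC(),
    "Decision tree": DecisionTreeClassifier(random_state=42),
    "Random forest": RandomForestClassifier(random_state=42),
}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    acc = accuracy_score(y_test, clf.predict(X_test))
    print(name, round(100 * acc, 2))
```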
4 Experimental Analysis The home page of the Web site is depicted in Fig. 10, which describes information about obesity and various risks associated with being underweight, overweight, or obese. Figure 11 shows the survey page which has to be filled by the user. This form contains 21 fields which are used to predict various obesity levels. Figure 12 shows the result which is predicted by the machine learning model based on the input given by the user. It also displays the possible risks (diseases) associated with the result.
Fig. 10 Home page
Fig. 11 Survey form
Fig. 12 Result page
Fig. 13 Result page with remedies
Figure 13 shows the remedies if the cursor is hovered above the risk. Detailed description about the remedies can be accessed by clicking on the hyperlink.
5 Conclusion Obesity is considered as the most common disease in the modern world. As such, it is important for the individuals to know which category of weight class they belong to so that they can be healthy. So, in this study, individual’s weight is classified into seven categories based on 21 factors. Based on the results of the classification, the risks and diseases that individuals may have are predicted. Based on the diseases, the remedies are also provided. Early recognition of the risks associated with obesity and overweight will be helpful to prevent many diseases.
References 1. K. Jindal, N. Baliyan, P.S. Rana, Obesity prediction using ensemble machine learning approaches, in Recent Findings in Intelligent Computing Techniques, ed. by P. Sa, S. Bakshi, I. Hatzilygeroudis, M. Sahoo. Advances in Intelligent Systems and Computing, vol. 708 (Springer, Singapore, 2018), pp. 355–362 2. Y. Sun, Y. Xing, J. Liu, Five-year change in body mass index category of childhood and the establishment of an obesity prediction model. Sci. Rep. 10, 10309 (2020)
3. G. Al Kibria, K. Swasey, M.Z. Hasan, Prevalence and factors associated with underweight, overweight and obesity among women of reproductive age in India. Glob. Health Res. Policy 4, 24 (2019) 4. B. Singh, H. Tawfik, Machine learning approach for the early prediction of the risk of overweight and obesity in young people, in Computational Science—ICCS 2020, ed. by V. Krzhizhanovskaya. Lecture Notes in Computer Science, vol. 12140 (Springer, Cham, 2020), pp. 523–535 5. S. Luhar, I.M. Timæus, R. Jones, S. Cunningham, S.A. Patel, S. Kinra, L. Clarke, R. Houben, Forecasting the prevalence of overweight and obesity in India to 2040. PLoS ONE (2020) 6. R.C. Cervantes, U.M. Palacio, Estimation of obesity levels based on computational intelligence. Inform. Med. Unlocked 21, 100472 (2020) 7. S.A. Thamrin, D.S. Arsyad, H. Kuswanto, A. Lawi, S. Nasir, Predicting obesity in adults using machine learning techniques: an analysis of Indonesian basic health research 2018. Front. Nutrition 8, 669155 (2021) 8. E.D. Molina, D.K. Kevin, M.P. Fabio, Classification and features selection method for obesity level prediction. J. Theor. Appl. Inf. Technol. 99(11), 1992–8645 (2021)
Chapter 38
A Brief Analysis of Fault-Tolerant Ripple Carry Adders with a Design for Reliable Approximate Adders Asma Iqbal and K. Manjunatha Chari
1 Introduction Adders are the central unit around which major computations are carried out in microprocessors. The efficiency and robustness of this unit is a determining factor in assessing the processor performance. Digital processing is required in every digital operation that is done, and as the world moves towards digitization, the importance of adders cannot be undermined. A two-pronged effort has been unfolding to improve the reliability and efficiency of adders. Firstly, to improve the reliability of adders numerous fault-tolerant schemes have been proposed [1–4]. Secondly, approximate adders [5–8] have been used to reduce the area, power and delay so as to have an optimal operation. In this work, it is proposed to combine the two so as to implement a reliable approximate adder, study the advantages and infer the optimizations possible. This is accomplished through the implementation of triple modular redundancy (TMR) [9] and partial triple modular redundancy (PTMR) [10] with a word size of 16 and 32 bits and studies the advantages of PTMR (with P = 3, 4 and 5). In this paper, a brief background on adders, fault-tolerant adders and approximate adders is included in Sect. 2. Comparison between TMR and PTMR is discussed in Sect. 3. Section 4 gives the details of the proposed reliable approximate adders. Results and discussion are included in Sect. 5.
A. Iqbal (B) ECE Department, DCET, Hyderabad, India K. Manjunatha Chari ECE Department, GITAM University, Hyderabad, Telangana, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. B. Reddy et al. (eds.), Proceedings of Second International Conference on Advances in Computer Engineering and Communication Systems, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-7389-4_40
2 Background The efficiency and reliability of adders are of relevance now more than ever as digitization gains greater traction with each passing day. There is an attempt to automate at every possibility, and every person moves around with computation power in their pocket. As mentioned earlier, we need to understand the role faulttolerant adders and approximate adders can possibly have in this.
2.1 Adder Designs Different designs for adders have been proposed in recent times. Power requirement, area of the circuit and delay optimization are the main criteria which constrain and motivate the circuit design. With these as the guiding factors, different adders have been suggested and improvised. Minor modifications have led to an improvement in some measurable outcomes, and these enhancements have cumulatively augmented the performance of processors. Reversible adders have been designed which have the advantage of reduced gate count and elimination of irrelevant outputs [11]. Prefix adders with different structures have been implemented optimizing area power and delay [12]. Carry-select adders [13] are a type of adders in which the sum at each bit position is computed assuming a carry input of both 0 and 1. A multiplexer selects the correct output depending on the actual carry input. Adder designs have been extensively studied, and a complete understanding of the various circuit designs is available.
2.2 Fault-Tolerant Ripple Carry Adders Ripple carry adders (RCAs) are the simplest form of parallel adders. They are constructed by connecting ‘n’ single-bit full adders to obtain an n-bit adder. To achieve fault tolerance in these adders, a number of designs have been proposed. Triple modular redundancy is the simplest form and uses the scheme first proposed by Von Neumann [9]. The n-bit adder is used three times, and the output is voted upon using a majority voter. This design is capable of masking the effect of faults, and its major disadvantage is the area overhead involved. Time-shared TMR (TSTMR) [14] and quadruple time redundancy (QTR) [15] were proposed to reduce the area overhead at the cost of increased delay. A more efficient scheme for incorporating fault tolerance is the partial TMR scheme in which the authors have triplicated only a few most significant bits on the basis of the weight attached to each bit position. For example, in a 16-bit adder the MSB has a weight of 215 = 32,768, whereas the LSB has a weight of 20 = 1 which is quite insignificant. When the word size increases, the disparity is also more. The authors analysed the outputs obtained by triplicating
3, 4 and 5 of the most significant bits, and a thorough evaluation done shows that the best operation is obtained when P = 4, where P represents the number of the most significant bits that are triplicated, for word lengths of 16 and 32 bits. These conclusions are utilized in the work presented in this paper.
2.3 Approximate Adders Image processing, audio processing, wireless communication, data mining, cryptography and machine learning are examples of intensive computing operations, and these are capable of a certain amount of tolerance to computation error. This has given an impetus to the design of approximate adders, and many designs have been proposed in recent times. The adders are designed such that they have a k-bit precise part that computes the sum bits accurately and an (n−k)-bit imprecise part. A generally used scheme employs simple OR gates to compute the sum in the least significant inaccurate sub-adder, with the accurate adder using k simple full adders for the most significant accurate sub-adder. The least significant bits are thus independent of each other, as carry is not propagated in the inaccurate sub-adder. The carry input in the approximate adder of Fig. 1a [16] is kept low, whereas in Fig. 1b [17] it is equal to Ak−1 · Bk−1. The reliable approximate adder designed in this paper uses the approximate adder of Fig. 1a [17]. This approximate adder scheme has been used by the authors to implement imprecise multipliers, and their efficacy has been shown by using them to implement bidimensional filters (low-pass, sharpening filter and edge detection unit) for
Fig. 1 Approximate n-bit adders
image filtering. The results obtained concluded that the error incorporated was within tolerable limits with the benefit of improved performance (area, delay and power).
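For illustration, a behavioral Python sketch of the lower-OR approximate adder of Fig. 1a is given below; the designs discussed in this paper are hardware circuits, and the sketch only mimics their input/output behavior for n-bit unsigned operands.

```python
# Behavioral sketch of the lower-OR approximate adder of Fig. 1a: the k most
# significant bits are added exactly, the n-k least significant bits are OR-ed,
# and no carry is propagated out of the imprecise part (carry-in kept low).
def approx_add(a: int, b: int, n: int = 16, k: int = 4) -> int:
    low_mask = (1 << (n - k)) - 1
    low = (a | b) & low_mask                      # imprecise OR sub-adder
    high = (a >> (n - k)) + (b >> (n - k))        # precise k-bit sub-adder, cin = 0
    return ((high << (n - k)) | low) & ((1 << (n + 1)) - 1)

# Compare the approximate sum against the exact one to see the error introduced
print(hex(approx_add(0x1234, 0x0FF1)), hex(0x1234 + 0x0FF1))
```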
3 Fault-Tolerant RCA In this work, we have limited ourselves to triple modular redundancy (TMR) and partial TMR (PTMR). TMR is the simplest form of fault tolerance by a design that masks the effect of faults. Also, the design can easily be tweaked to have in the field error detection by adding external outputs to the individual modules. PTMR is a modified form of TMR as explained in Sect. 38.2.
3.1 Triple Modular Redundancy In this method, the three RCA adder modules are used to mask the effect of faults. The basic block diagram for a 16-bit adder is shown in Fig. 2. The same can be extended to adders of different word sizes. It is obvious from the figure that the hardware redundancy is quite high reaching to 400–600% in some cases of FPGA implementation.
Fig. 2 Block diagram of 16-bit TMR
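A behavioral sketch of the TMR scheme with a bitwise majority voter is shown below; it only illustrates the fault-masking behavior of the block diagram, not the hardware itself.

```python
# Behavioral sketch of TMR for an n-bit adder: three copies of the addition
# and a bitwise 2-out-of-3 majority vote on the results.
def majority(a: int, b: int, c: int) -> int:
    return (a & b) | (b & c) | (a & c)

def tmr_add(x: int, y: int, n: int = 16, faulty=None) -> int:
    mask = (1 << (n + 1)) - 1           # n-bit sum plus carry-out
    sums = [(x + y) & mask for _ in range(3)]
    if faulty is not None:              # optionally flip one bit in one module
        module, bit = faulty
        sums[module] ^= (1 << bit)
    return majority(*sums)

# A single-module fault is masked by the voter
assert tmr_add(1000, 2345, faulty=(1, 3)) == 1000 + 2345
```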
Fig. 3 Block diagram for PTMR
3.2 Partial Triple Modular Redundancy Among the many ideas put forth to reduce the hardware overhead inherent in the design of TMR, the PTMR is quite efficient as it reduces the area requirement without additional delay being incorporated. The basic block diagram is shown in Fig. 3, where the P most significant bits are computed thrice. These outputs are then sent to a voter with a width of P bits. The voter is a simple majority voter, and it generates the P most significant sum bits. The remaining least significant bits are computed using simple full adders. The carry generated from the simplex adder module forms the input carry for all the three modules being used for the P MSBs of the sum. The hardware overhead comparison of these two designs is done by comparing the transistor count of each design. The transistor count for the basic logic blocks was presented by Akbar and Lee [18]. The transistor count for the logic blocks used in TMR and PTMR is included in Table 1. It can be observed from Table 2 that there is a substantial improvement in the area requirement if PTMR is used for reliable operation. For P = 4, the operation of PTMR was as good as or better than TMR. These were the conclusions drawn by Parhi et al. [10] on the basis of the fault resilience checks done.
Table 1 Transistor count for logic blocks
Logic block | Number of gates | Transistor count
OR gate | 01 | 06
Full adder | 09 | 54
Voter | 05 | 30
Table 2 TMR and PTMR comparison (P = 3, 4 and 5)
Architecture | 16 bits | % Reduction | 32 bits | % Reduction
TMR | 2677 | 100 | 5745 | 100
PTMR, P = 3 | 1243 | 54 | 2172 | 62
PTMR, P = 4 | 1381 | 48 | 2310 | 60
PTMR, P = 5 | 1519 | 43 | 2448 | 57
Hence, if PTMR is used with P = 4, there is a reduction of about 50% in the transistor count, which would translate to area and power improvements in the implemented design.
4 Reliable Approximate Adder The approximate adder [17] is combined with fault-tolerant schemes proposed in Refs. [9, 10] to obtain reliable approximate adders. The adder is designed as shown in Fig. 4. The most significant precise sub-adder is obtained using a simple RCA, whereas the least significant imprecise sub-adder operation is obtained using simple OR operation. The carry generated at each bit position in the imprecise sub-adder is not carried over to the next significant bit. This results in area, delay and power optimizations. To improve the reliability of the operation fault tolerance mechanism has been included in these adders. Including fault tolerance in these adders improves the yield which has become increasingly relevant as the feature size keeps decreasing and also improves the performance of the system in the useful life period. In the TMR design of reliable approximate adders, both the precise sub-adder and the imprecise sub-adder are triplicated with voters used for generating the final sum and output carry at each bit position. In the PTMR, the triplication is limited to the precise sub-adder unit with the voter generating the sum bits and the final carry output in the precise part. In the imprecise sub-adder, the sum is directly obtained from the outputs of the OR gates. The detailed schematics for fault-tolerant approximate adder using TMR and PTMR are shown in Figs. 5 and 6, respectively. The blocks used in these implementations are the full adder represented as FA, simple OR gate represented as OR and a majority voter represented as VOTE. The inputs applied are a0-aw-1 and b0-bw-1 where w represents the word size (16 or 32 bits).
Fig. 4 Proposed reliable approximate adder
Fig. 5 Fault-tolerant approximate adder based on TMR
In this work, the precise sub-adder is 4-bit wide for word lengths of 16 and 32 bits with the remaining bits included in the imprecise sub-adder. This is in concurrence with the results obtained for approximate adders [17] and PTMR [10].
5 Results and Discussion TMR is one of the most popular approaches for fault tolerance and gives a reliable output in case of a single-module failure. The area overhead is more than 300% at the least with a little delay overhead due to the voters. Many applications today do not require precise operations especially in the field of media processing, data mining, etc., which are capable of tolerating some amount of quality loss. Approximate computing is a solution proposed to trade tolerable loss in quality with improvement in area and power. If this is combined with reliability, then we
Fig. 6 Fault-tolerant approximate adder based on PTMR
can have a class of optimized robust adders. The advantages of this design will be manifest in area, delay and power requirements. As the ripple carry is limited to P most significant bits, the delay is considerably reduced and the reduced hardware translates to lower power requirements as is expected in an approximate adder (Table 3). It can be observed that the 32-bit reliable approximate adder based on PTMR has a transistor count that is less than that of the non-redundant ripple carry adder and almost the same in case of 16 bits. There is a marked improvement in the transistor count of the reliable approximate adder based on TMR. A basic analysis of reliable approximate adders has been carried out in this work, and an initial analysis is carried out on the basis of the hardware requirement for the implementation of the proposed design. The inferences drawn are quite dependable as the hardware in the design is standard logic blocks (OR gate, full adders and majority voter circuits).
Table 3 Transistor count comparisons for reliable approximate adders versus precise adders

16 bits
Architecture | OR gate | Full adder | Voter | Transistor count | Overhead (%)
RCA | – | 16 | – | 864 | 100
TMR | – | 48 | 17 | 2677 | 310
Aprx_TMR | 36 | 12 | 17 | 1374 | 159
PTMR | – | 24 | 05 | 1381 | 160
Aprx_PTMR | 12 | 12 | 05 | 870 | 100.1

32 bits
Architecture | OR gate | Full adder | Voter | Transistor count | Overhead (%)
RCA | – | 32 | – | 1728 | 100
TMR | – | 96 | 33 | 5745 | 332
Aprx_TMR | 84 | 12 | 33 | 2142 | 124
PTMR | – | 40 | 05 | 2310 | 134
Aprx_PTMR | 28 | 12 | 05 | 966 | 56
References 1. R. Zimmermann, Binary adder architectures for cell-based VLSI and their synthesis, Ph.D. Thesis, Swiss Federal Institute of Technology, Zurich, 1997 2. J.M. Rabaey, A. Chandrakasan, B. Nikolic, in Digital Integrated Circuits, a Design Perspective (2nd Prentice Hall, Englewood Cliffs, NJ, 2002) 3. J. Uyemura, in CMOS Logic Circuit Design (1999). ISBN 0-7923-8452-0 4. N. Weste, K. Eshragian, in Principles of CMOS VLSI Design: A Systems Perspective (AddisonWesley, 1993) 5. P. Balasubramanian, D. Maskell, Hardware efficient approximate adder design, in TENCON 2018–2018 IEEE Region 10 Conference (IEEE, October, 2018), pp. 0806–0810 6. A. Dalloo, A. Najafi, A. Garcia-Ortiz, Systematic design of an approximate adder: the optimized lower part constant-or adder. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 26(8), 1595– 1599 (2018) 7. J. Lee, H. Seo, Y. Kim, Y. Kim, Approximate adder design with simplified lower-part approximation. IEICE Electron. Exp., 17–20200218 (2020) 8. V. Gupta, D. Mohapatra, S.P. Park, A. Raghunathan, K. Roy, IMPACT: IMPrecise adders for low-power approximate computing, in IEEE/ACM International Symposium on Low Power Electronics and Design (IEEE, August, 2011), pp. 409–414 9. J. von Neumann, Probabilistic logics and synthesis of reliable organisms from unreliable components, in Automata Studies, ed. by C.E. Shannon, J. McCarthy (Princeton University Press, 1956), pp. 43–98 10. R. Parhi, C.H. Kim, K.K. Parhi, Fault-tolerant ripple-carry binary adder using partial triple modular redundancy (PTMR), in 2015 IEEE International Symposium on Circuits and Systems (ISCAS) (IEEE, 2015), pp. 41–44 11. M.C. Li, R.G. Zhou, A novel reversible carry-selected adder with low latency. Int. J. Electron. 103(7) (2016) 12. D. Raj, S.K. Adyanthaya, J. Praveen, R.R. Rao, Design and Implementation of different types of efficient parallel prefix adders. in National Conference on Advanced Innovation in Engineering and Technology, vol. 3, no. 1 (April, 2015) 13. A. Tyagi, A reduced-area scheme for carry-select adders. IEEE Trans. Comput. 42(10), 1163– 1170 (1993). https://doi.org/10.1109/12.257703 14. Y.M. Hsu, Concurrent error correcting arithmetic processors, Ph.D. dissertation, University of Texas at Austin, 1995
15. W.J. Townsend, J.A. Abraham, E.E. Swartzlander, Quadruple time redundancy adders [error correcting adder]. in Proceedings 18th IEEE Symposium on Defect and Fault Tolerance in VLSI Systems (IEEE, November, 2003), pp. 250–256 16. H.R. Mahdiani, A. Ahmadi, S.M. Fakhraie, C. Lucas, Bio-inspired imprecise computational blocks for efficient VLSI implementation of soft-computing applications. IEEE Trans. Circuits Syst. I Regul. Pap. 57(4), 850–862 (2009) 17. P. Albicocco, G.C. Cardarilli, A. Nannarelli, M. Petricca, M. Re, Imprecise arithmetic for low power image processing, in 2012 Conference Record of the Forty Sixth Asilomar Conference on Signals, Systems and Computers (ASILOMAR) (IEEE, November, 2012), pp. 983–987 18. M.A. Akbar, J.A. Lee, Self-repairing adder using fault localization. Microelectron. Reliability 54, 1413–1451 (2014)
Chapter 39
Performance Assessment Using Supervised Machine Learning Algorithms of Opinion Mining on Social Media Dataset M. Susmitha and R. Laxmi Pranitha
1 Introduction Today, social media changed the complete world. This is a computer-based technology that is the best platform for people to share ideas, thoughts, and information virtually. Twitter is one such site to share one’s views, thoughts, and expressions. In today’s environment, the necessity to gather comments from these social networking sites and form conclusions about what people like and dislike has become the most significant part. The dataset contains tweets written by people from different regions and a class label demonstrating whether the comment is positive or negative. We gathered around 31,000 tweets from Twitter, which were then automatically divided into two groups: POSITIVE VIEWS: Tweets that can express a point of view or think in appreciation. NEGATIVE VIEWS: Tweets that can express a point of view or think that are critical of a certain issue. Finally, we utilize the RBFSVM tool to train and test the correctness of the system that tells up to what extent our system accomplishes opinion mining. Support vector machine (SVM) is a supervised learning algorithm. Support vector classification (SVC) is used for classification problems. SVM classifies data by locating the hyperplane that separates the classes plotted in n-dimensional space (Fig. 1). Kernels are essentially a clever means of adding more features to data in the hopes of making it linearly separable. They take advantage of some magical mathematical M. Susmitha (B) · R. L. Pranitha Department of IT, VNRVJIET, Hyderabad, India e-mail: [email protected] R. L. Pranitha e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. B. Reddy et al. (eds.), Proceedings of Second International Conference on Advances in Computer Engineering and Communication Systems, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-7389-4_41
Fig. 1 Optimal separating hyperplane between two classes
Fig. 2 Kernel format
principles that allow us to get the same outcomes as if we had added the features ourselves, without slowing down the model. We used the most famous Gaussian RBF kernel, which separates nonlinearly separable data. Advantages of RBF are:
1. It reduces the processing time and processing power.
2. It makes the algorithm simple in terms of computation.
RBF kernel function value depends on the distance from some point or from origin (Fig. 2).
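For illustration, the Gaussian RBF kernel value for a pair of points can be sketched as follows (the gamma value is arbitrary here):

```python
# Illustrative sketch of the Gaussian RBF kernel used with the SVM:
# K(x, y) = exp(-gamma * ||x - y||^2), so the value depends only on the
# distance between the two points.
import numpy as np

def rbf_kernel(x, y, gamma=0.1):
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.exp(-gamma * np.sum((x - y) ** 2))

print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))   # 1.0 for identical points
print(rbf_kernel([1.0, 2.0], [4.0, 6.0]))   # decays as the distance grows
```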
2 Related Work There has been a lot of work done on sentiment analysis, and here are a few examples:
(A) In terms of accuracy, precision, and recall, the sentiment classification performance was assessed in [1]. For sentiment classification of hotel and movie reviews, they compared Naïve Bayes with K-NN.
(B) Sentiment analysis using the Naïve Bayes and complement Naïve Bayes classifier algorithms on the Hadoop framework was described in [2]. For Naïve Bayes and its complement, the accuracy value tendency is the same for all sizes of training samples. The complement Naïve Bayes method had the best overall accuracy.
(C) In Ref. [3], Naïve Bayes and logistic regression were discussed. The acquired findings and their performance are compared using three parameters: accuracy, precision, and computation time.
(D) The sentiment analysis on tweets is done using the multinomial logistic regression method. Feature extraction is used to convert the data supplied into a feature set, which is then validated and evaluated [4].
3 Methodology To train and classify SVM, we used the Colab interface and Python programming. To classify the tweets, we utilized a dataset of 3 MB in total (Fig. 3).
3.1 Explanation of Pseudo-Code Input—Labeled Dataset. Output—Positive or Negative tweet. Step-1: Preprocessing of tweets To clean the tweets, preprocessing is the first and most important step in any data mining process. It entails converting raw data into a format that NLP algorithms can interpret. The raw data is frequently inconsistent, fragmentary, and rife with errors. Data preprocessing will solve such problems. This strategy will aid in the achievement of better results. Below two techniques are important in this preprocessing. Tokenization: It is the method of dividing a stream of tweets into tokens, which can be words, special characters, symbols, or sentences. Word tokenize and sent tokenize are built-in methods in the NLTK library that translate tweets into words and sentences, respectively. Lemmatization: It is used to group variant forms of the same word. Example: study, studied, studies lemma is ‘study.’ This will reduce the number of words that are common in a tweet (reduces complexity). Common steps involved in preprocessing.
Fig. 3 Flow chart of methodology
(A) Convert all tweets into lowercase.
(B) Word tokenization.
(C) Remove the stop words.
(D) Remove non-alphabetic text.
(E) Word lemmatization.
Step-2: Pre-processed tweets After applying the preprocessing steps to each tweet, replace the tweets with final words. Step-3: Prepare data sets for training and testing The data will be divided into two sections: training and testing. The test dataset will be used to predict the class label and discover the accuracy, while the training dataset will be used to train the model. The dataset train test split from the sklearn package is used for splitting. The training dataset will have 70% of the corpus remaining, whereas the test dataset will have 30%.
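A minimal sketch of Steps 1 to 3 with NLTK and scikit-learn is given below; the file name "tweets.csv" and the column names "tweet" and "label" are assumptions for illustration.

```python
# Sketch of Steps 1-3: lowercase, tokenize, remove stop words and
# non-alphabetic tokens, lemmatize, then a 70/30 train/test split.
import pandas as pd
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize
from sklearn.model_selection import train_test_split

# One-time NLTK resources: nltk.download('punkt'), nltk.download('stopwords'),
# nltk.download('wordnet')
df = pd.read_csv("tweets.csv")            # assumed columns: 'tweet', 'label'
stop_words = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()

def preprocess(tweet: str) -> str:
    tokens = word_tokenize(tweet.lower())                                 # (A)-(B)
    tokens = [t for t in tokens if t.isalpha() and t not in stop_words]   # (C)-(D)
    return " ".join(lemmatizer.lemmatize(t) for t in tokens)              # (E)

df["clean"] = df["tweet"].astype(str).apply(preprocess)
X_train, X_test, y_train, y_test = train_test_split(
    df["clean"], df["label"], test_size=0.30, random_state=42)   # 70/30 split
```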
Step-4: TF-IDF Vectorization TF-IDF is an acronym for term frequency-inverse document frequency. This is a common method for translating the text into a meaningful numerical representation that can subsequently be used to train machine learning algorithms. Term-Frequency This function calculates and normalizes the frequency of a word in a text. The normalized TF’s final value will be between 0 and 1. Each document and word has a unique TF value (Fig. 4). Inverse Document Frequency Document frequency is the number of times the word t appears in the document collection N. If a term occurs in a document at least once, it counts as one occurrence; it has no means of knowing how many times a word appears in a document (Fig. 5). Df is the inverse of our purpose, which is to figure out how informative a phrase is. So, we use inverse document frequency. For most often occurring words, such as stop words, the IDF will be quite low. Finally, it provides just what we require (Figs. 6 and 7).
Fig. 4 Formula for term frequency
Fig. 5 Formula for document frequency
Fig. 6 Formula for inverse document frequency
Fig. 7 The product value of TF and IDF gives TF-IDF score
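Continuing the sketch above, the TF-IDF step can be expressed with scikit-learn's TfidfVectorizer, which implements a smoothed variant of the TF and IDF formulas shown in Figs. 4, 5, 6 and 7:

```python
# Sketch of Step 4: TF-IDF vectorization of the preprocessed tweets.
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer()
X_train_tfidf = vectorizer.fit_transform(X_train)   # sparse TF-IDF matrix
X_test_tfidf = vectorizer.transform(X_test)
print(X_train_tfidf.shape)
```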
Step-5: SVM to predict the outcome The RBFSVM tool is used to process the sparse matrix acquired by TF-IDF, which offers the accuracy rate for testing the classification, which is then trained and predicted to be studied.
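A sketch of this final step, continuing from the TF-IDF matrix above, is shown below:

```python
# Sketch of Step 5: an RBF-kernel SVM trained on the TF-IDF matrix and
# evaluated on the held-out 30% of the tweets.
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report

svm = SVC(kernel="rbf")
svm.fit(X_train_tfidf, y_train)
y_pred = svm.predict(X_test_tfidf)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
```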
4 Experimental Results This section explains how supervised learning techniques like logistic regression, Naïve Bayes, and support vector machine were used to get experimental findings from a dataset (Tables 1, 2, and 3). ROC curve A receiver operating characteristic curve which is also call as ROC curve depicts how well a classification model works at various levels of categorization. Two parameters are shown on this curve are (Figs. 8, 9, and 10): As shown in Fig. 11: • The best precision belongs to Naïve Bayes of 99.98%. The Naïve Bayes algorithm returned more relevant results than irrelevant ones. • The best recall belongs to support vector machine of 96.57%. This algorithm returned most of the relevant results. Table 1 Confusion matrix of logistic regression
      | Pos. | Neg. | Total
Pos.  | 8839 | 90   | 8929
Neg.  | 448  | 212  | 660
Total | 9287 | 302
Table 2 Confusion matrix of Naïve Bayes
      | Pos. | Neg. | Total
Pos.  | 8921 | 1    | 8922
Neg.  | 580  | 87   | 667
Total | 9501 | 82
Table 3 Confusion matrix of support vector machine
      | Pos. | Neg. | Total
Pos.  | 8893 | 29   | 8922
Neg.  | 315  | 352  | 667
Total | 9208 | 381
Fig. 8 ROC curve for TF/IDF vectorizer of logistic regression
Fig. 9 ROC curve for TF/IDF vectorizer of Naive Bayes
Fig. 10 ROC curve for TF/IDF vectorizer of support vector machine
Fig. 11 Performance of logistic regression, Naïve Bayes and support vector machine on a dataset
• The f-measure is more for support vector machine with 98.1%. • The best accuracy is for support vector machine with a percentage of 96.41%.
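As a worked check, these figures can be recomputed from the confusion matrix of Table 3, assuming rows correspond to actual classes and columns to predicted classes (positive class first); under this assumption the accuracy and f-measure match the reported 96.41% and 98.1%.

```python
# Worked check of the reported SVM figures from Table 3 (assumed orientation:
# rows = actual class, columns = predicted class, positive class first).
TP, FN = 8893, 29    # actual positive tweets: predicted positive / negative
FP, TN = 315, 352    # actual negative tweets: predicted positive / negative

accuracy = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall = TP / (TP + FN)
f1 = 2 * precision * recall / (precision + recall)
print(round(100 * accuracy, 2), round(100 * f1, 1))   # approx. 96.41 and 98.1
```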
5 Conclusion and Future Scope The purpose of this study is to assess the accuracy, precision, f-measure, and recall of sentiment analysis. In this paper, we compared three supervised learning algorithms (logistic regression, Naïve Bayes, and support vector machine) for the sentiment classification of tweets on Twitter. In this study, the support vector machine produced the highest accuracy value. The SVM approach gave an accuracy of 96% and performed better than the other two algorithms. Hence, it decreases the manual work that must be done to analyze and draw conclusions from comments expressed on Twitter. This work might be expanded to any of the commonly visited social networks that have many evaluations from various users. In the future, we will try to use some unsupervised deep learning techniques on bulk datasets to improve the algorithm accuracy and optimize it. By implementing some word representation techniques and more precise preprocessing methods, we will implement a new algorithm.
References 1. L. Dey, S. Chakraborty, A. Biswas, B. Bose, S. Tiwari, Sentiment analysis of review datasets using Naive Bayes’ and K-NN classifier. Int. J. Inf. Eng. Electron. Bus. (2016) 2. B. Seref, E. Bostanci, Sentiment analysis using Naive Bayes and complement Naive Bayes classifier algorithms on Hadoop framework, in 2018 2nd International Symposium on Multidisciplinary Studies and Innovative Technologies (ISMSIT) 3. A. Prabhat, V. Khullar, Sentiment classification on big data using Naïve Bayes and logistic regression, in 2017 International Conference on Computer Communication and Informatics (ICCCI)
4. W.P. Ramadhan, S.T.M.T. Astri Novianty, S.T.M.T. Casi Setianingsih, Sentiment analysis using multinomial logistic regression, in 2017 International Conference on Control, Electronics, Renewable Energy and Communications (ICCREC) 5. L. Dey, S. Chakraborty, Canonical PSO based-means clustering approach for real datasets. Int. Scholar. Res. Notices, Hindawi Publishing Corporation 2014, 1–11 (2014) 6. R. Dey, S. Chakraborty, Convex-hull & DBSCAN clustering to predict future weather, in 6th International IEEE Conference and Workshop on Computing and Communication, Canada 2015, pp.1–8
Chapter 40
Enhancing English Proficiency Using NLP A. Brahmananda Reddy, P. Vaishnavi, M. Jahnavi, G. Sameeksha, and K. Sandhya
1 Introduction English proficiency is a key factor for employment success and advancement and for opening doors to economic opportunity. Proficiency is best practiced through reading and writing. Non-native speakers of English must practice the sounds of English on how they are accented or stressed in their usage. Vocabulary knowledge is the single most important area of language proficiency. This paper briefs about a level detection model to sense the proficiency level by checking the grammatical errors, sentence formation and vocabulary. Here, a discussion about a natural language processing-based solution is used to level the text and produce an enhanced text. Essay writing has become a crucial part of the student evaluation process. Several organizations, such as Educational Testing Service (ETS), have a necessity to evaluate the essay in order to understand the level of student’s metacognitive skills to evaluate the writing skills of students in their examinations. Because of the large number of students participating in these examinations, grading essays for each and everyone is a laborious task [1]. Vocabulary acquisition is a process inherent to human language learning that determines the rate at which an individual becomes familiarized with the lexicon of a given language. Word recognition, however, is described as a series of linguistic sub-processes that establishes one’s capability of identifying and comprehending individual words in a text [2]. Our motive is to refresh the students to acquire proficiency, fluency and accuracy in English. Motivating students to develop and enrich their English language skills. Used in education field to access the students. Adjust the paragraph, after detecting the state of the learner. A. Brahmananda Reddy (B) · P. Vaishnavi · M. Jahnavi · G. Sameeksha · K. Sandhya Department of CSE, VNR Vignana Jyothi Institute of Engineering and Technology, Hyderabad, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. B. Reddy et al. (eds.), Proceedings of Second International Conference on Advances in Computer Engineering and Communication Systems, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-7389-4_42
2 Existing System Acquiring proficiency in English language has become an important criterion to keep oneself afloat with proficiency skills required for employment opportunities and to build interpersonal skills so that one could face the world with confidence. As there are many existing systems, there the main criteria would be performing English language-level test, where it does a series of quick tests that will give us a rough idea of our level of English for a cent percentage proficiency. There are many existing systems out there in the market, but the ones which are very popular and in demand are Grammarly and VocabGrabber. Grammarly: It communicates with impact, ensures everything you write comes across the way you intend, presents your best self every time you type, makes your writing clear and engaging, eliminates grammar errors and easily improves any text. VocabGrabber: This analyzes any text you are interested in, generating lists of the most useful vocabulary words and showing you how those words are used in the context. VocabGrabber is a teaching tool that allows teachers and students to input any text, and the tool will pull the vocabulary words. The words are color coded and can be sorted by category (i.e., geography, people, social studies, arts and literature, math and science). The tool will create word webs of related words and word clouds. The list can be sorted in alphabetical order, by relevance to the text, the number of times each word appears in the text or the familiarity of the words. Although Grammarly and VocabGrabber are providing services for guiding the individuals to correct their grammatical mistakes and have a better vocabulary use, often individuals need to go to different platforms to utilize different features like grammar checking, synonym suggestion, automated essay scoring and voice to text conversion. Currently, there is not a system which provides all these diverse features on a single platform.
3 Literature Review A.
Neural approach to automated essay scoring
Kaveh Taghipour and Hwee Tou Ng published a paper where they used neural approach to know the automated essay scoring. AES systems give output as a realvalued number; hence, they have used machine learning algorithms in order to know the relationship between essays and corresponding reference scores. They have used ASAP dataset for verification and quadratic weighted kappa as an evaluation metric. Basically, here they have used recurrent neural networks where information is processed in different convolution layers which resulted in the performance exceeding by 5.6% in terms of quadratic weighted kappa. Apart from this, neural network fails to learn the task properly in the absence of mean over time [3].
B.
User evaluation of the effects of a text simplification algorithm using term familiarity on perception, understanding, learning and information retention
Gondy Leroy and David Kaushak published this paper. This paper presents fully automated version of the lexical simplification algorithm, which identifies difficult terms and generates list of easier alternatives based on information extracted from dictionaries and other databases [4]. Here, they discussed mainly about the text familiarity and provided easy words for the users by utilizing Google Web Corpus. In order to measure actual difficulty, they have used 3 metrics like 5 multiple-choice questions to measure understanding. Here the drawback is that on an average 52% with original document. C.
Lexical simplification for non-native speakers
Paetzold and Specia had published this paper. Their main aim was to replace complex words with simpler words. To achieve their objective, they have performed few steps like complex word identification, substitution generation, substitution selection and substitution ranking. In complex word identification phase, they firstly observe the dataset and run a module which shows variation between verbs and nouns. Later, they run another module related to spelling correction of LEXenstein to correct any misspelled words among the candidates present in dataset. In substitution generation phase, they used context-aware word embeddings where training the corpus would be done in first place. Later Parts of Speech tagging would take place, so as to avoid sparsely, they generalized all tags related to nouns, verbs, adjectives and adverbs to N, V, J and R, respectively. After this, the model would be trained with certain tools like Word2vec or GloVe [5]. In substitution selection, in order to do boundary ranking in unsupervised fashion they have used the Robbins-Sturgeon hypothesis. In substitution ranking, in order to train corpus, they applied 5-gram language models over SubIMDB and four other corpora like Wikipedia, Simple Wiki, Brown and SUBTLEX. Here, they have used NNSeval dataset. So, accuracy, precision and changed proportions are used as the evaluation metrics for that dataset.
4 Proposed System Traditional systems are providing certain features, but here we are improvising it in much more efficient way. In this paper, we will explain how we built a text summarizer that gives us the English proficiency level of the person based on the text input taken through keyboard or microphone or WAV file format. We will check the grammatical level, sentence formation and vocabulary level to determine the proficiency level and give it as output. Natural language processing (NLP) and some analytical methods are used.
432
A. Brahmananda Reddy et al.
Fig. 1 English language proficiency (ELP) system
Here, we are trying to discuss a platform where all the diverse features will be available in a single platform such that users need not switch to different platforms in order to finish or achieve any particular task. System architecture is as specified (Fig. 1).
5 Methodology
Input of Text
The input to the module is taken as text or voice. When the input is given as an audio snippet, the voice is converted to text using PyAudio and Wave. PyAudio is a Python library used to record audio and play it on different platforms such as Windows and Mac. The Wave module is a Python library that acts as an interface to the WAV audio format: it can write raw audio data to a file and read the attributes of a WAV file. We can also provide input in WAV format using the SpeechRecognition and pydub libraries, which split large audio files into chunks and apply speech recognition to each chunk. Google's speech recognizer is then used to transcribe each audio chunk and convert it into text format.
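As a concrete illustration of this step, the sketch below splits a WAV file into chunks with pydub and transcribes each chunk with the SpeechRecognition library's Google recognizer. The file name and the chunk length are illustrative assumptions, not values prescribed by the system.

# Sketch of the audio-to-text step, assuming an input file "input.wav".
import speech_recognition as sr
from pydub import AudioSegment
from pydub.utils import make_chunks

def transcribe_wav(path, chunk_ms=60_000):
    audio = AudioSegment.from_wav(path)            # load the WAV file
    recognizer = sr.Recognizer()
    text_parts = []
    for i, chunk in enumerate(make_chunks(audio, chunk_ms)):
        chunk_path = f"chunk_{i}.wav"
        chunk.export(chunk_path, format="wav")     # write the chunk to disk
        with sr.AudioFile(chunk_path) as source:
            recorded = recognizer.record(source)
        try:
            # Google's web recognizer; requires an Internet connection
            text_parts.append(recognizer.recognize_google(recorded))
        except sr.UnknownValueError:
            pass                                   # skip chunks that cannot be transcribed
    return " ".join(text_parts)

print(transcribe_wav("input.wav"))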
Level Detection
Level detection uses certain features to evaluate the paragraph taken as input [6]. The evaluation is done by an ML model trained on eight datasets graded on features such as lexical diversity, word count, average length, and the structure and organization of the paragraph. It looks at lexical diversity, which checks the variety of words used in the text; word frequency, which refers to how often a word occurs in the paragraph; lexical overlap; semantic overlap; syntactic complexity, reflected in a higher number of words before the main verb; and syntactic similarity, which refers to the uniformity and consistency of syntactic constructions in the text at the clause, phrase and word level. The random forest regression algorithm uses decision trees to produce the outcome. Each tree takes the components used in score prediction, such as lexical diversity, semantic overlap and syntactic complexity, converted into numeric format, and makes decisions at the internal nodes; a leaf node gives the final numeric value of a component. The values obtained from the decision trees are summed, and the resulting percentage gives the prediction score of the text provided. The model is implemented in Python. The scikit-learn Python library comes with many machine learning algorithms for regression [7]; we used it to implement, train, test and evaluate the model.
Grammar Checker
The English language contains a wide range of grammar rules that are crucial for evaluating the paragraph. The sentences in the paragraph are parsed to produce productions that make use of various syntactic categories: noun phrase, verb phrase, prepositional phrase, determiner, noun, verb and preposition. Grammar checking is done by initially creating a context-free grammar describing correct sentence formation. Whenever a sentence is provided, the NLTK parser checks whether the sentence satisfies the context-free grammar. It forms a top-down parse tree showing the sentence formation; if the sentence satisfies the given context-free grammar (CFG), it is formed without any grammatical mistakes. A context-free grammar is a formal grammar and helps in generating the language it describes. Here, we create a CFG that captures English grammar, defining sentence formation in terms of parts of speech, subject-object placement, and so on (Fig. 2). Whenever a sentence is framed, its POS categories are identified first and then checked against the declared CFG, which forms a top-down parse tree as shown in Fig. 3, for example:
S -> NP VP
Fig. 2 Context-free grammar (CFG)
Fig. 3 Parse tree of a sentence of CFG
S -> 'I' VP PP
S -> 'I found' Det NP
S -> 'I found a fish' P NP
S -> 'I found a fish in water'
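The check sketched in this derivation can be reproduced with NLTK's CFG and parser classes. The grammar below is only an illustrative toy fragment for the example sentence, not the full grammar used by the system.

# Minimal sketch of CFG-based grammar checking with NLTK.
import nltk

grammar = nltk.CFG.fromstring("""
  S   -> NP VP
  VP  -> V NP PP
  PP  -> P NP
  NP  -> 'I' | Det N | 'water'
  Det -> 'a'
  N   -> 'fish'
  V   -> 'found'
  P   -> 'in'
""")

parser = nltk.ChartParser(grammar)
sentence = "I found a fish in water".split()

trees = list(parser.parse(sentence))
if trees:
    trees[0].pretty_print()   # top-down parse tree; sentence satisfies the CFG
else:
    print("Sentence does not satisfy the CFG -> flagged as a grammatical error")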
In this way the parser checks the formation of the sentence; if the derivation reaches the end by satisfying the grammar, the sentence is accepted, otherwise it does not satisfy the CFG, which in turn means it does not satisfy the grammar, indicating a grammatical error in the sentence. A parser thus processes given input sentences according to the productions of the grammar and builds one or more structures that conform to the grammar [8]. In this way, the grammatical errors in the text are pointed out.
Synonym Suggestion
The sentences in the text undergo tokenization and lemmatization. Breaking each sentence in the paragraph down into smaller units is called tokenization in NLP. Tokenization gives all the tokens, but we do not need to suggest synonyms for prepositions, connecting words, and so on. To take care of this, we make use of a stop word list (Fig. 4). Stop words are words commonly used in the language, such as 'is', 'the' and 'an'. Removing these stop words helps one focus on the main words for which we can suggest synonyms, which in turn enhances the quality of the text.
Fig. 4 Lemmatized words count after removing stop words
Later, we lemmatize each remaining token to find its root word and further process it through WordNet, using the WordNetLemmatizer and Porter stemmer libraries in Python [9], to find the meaning and synonyms of the word that we can suggest for improvement.
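A minimal sketch of this pipeline with NLTK is shown below; it assumes the required NLTK corpora (punkt, stopwords, wordnet) have already been downloaded, and the example sentence is only illustrative.

# Tokenization, stop-word removal, lemmatization and WordNet synonym lookup.
import nltk
from nltk.corpus import stopwords, wordnet
from nltk.stem import WordNetLemmatizer, PorterStemmer

def suggest_synonyms(text):
    tokens = nltk.word_tokenize(text.lower())                  # tokenization
    stop_words = set(stopwords.words("english"))
    content_words = [t for t in tokens if t.isalpha() and t not in stop_words]

    lemmatizer = WordNetLemmatizer()
    stemmer = PorterStemmer()
    suggestions = {}
    for word in content_words:
        lemma = lemmatizer.lemmatize(word)                     # root word
        stem = stemmer.stem(word)
        synonyms = {l.name().replace("_", " ")
                    for s in wordnet.synsets(lemma)
                    for l in s.lemmas()} - {word, lemma, stem}
        if synonyms:
            suggestions[word] = sorted(synonyms)[:5]           # a few alternatives
    return suggestions

print(suggest_synonyms("The findings of the study were very good"))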
6 Results
Upon providing input, the system suggests the changes to be incorporated in the text (Figs. 5 and 6). After getting the suggestions to enhance the text, we can incorporate the changes, test the text again for level detection, and see that the level has improved. The random forest regression model has a mean absolute error of 1.22, thereby providing good results (Fig. 7). The overall score of the essay level is evaluated as a percentage. The level is graded based on grammatical errors, sentence formation and vocabulary. When inputs are given in a noisy environment, the overall score decreases because the system does not understand the input well; it then suggests the most plausible and adequate words or phrases that can be incorporated for a better score.
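For reference, a minimal sketch of how a level-detection regressor of this kind can be trained and its mean absolute error computed with scikit-learn is given below. The feature and score files are placeholders for the graded data described in Sect. 5, not artifacts of this paper.

# Training and evaluating a random forest regressor for level detection.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

X = np.load("paragraph_features.npy")   # (n_paragraphs, n_features): lexical diversity, overlap, etc.
y = np.load("paragraph_scores.npy")     # (n_paragraphs,): graded scores

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
print("Mean absolute error:", mean_absolute_error(y_test, predictions))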
Fig. 5 Proficiency and errors in the given text input
Fig. 6 Synonym suggestion
Fig. 7 Proficiency of enhanced text
7 Conclusion
The prototype takes a paragraph as input through text or voice and detects its level. Since voice input is provided, the system is easier to use for non-native speakers and for people who find it difficult to express themselves in writing, and this makes it usable even in a messy environment. It then checks for grammatical errors, after which it tokenizes the paragraph and eliminates articles, prepositions, conjunctions and so on by using a stop word list
and then lemmatizes the words and finds apt words that could enhance the level through WordNet. Further, after understanding the errors and the areas that have scope for improvement, English learners can add the changes to the content, producing enhanced text that is later sent for a recheck of level detection to see whether the enhanced paragraph has the desired level of proficiency. In conclusion, the ELP system produces an enhanced paragraph that guides English learners to develop and enhance their English language skills efficiently. The system architecture is built so that it is flexible and adaptable to changes at any stage, making it easier for the admin to incorporate any substitute in the future.
8 Future Scope
In the future, the present work could be enhanced in many different ways. The concept could be evolved to incorporate semantic approaches. Paraphrasing, with the help of MySQL and Node.js, could be used to rewrite the text automatically after suggestions, without human intervention. For the grammar checker, we could incorporate more diverse grammatical rules and make it more efficient and reliable. The types of paragraphs handled could be diverse, such as narrative or compare-and-contrast, and suggestions about the content type could be included. The present project covers only the English language, but it could be extended to different languages such as Hindi, Tamil and Telugu [10]. In addition, our model can be used in real-life scenarios such as the education sector and examination evaluation. We can also make a dynamic version that classifies the text into its respective field, such as business, education or politics, providing the related terminology for a better version of the input [11].
References
1. H. Ghanta, Automated essay evaluation using natural language processing and machine learning (2019)
2. G. Paetzold, L. Specia, Unsupervised lexical simplification for non-native speakers. Proc. AAAI Conf. Artif. Intell. 30(1) (2016)
3. K. Taghipour, H.T. Ng, A neural approach to automated essay scoring, in Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (2016)
4. G. Leroy et al., User evaluation of the effects of a text simplification algorithm using term familiarity on perception, understanding, learning, and information retention. J. Med. Internet Res. 15(7), e144 (2013)
5. G.H. Paetzold, Lexical simplification for non-native English speakers. Diss., University of Sheffield, 2016
6. H. Chen, B. He, Automated essay scoring by maximizing human-machine agreement, in Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (2013)
7. S.Y. Yoon, S. Bhat, K. Zechner, Vocabulary profile as a measure of vocabulary sophistication, in Proceedings of the Seventh Workshop on Building Educational Applications Using NLP (2012)
8. W. Wagner, Steven Bird, Ewan Klein and Edward Loper: Natural language processing with Python, analyzing text with the natural language toolkit. Lang. Resour. Eval. 44(4), 421–424 (2010)
9. J. Perkins, in Python 3 Text Processing with NLTK 3 Cookbook (Packt Publishing Ltd., 2014)
10. A. Özçift et al., Advancing natural language processing (NLP) applications of morphologically rich languages with bidirectional encoder representations from transformers (BERT): an empirical case study for Turkish. Automatika 62(2), 226–238 (2021)
11. Y. Gal, Z. Ghahramani, A theoretically grounded application of dropout in recurrent neural networks. Adv. Neural Inf. Process. Syst. 29, 1019–1027 (2016)
Chapter 41
Best Practices and Strategy for the Migration of Service-Oriented Architecture-Based Applications to Microservices Architecture Vinay Raj and K. Srinivasa Reddy
1 Introduction
Distributed systems have evolved very quickly, beginning with the monolithic style of designing applications. A monolithic application has a large codebase, is deployed as a single unit, and its components are highly coupled. Monolithic architecture is limited by the size and complexity of the application. The increase in the complexity of enterprise applications and business requirements, and the need to design distributed applications, led to the evolution of SOA [1]. Service-oriented architecture (SOA) has been widely used in designing large enterprise applications in the last two decades. It evolved primarily to address the scalability and deployment issues in monolithic systems. SOA is a style of designing applications where all the components in the system are designed as services. A service is a piece of reusable software code that performs a variety of business functions, which can be simple or complicated depending on the needs of the business. SOA is used widely in the integration of multiple software components using the enterprise service bus (ESB) as the communication channel [2]. The ESB is the backbone of SOA and provides the features of a middleware system. It acts as a mediator between service requestor and provider and provides a high-performance, scalable platform. SOA gained more popularity with the evolution of Web services, which are the popular implementation of SOA concepts. Web services are Internet-based services that may be built, accessed, and discovered using communication protocols like XML-based SOAP and WSDL. Web services use HTTP and REST protocols for the transfer of messages through the Internet. Service provider, service consumer,
and service registry are the three basic components of the Web services architecture. A single Web service can be used by multiple clients simultaneously and can be easily deployed. Although SOA is in high demand for application development, it has a few design and deployment problems [3]. Because of its dependence on other services and close connectivity with the ESB, updating a single service demands redeploying multiple components. Deploying multiple services pushes SOA toward the monolithic style of deployment, and this impacts the business [4]. Furthermore, as the number of dynamically changing business requirements grows, a few SOA services become monolithic in size, making the application complex and difficult to maintain. Scaling such monolithic applications is a bottleneck, as SOA follows centralized governance [5]. Services that are overloaded can be scaled horizontally by making multiple copies of the same service, but the hardware cost increases. Further, Web services use complex and heavyweight protocols such as SOAP to exchange messages between the services. To clearly understand the concepts of monolithic, SOA, and microservices architectures, a diagram is presented in Fig. 1. The application designed as a monolith has a single large unit of code; in SOA, the large codebase is partitioned into coarse-grained services; and in microservices, the services in SOA are further partitioned to form fine-grained services. Microservices evolved as a new architectural approach that uses cloud-based containers for deployment to overcome these issues in existing architectures [6]. It is a style of designing applications where each service is a small, loosely coupled, scalable, and reusable service that can be designed and deployed independently [7]. Each service should perform only one task and should have its own database and independent deployment architecture. Microservices use communication protocols like HTTP/REST and JSON for data exchange between the services. Unlike SOA, microservices can be deployed independently, as there is no centralized governance and no dependency on middleware technologies. It is effortless to scale on-demand microservices with the use of cloud-based containers [8]. The microservices architecture suits the DevOps style well, as every task is broken into small units and the complete SDLC is carried out independently [9]. DevOps and agile methodologies require the fast design of applications and deployment to production.
Fig. 1 Understanding of monolithic, SOA, and microservices architectures
Because of the numerous advantages of microservices design, software architects have begun moving their traditional applications to microservices architecture [9]. Many organizations, like Netflix, Ebay, and Twitter, have begun using this design in their new applications [10]. As microservices has emerged recently, there is a huge demand in both industry and academia to explore the tools, technologies, and programming languages used in this architecture. However, because they are unaware of the benefits and drawbacks of employing microservices, several software architects are unsure whether to move to this new design or not. The need for best practices, success stories, and pitfalls related to migration to microservices is highlighted in the literature [11]. Therefore, in this paper, we share our experience in migrating SOA-based applications to microservices architecture. We focus on the migration strategy and best practices to avoid challenges which occur during the migration process. The remaining part of the paper is organized as follows. The benefits of migration to microservices are discussed in Sect. 2. The migration strategy to be followed for migrating applications is presented in Sect. 3, and best practices to avoid the challenges during migration are discussed in Sect. 4. The experimental study is presented in Sect. 5. Future research directions are presented in Sect. 6, and Sect. 7 concludes the paper.
2 Background It is very important to know about the benefits of migrating existing legacy applications to microservices architecture. In this section, we present some very important business and technical benefits of using microservices architecture in designing large enterprise and Web applications.
2.1 Technical Benefits of Microservices
Some of the technical benefits of microservices architecture include:
• Scalability: Microservices can scale automatically when the load increases, with the use of cloud containers [12]. Only the on-demand services are scaled, without affecting the remaining services. This auto-scaling feature of microservices improves the efficiency of the application.
• Independent Deployment: Each microservice can be deployed independently without affecting the other components of the application. This enables quick updates and changes in the application and reduces the time to change [13].
• Polyglot: By using microservices, the developers of the application can choose any programming language of their choice, and they can use any technology framework for design, testing and deployment of the services [14]. As microservices use
REST APIs for communication, the exchange of information between different technologies is not a major concern.
• Compatible with Agile and DevOps: The microservices architecture suits the principles of both Agile and DevOps methodologies well, as they both need applications to be built as small, independent, and cohesive units. Managing the microservices is also very easy, and hence it suits these methodologies well [15]. The principles of continuous integration and continuous delivery also suit the microservices architecture.
• Less Complex: Applications designed using microservices are less complex and easy to understand. As the application is broken down into a small set of services, adding or changing requirements is very quick and easy. If any developer leaves the organization, it does not impact a new developer taking over the same microservices, because of their smaller size.
2.2 Business Benefits of Microservices
Microservices have helped many software firms achieve their business goals. A few benefits of migrating applications to microservices include:
• Focus on business requirements: Each microservice performs only a single task, and it can be reused for other business requirements. Each microservice is focused solely on achieving business goals. Quick development and deployment of new or changed requirements improves business agility [16].
• Quick evolution: Applications designed using microservices evolve very quickly. The development of and changes to existing applications can be done without stopping other components of the application [17]. This enables faster delivery of the business services.
• Organizational alignment: Teams using microservices can be easily managed, as every developer is assigned one or a few microservices and is the only person responsible for the complete life cycle of those services [18]. Cross-team coordination becomes easy, and management can easily handle multiple teams.
• Reduced costs: The cost of changing or adding new business requirements is significantly lower with microservices. The costs of technology and infrastructure required for maintaining microservices are also cheaper, as they use cloud features [19].
• Reduced time-to-market: Since all the microservices are designed and deployed independently, parallel tasks improve the speed of the project, and applications are delivered to customers in a very short period [20].
3 Migration Strategy
In this section, we present a formal strategy to migrate applications from one architecture to another. Let S be the application designed using architecture A and S′ be the application to be designed using the new architecture B. The proposed strategy helps in easily migrating the application S of A to form the new application S′ of B.
Step 1: Before migrating any existing application to a new architecture, the pros and cons of migrating to the new style should be identified. For this, consider an existing application, understand all of its software artifacts, and design it using the new architectural style. After this, we consider quality of service (QoS) parameters such as performance, scalability, and maintenance as criteria for evaluating and comparing both applications. If the application designed with the new architecture exhibits better QoS values, we can move further in the migration process. If not, we can take a decision on the migration considering expert judgment.
Step 2: Once it is clear that the application built using the new style exhibits better QoS values, the legacy application and its components need to be studied. Based on the principles of the new architecture, a new or existing approach for composing or decomposing the components of the old application should be adopted. If a new migration technique has to be proposed, taking all the migration challenges into account is an important task. A detailed investigation of the existing approach is very important at this phase.
Step 3: Effort estimation helps software architects in the proper execution and management of the project. Effective estimation helps in the proper scheduling of software engineering activities. It has to be done during the early stage of the application design, as it gives insights into the effort and cost required to complete the application. Moreover, estimating the accurate effort required for the migration process is a challenging task. Underestimation and overestimation of the required effort may lead to serious project management issues. The effort required for migration should be estimated, and the risk involved in the migration should be analyzed for a clear picture of the effort. We may either use existing techniques or propose new ones for estimating the effort.
Step 4: Many challenges may arise during or after the actual migration of the legacy application to the new architecture. The application can be migrated after completing steps 1-3, and after migration we need to identify all the challenges encountered during the migration process. Patterns are solutions for the most recurring problems and help software associates solve these challenges. Hence, it is desirable to propose design patterns for the recurring issues faced during the actual migration.
Step 5: Sanity checking after the deployment of the application is very important, as it ensures that the application functions according to the business requirements. The performance of the application should also be monitored with real-time data, as it helps in
Fig. 2 Migration strategy
checking the stability of the application. It is also important to monitor and detect bugs post migration, and if any such common bugs exist, we need to apply the patterns proposed in step 4 of the approach (Fig. 2). The proposed approach discussed above is not a standard migration approach; we have presented it based on our experience in migrating an SOA-based application to microservices architecture. We successfully migrated a few SOA-based applications by following the proposed migration strategy.
4 Migration Best Practices This section presents the best practices that will help software architects to migrate the existing legacy systems to the new architecture.
Loose Coupling: Coupling is the dependency each component of the application has on other components. It is always advisable to keep all software components loosely coupled. A loosely coupled application has many benefits: scalability, maintenance, updates, and handling failures all become easier. When an application is migrated from one architecture to another, the primary goal at each step of the migration should be to check whether the components are loosely coupled. If the artifacts are loosely coupled, it becomes easy to design, test, deploy, and maintain the application.
Logs for monitoring: One standard way of migrating an application is to iteratively migrate parts of the application to the new style such that, at the end of the iterations, the complete application has been migrated. As the application should keep working to achieve the business goals, the modules are migrated based on the load and SLA. During this process, it is always advisable to have log files in the newly migrated components so that any bugs or failures can be investigated easily. When one component is updated, issues may arise either in the new component or in the existing ones. Hence, these logs help us in monitoring and detecting the bugs in the application.
Phased effort estimation: Effort estimation helps software architects in the proper planning and migration of the application. However, the estimated effort should include the effort required in all the phases of the software development life cycle (SDLC), not just development effort. Most of the effort estimation techniques proposed in the literature focus only on development effort. The migration process also involves planning, design, and testing, and hence a proper mechanism should be followed while estimating the effort.
Security: This is one of the most important aspects of today's software applications. Security has to be implemented at every layer of the application architecture. Details related to data, credentials, and user permissions have to be handled with proper care. Adopting security mechanisms such as attribute-based access control (ABAC) helps in preventing security issues.
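To make the logging practice concrete, the sketch below shows a minimal migrated service endpoint that writes a log entry for each request, so that bugs introduced during an iterative migration can be traced. Flask, the route, and the log file name are illustrative assumptions; the paper does not prescribe a particular framework.

# A migrated microservice endpoint with request logging (illustrative only).
import logging
from flask import Flask, jsonify, request

logging.basicConfig(filename="part_service.log",
                    level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")

app = Flask(__name__)

@app.route("/parts/<part_id>", methods=["GET"])
def get_part(part_id):
    logging.info("GET /parts/%s from %s", part_id, request.remote_addr)
    try:
        part = {"id": part_id, "name": "sample part"}   # placeholder lookup
        return jsonify(part), 200
    except Exception:
        logging.exception("Failed to serve part %s", part_id)  # stack trace to the log
        return jsonify({"error": "internal error"}), 500

if __name__ == "__main__":
    app.run(port=5001)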
5 Experimental Study
We consider the Web-based application presented in [21] for demonstrating the proposed migration approach. We take the SOA-based application of a vehicle management system and apply the migration strategy to extract and generate the microservices. The Web-based application has eight services designed using the principles of SOA. To compare both applications, we designed the microservices-based application from scratch and used both applications for comparison. We consider the metrics proposed in [22] for comparing the two applications. The details of the services in both the SOA and microservices applications are presented in Table 1, along with the notations of the services used in the service graphs. Applying the service graph generation and microservices extraction approaches proposed in [23], we generate the candidate microservices from the SOA-based application.
Table 1 Details of services of both SOA- and microservices-based applications

Notation in SOA | SOA services                    | Microservices           | Notation in MSA
s1              | Config service                  | Config service          | ms1
s2              | Part service                    | Part service            | ms2
s3              | Product service                 | Product service         | ms3
s4              | Compare service                 | Compare service         | ms4
s5              | Incentives and pricing service  | Incentives service      | ms5
                |                                 | Pricing service         | ms6
s6              | Dealer and inventory service    | Dealer service          | ms7
                |                                 | Dealer locator service  | ms8
                |                                 | Inventory service       | ms9
s7              | Lead service                    | Get-A-quote service     | ms10
                |                                 | Lead processor service  | ms11
s8              | User interface client           | User interface client   | ms12
Table 2 Details of services of SOA-based Web application

Service # | Interacting services  | CS value | RCS value
s1        | 2, 3, 4, 5, 6, 8      | 6        | 0.75
s2        | 1, 4, 5, 6, 8         | 5        | 0.62
s3        | 1, 4, 5, 6, 8         | 5        | 0.62
s4        | 1, 2, 3, 8            | 4        | 0.5
s5        | 1, 2, 3, 8            | 4        | 0.5
s6        | 1, 2, 3, 7, 8         | 5        | 0.62
s7        | 6, 8                  | 2        | 0.25
s8        | 1, 2, 3, 4, 5, 6, 7   | 7        | 0.87
The extracted microservices are compared in terms of loose coupling. The details of the services in each application and their dependencies, along with the metric values, are presented in Tables 2 and 3. The results of comparing SOA and microservices are displayed as a graph of CS and RCS values, as illustrated in Fig. 3, with coupling of services (CS) values on the X-axis and relative coupling of services (RCS) values on the Y-axis. The graph shows that the coupling values of the microservices-based application are lower than those of the SOA-based application. Additionally, a comparison between both systems in terms of other quality of service (QoS) parameters is presented in our work [22]. These results motivate the migration of the SOA-based application to microservices architecture.
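The coupling metrics in Tables 2 and 3 can be computed directly from the service interaction lists. The sketch below assumes, consistently with the reported values, that CS is the number of services a given service interacts with and RCS is CS divided by the total number of services; the exact metric definitions are given in [22].

# Computing CS and RCS for the SOA-based application in Table 2.
interactions = {
    "s1": {"s2", "s3", "s4", "s5", "s6", "s8"},
    "s2": {"s1", "s4", "s5", "s6", "s8"},
    "s3": {"s1", "s4", "s5", "s6", "s8"},
    "s4": {"s1", "s2", "s3", "s8"},
    "s5": {"s1", "s2", "s3", "s8"},
    "s6": {"s1", "s2", "s3", "s7", "s8"},
    "s7": {"s6", "s8"},
    "s8": {"s1", "s2", "s3", "s4", "s5", "s6", "s7"},
}

total = len(interactions)
for service, deps in interactions.items():
    cs = len(deps)        # coupling of services
    rcs = cs / total      # relative coupling of services (assumed CS / N)
    print(f"{service}: CS = {cs}, RCS = {rcs:.2f}")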
Table 3 Details of services of microservices-based Web application

Service # | Interacting services                 | CS value | RCS value
ms1       | 2, 3, 4, 5, 6, 7, 9, 10, 12          | 9        | 0.75
ms2       | 1, 4, 5, 6, 10, 12                   | 6        | 0.5
ms3       | 1, 4, 5, 6, 10, 12                   | 6        | 0.5
ms4       | 1, 2, 3, 10, 12                      | 5        | 0.41
ms5       | 1, 2, 3, 6, 12                       | 5        | 0.41
ms6       | 1, 2, 3, 5, 10, 12                   | 6        | 0.5
ms7       | 1, 9, 10, 11, 12                     | 5        | 0.41
ms8       | 11, 12                               | 2        | 0.16
ms9       | 1, 7, 10, 12                         | 4        | 0.33
ms10      | 1, 2, 3, 4, 6, 7, 9, 12              | 8        | 0.67
ms11      | 7, 8, 12                             | 3        | 0.25
ms12      | 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11    | 11       | 0.91

Fig. 3 Coupling values comparison
6 Future Directions
With the drastic advances in information technology (IT), many new software architectures, technologies, frameworks, etc., are being proposed and used widely. One of the major drawbacks in the IT industry is not understanding the complete details of newly proposed technologies and architectures. With the hype around new approaches, many companies try to adopt them to gain business benefits. Because of this, the real meaning and usage of the new approaches are missed, and applications tend back toward their old style or technology. Since microservices is a new architectural
style, there is a need to analyze the complete pros and cons of this new style. Also, there are no standard frameworks for the migration of an application from one architecture to another; all the approaches proposed in the literature are customized to the particular architectures. Hence, standard frameworks and guidelines for migration could be defined and proposed.
7 Conclusion
With the evolution of microservices architecture, many software companies have begun using it to create new applications or to migrate existing legacy systems to this new style. The process of migrating an application from one architectural style to another is a challenging task. Based on our experience in migrating SOA-based applications to microservices, we have proposed a formal migration strategy which can be applied to any architecture migration. We have also presented best practices which help in avoiding the challenges during the migration process. This work assists software architects in migrating applications from one architecture to another in an efficient manner. We have demonstrated the proposed migration strategy with a standard case study application, and future research directions are also presented.
References
1. T. Cerny, M.J. Donahoo, J. Pechanec, Disambiguation and comparison of SOA, microservices and self-contained systems, in Proceedings of the International Conference on Research in Adaptive and Convergent Systems (2017), pp. 228–235
2. J. Yin, H. Chen, S. Deng, Z. Wu, C. Pu, A dependable ESB framework for service integration. IEEE Int. Comput. 13(2), 26–34 (2009)
3. T. Salah, M.J. Zemerly, C.Y. Yeun, M. Al-Qutayri, Y. Al-Hammadi, The evolution of distributed systems towards microservices architecture, in 2016 11th International Conference for Internet Technology and Secured Transactions (ICITST) (IEEE, 2016), pp. 318–325
4. T. Cerny, M.J. Donahoo, M. Trnka, Contextual understanding of microservice architecture: current and future directions. ACM SIGAPP Appl. Comput. Rev. 17(4), 29–45 (2018)
5. Z. Xiao, I. Wijegunaratne, X. Qiang, Reflections on SOA and microservices, in 2016 4th International Conference on Enterprise Systems (ES) (IEEE, 2016), pp. 60–67
6. J. Thönes, Microservices. IEEE Softw. 32(1), 116 (2015)
7. D. Taibi, V. Lenarduzzi, C. Pahl, A. Janes, Microservices in agile software development: a workshop-based study into issues, advantages, and disadvantages, in Proceedings of the XP2017 Scientific Workshops (2017), pp. 1–5
8. C.V. Raghavendran, A. Patil, G.N. Satish, M. Shanmukhi, B. Madhuravani, Challenges and opportunities in extending cloud with Fog computing. Int. J. Eng. Technol. 7(4.39), 142–6 (2018)
9. D. Taibi, V. Lenarduzzi, C. Pahl, Processes, motivations, and issues for migrating to microservices architectures: an empirical investigation. IEEE Cloud Comput. 4(5), 22–32 (2017)
10. J. Soldani, D.A. Tamburri, W.J. Van Den Heuvel, The pains and gains of microservices: a systematic grey literature review. J. Syst. Softw. 1(146), 215–32 (2018)
11. A. Carrasco, B.V. Bladel, S. Demeyer, Migrating towards microservices: migration and architecture smells, in Proceedings of the 2nd International Workshop on Refactoring (2018), pp. 1–6
12. G. Toffetti, S. Brunner, M. Blöchlinger, F. Dudouet, A. Edmonds, An architecture for self-managing microservices, in Proceedings of the 1st International Workshop on Automated Incident Management in Cloud (2015), pp. 19–24
13. N. Dragoni, S. Giallorenzo, A.L. Lafuente, M. Mazzara, F. Montesi, R. Mustafin, L. Safina, Microservices: yesterday, today, and tomorrow, in Present and Ulterior Software Engineering (2017), pp. 195–216
14. V. Raj, S. Ravichandra, Microservices: a perfect SOA based solution for enterprise applications compared to web services, in 2018 3rd IEEE International Conference on Recent Trends in Electronics, Information and Communication Technology (RTEICT) (IEEE, 2018), pp. 1531–1536
15. L. Chen, Microservices: architecting for continuous delivery and DevOps, in 2018 IEEE International Conference on Software Architecture (ICSA) (IEEE, 2018), pp. 39–397
16. W. Hasselbring, G. Steinacker, Microservice architectures for scalability, agility and reliability in e-commerce, in 2017 IEEE International Conference on Software Architecture Workshops (ICSAW) (IEEE, 2017), pp. 243–246
17. M. Jung, S. Möllering, P. Dalbhanjan, P. Chapman, C. Kassen, Microservices on AWS (Amazon Web Services Inc., New York, NY, USA, 2016)
18. I. Nadareishvili, R. Mitra, M. McLarty, M. Amundsen, Microservice Architecture: Aligning Principles, Practices, and Culture (O'Reilly Media, Inc., 2016)
19. Z. Li, Q. Chen, S. Xue, T. Ma, Y. Yang, Z. Song, M. Guo, Amoeba: QoS-awareness and reduced resource usage of microservices with serverless computing, in 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (IEEE, 2020), pp. 399–408
20. W. Luz, E. Agilar, M.C. de Oliveira, C.E. de Melo, G. Pinto, R. Bonifácio, An experience report on the adoption of microservices in three Brazilian government institutions, in Proceedings of the XXXII Brazilian Symposium on Software Engineering (2018), pp. 32–41
21. V. Raj, R. Sadam, Evaluation of SOA-based web services and microservices architecture using complexity metrics. SN Comput. Sci. 2(5), 1 (2021)
22. V. Raj, R. Sadam, Performance and complexity comparison of service oriented architecture and microservices architecture. Int. J. Commun. Netw. Distrib. Syst. 27(1), 100–117 (2021)
23. V. Raj, S. Ravichandra, A service graph-based extraction of microservices from monolith services of SOA. Software: Practice and Experience (2021)
Chapter 42
Retinal Hemodynamics and Diabetes Mellitus Detection Through Deep Learning Ambika Shetkar, C. Kiran Mai, and C. Yamini
1 Introduction
Diabetes mellitus, also known as diabetes, has spread over the globe. Diabetes is a long-term metabolic condition characterized by fluctuating blood glucose (BG) levels. It is caused either by insufficient insulin synthesis (type 1 diabetes, T1D) or by the body's inability to use the insulin it produces (type 2 diabetes, T2D). Diagnoses of both T1D and T2D have increased, although T2D has increased more. T2D is a rising condition that puts substantial pressure on healthcare systems, particularly in poor nations, accounting for 90–95% of all diabetes cases. T2DM-related risk factors include obesity, asthma, heart disease, and family history. Compared with people without a family history of T2DM, individuals with a family history have a two- to threefold elevated chance of developing T2DM when any first-degree relative is affected. How a person internalizes the meaning of family history varies with his or her illness; being conscious of one's family background and examining one's personal connection to diabetes can support a positive improvement in health behavior, compared to people who are not aware of their family background. Early detection thus gives the doctor a chance to propose prevention steps and curtail the disease at an early stage. In people with diabetes, the eyes tend to express what stage of the disease they are in. When high blood sugar levels damage the blood vessels in the retina, this is known as diabetic retinopathy. Diabetic retinopathy affects up to 80% of people who have had diabetes for more than 20 years. At least 90% of new instances might be reduced with effective treatment and surveillance of the eyes.
Fig. 1 Stages of diabetic retinopathy with retinal findings
Figure 1 shows the different stages of diabetes with their retinal findings and the differences between them. These findings help us extract valuable features from the retinal images, which in turn help us build the CNN model to predict diabetes.
2 Related Work
The goal of the paper [1] is diabetic retinopathy, a type of retinopathy caused by diabetes that can lead to vision loss. Fundus images of diabetic retinopathy are taken with the Remidio "Fundus on Phone" app on a mobile phone. AIDR (EyeArt™) screening software was used to test the retinal pictures, to assess and validate the role of fundus photography in the detection of diabetic retinopathy (DR) and sight-threatening DR (STDR) using a smartphone app, and to compare the results to ophthalmologist grading. The paper [2] surveys papers and articles of the years 2011 (20%), 2012 (67%) and 2013 (13%) and reports what type of preprocessing and what type of classifier is used; most of the papers used green channel preprocessing and an SVM classifier. The paper [3] used seven characteristics for analysis: family history, gender, age, BMI, blood pressure, diabetes duration, and glucose levels. It used the Naive Bayes tree, the C4.5 algorithm, and k-means clustering, and concluded that a female patient with hypertension is the most important risk factor for retinopathy.
In this paper [4], 15 attributes were initially used for the analysis, but because of low accuracy the authors reduced them to 11 attributes; the final accuracy is 82.30% with Naive Bayes. The paper [5] used the following techniques: logistic regression, SVM, random forest, and gradient boosting. The main improvement of this paper over previous work is the use of a weighted ensemble model that improves the quality of the attributes, which in turn improves the accuracies. The paper [6] used a convolutional neural network (CNN) to differentiate smokers from non-smokers using retinal pictures, to improve awareness of shifting behavioral patterns in smokers' retinas. A diabetes screening method yielded 165,104 retinal images with "smoking"/"non-smoking" labels. The photographs were processed in one of two ways, "contrast-enhanced" or "skeletonized," and the dataset was split 80/20 between training and test sets. Validation findings were 88.88% overall; the skeletonized model, on the other hand, produced findings of 63.63 and 65.60% specificity. The study [7] compares a deep learning system (DLS) that applies artificial intelligence against professionals who identify diabetic retinopathy and related eye diseases using retinal images from multiethnic diabetics. The DLS (71,896 images; 14,880 patients) achieved 90.50% sensitivity and 91.60% specificity for visible diabetic retinopathy, 100% sensitivity and 91.10% specificity for vision-threatening diabetic retinopathy, 96.40% sensitivity and 87.20% specificity for potential glaucoma, and 93.20% for possible glaucoma. In the research work [8], deep learning was used to create a step-by-step procedure to detect DR using retinal fundus pictures. The neural network utilized in that study employs a function that initially combines nearby pixels into local characteristics before aggregating all of them into the best global features; it is based on the Inception-v3 architecture of Szegedy et al. The real-time data processing proposed in [9] used Apache Kafka as a streaming platform and MongoDB to store the patients' sensor data. The findings suggest that commercial versions of BLE-based sensors, together with the suggested real-time data processing, are capable of monitoring diabetes patients' vital signs; with a multilayer perceptron, their proposed model achieves an accuracy of 77.083%. Paper [10] employed a method in which 300 households with 3921 members were interviewed for family history. There were 770 diabetics in the group. The data was statistically analyzed using the t-test and the chi-square test. The findings revealed that 37% of type 1 DM cases and 58% of type 2 DM individuals had a family history of the condition. Where there was a family history of diabetes, 92% of type 1 DM cases and 59% of type 2 DM patients had a family history of type 2 DM, with a decline in the age of onset in subsequent generations.
3 Existing System
For the past two decades, researchers have been attempting to predict DR using data mining and machine learning techniques. The majority of the articles used a variety of data mining and machine learning approaches for diabetic retinopathy detection, including the 2D Gabor wavelet method, the J48 algorithm, decision trees, and convolutional neural networks, with varying degrees of accuracy. In comparison to the existing systems, the proposed system uses a sophisticated machine learning approach, a CNN, to create a comprehensive model with the greatest accuracy of 97%. The model generated by this approach is afterward stored and reused for greater precision.
4 Proposed System
We provide a system that takes in retinal fundus images to predict diabetes and its stage, creating a platform that gives suggestions and helps us in our day-to-day activities to prevent diabetes.
4.1 Working Flow Model
The process is to predict diabetic retinopathy. The dataset contains complete retinal images of the five stages of diabetes (Fig. 2). Initially, the dataset is preprocessed by converting the images to a fixed 64 × 64 format, and then the dataset is sent to the image classification technique. The image classification technique used in this framework is the convolutional neural network (CNN), which is a deep neural network. The outcome of this CNN is the prediction of five class labels, i.e., the mild, no, Proliferative_DR, moderate, and severe class labels. Furthermore, the framework provides guided suggestions for the people who got the mild, moderate, Proliferative_DR,
Fig. 2 Stages of diabetic retinopathy with features through retinal image
Fig. 3 Block diagram of the proposed framework
and severe class label and precautionary measures and good habits for the NO class label (Fig. 3).
5 Design Methodology
Decisions
Decisions are made based on the predictions of the proposed system; they tell the end user whether they have diabetic retinopathy or not and, if they do, at what stage. This is the value proposed to the end user.
ML Task
The input is the retinal color fundus photograph collected from the end user's eyes. The output is the prediction of whether the end user has diabetic retinopathy or not. The ML task is to take the test data and predict the mild, moderate, no (no presence of diabetes), Proliferative_DR, or severe class based on the training data fed to the model.
Value Propositions
Our end users are ordinary people, whom we check for diabetic retinopathy. The main purpose of this proposed framework is to have a model which predicts the stage at which a person is suffering from diabetic retinopathy, finally classifying them as having diabetes or not and placing them in the mild, moderate, no (no presence of diabetes), Proliferative_DR, or severe class based on their retinal color fundus photograph.
Collecting Data
New data comes from people who show some symptoms of diabetic retinopathy, in the form of a color image of the retinal fundus collected from the symptomatic people. Data sources: the data source for the training data is a very large number of retinal color fundus photographs collected from many patients who have already suffered from diabetic retinopathy at various stages. The training dataset is collected from the Indian Diabetic Retinopathy Image Dataset (IDRiD), which consists of five different types of diabetes images.
Making Predictions
The prediction is made on the test data: the input is sent to the CNN algorithm, which runs on the test images and predicts the outcome. The time taken by the system to extract features and make predictions depends on the algorithm run time, which is about two minutes (based on the system). The predictions are made to see if an end user has diabetes or not and, if so, at what stage.
Offline Evaluation
Offline evaluation covers what course of action needs to be taken by the end user, as suggested by the Web application after predicting at what stage the diabetes is present. Offline evaluation is done by professional doctors. The accuracy of the prediction is the metric used to evaluate the system before deployment.
Features
After collecting the retinal color fundus photograph from the end user, features are extracted based on the particular image pixels where changes take place, compared to a normal retinal image, due to the different stages of diabetes, and according to those features the end users are categorized into the different stages of diabetes.
Building Models
The model builds itself as new test data comes regularly from the end users; the test data is added to the training data, the training data gets updated, and the updated training dataset is used to obtain more accurate predictions of diabetes. Building the initial model usually takes more time for the CNN. Once the model is built, the framework automatically stores the trained model for future prediction.
Live Evaluation and Monitoring
After evaluation and monitoring, analysis is made on the complete data to see different trends of diabetes at a grand scale as well as at the individual scale.
6 Implementation
The implementation is done in the Python language, with the tkinter package for the application interface and the Keras package for implementing the CNN. To implement the proposed work, we have designed the following modules.
Upload dataset: using this module, we upload the diabetes dataset to the application.
Preprocess dataset: using this module, we read all images and then perform preprocessing such as resizing, rotation, augmentation, etc.
Train CNN: using this module, we apply the CNN algorithm on the diabetes dataset to build a CNN model to predict diabetes.
Upload test image and predict diabetes: using this module, we upload a retina image and the CNN predicts the presence of diabetes and its stage.
Accuracy and loss graph: using this module, we plot the CNN accuracy and loss graph. While training the CNN, we took 50 epochs/iterations, and for each epoch we calculate the CNN accuracy and loss.
6.1 Upload Dataset
The dataset used is collected from the Indian Diabetic Retinopathy Image Dataset (IDRiD), which consists of five different types of diabetes images. Those five types are mild, moderate, no (no presence of diabetes), Proliferative_DR, and severe.
6.2 Preprocess the Dataset
In preprocessing, the dataset images are converted to 64 × 64 dimensions so that they are compatible with the CNN algorithm. Along with the conversion, the images are augmented, noise is removed, and contrast is increased to give good pixel intensity for the CNN to recognize.
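A minimal sketch of such a preprocessing step is given below. OpenCV, CLAHE for contrast enhancement, and Keras' ImageDataGenerator for augmentation are illustrative choices rather than the exact tools fixed by this paper, and the file paths are placeholders for the IDRiD images.

# Resize to 64x64, denoise, enhance contrast and set up augmentation.
import cv2
from tensorflow.keras.preprocessing.image import ImageDataGenerator

def preprocess_image(path):
    img = cv2.imread(path)
    img = cv2.resize(img, (64, 64))                    # fixed 64 x 64 input size
    img = cv2.fastNlMeansDenoisingColored(img)         # noise removal
    lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    l = cv2.createCLAHE(clipLimit=2.0).apply(l)        # contrast enhancement (CLAHE)
    img = cv2.cvtColor(cv2.merge((l, a, b)), cv2.COLOR_LAB2BGR)
    return img.astype("float32") / 255.0               # normalize pixel values

# Augmentation (rotation, flips, zoom) applied on the fly during training
augmenter = ImageDataGenerator(rotation_range=20, horizontal_flip=True,
                               zoom_range=0.1)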
6.3 Training with CNN
The 413 preprocessed images of the dataset are fed to the CNN algorithm, and the trained model details are shown in Fig. 4.
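For illustration, a minimal Keras sketch of a CNN of this kind is given below. The authors' exact layer configuration is shown in Fig. 4, so the architecture, the placeholder data files, and the 80/20 split here are assumptions, not the paper's specification.

# An illustrative CNN for 64x64 retinal images with five class labels.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from tensorflow.keras.utils import to_categorical

X_train = np.load("train_images.npy")            # placeholder: (n, 64, 64, 3) preprocessed images
y_train = to_categorical(np.load("train_labels.npy"), num_classes=5)

model = Sequential([
    Conv2D(32, (3, 3), activation="relu", input_shape=(64, 64, 3)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation="relu"),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(128, activation="relu"),
    Dropout(0.5),
    Dense(5, activation="softmax"),               # mild, moderate, no DR, proliferative DR, severe
])

model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])

history = model.fit(X_train, y_train, epochs=50, batch_size=16,
                    validation_split=0.2)
model.save("dr_cnn_model.h5")                     # store the trained model for reuse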
Fig. 4 Trained model architecture of CNN
6.4 Upload Test Image and Predict Diabetes
Once the CNN model is created and stored, a new retinal test image can be given to check whether a person has diabetes or not, and if diabetes is present, the model specifically tells us at which stage it is among mild, moderate, Proliferative_DR, and severe. The test images taken in this model are 10 images of different class labels: mild, moderate, severe, proliferative DR, and no DR (Fig. 5).
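A sketch of this prediction step is shown below; preprocess_image is the helper sketched in Sect. 6.2, and the file names and the label order are assumptions for illustration.

# Load the stored model and predict the stage for a new retinal image.
import numpy as np
from tensorflow.keras.models import load_model

LABELS = ["Mild", "Moderate", "No DR", "Proliferative DR", "Severe"]  # assumed label order

model = load_model("dr_cnn_model.h5")
image = preprocess_image("test_retina.jpg")            # resized, denoised, normalized
probabilities = model.predict(np.expand_dims(image, axis=0))[0]
print("Predicted stage:", LABELS[int(np.argmax(probabilities))])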
Fig. 5 Uploading the test image and prediction
Fig. 6 Graph of accuracy versus loss
6.5 Accuracy and Loss Graph
The model created by the CNN gives an accuracy of 97.0944285392%. The accuracy is determined from the confusion matrix, and the loss versus accuracy graph is shown in Fig. 6. In this graph, the x-axis represents the epoch and the y-axis the accuracy/loss values; the blue line indicates loss and the orange line accuracy. We can also see that as the epoch number increases, accuracy increases and loss decreases, indicating that an accurate CNN model is being built.
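A graph of this kind can be produced directly from the Keras training history; a minimal sketch, assuming the history object from the training sketch in Sect. 6.3, is shown below.

# Plot accuracy and loss per epoch from the Keras history object.
import matplotlib.pyplot as plt

plt.plot(history.history["loss"], label="loss")
plt.plot(history.history["accuracy"], label="accuracy")
plt.xlabel("Epoch")
plt.ylabel("Accuracy / Loss")
plt.legend()
plt.show()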
7 Conclusion
In summary, the proposed framework takes the retinal fundus image of the user, predicts diabetes through image classification using a CNN, and identifies the persons who might have diabetes. The accuracy provided by the proposed framework for predicting diabetes is 97.0944285392%. Where there is a severe, moderate, or high risk of diabetes, it can be used to target therapies and favorably influence healthy habits without causing unnecessary expense or harm. If the research goals described here are adequately fulfilled, the framework will routinely gather data for detecting diabetes risk and then provide individualized prevention tactics and advice for those who want to reduce their chances of becoming diabetic at a young age.
References 1. R. Raman, S. Srinivasan, S. Virmani, S. Siva Prasad, C. Rao, R. Raja Lakshmi, Fundus photograph-based deep learning algorithms in detecting diabetic retinopathy, in The Royal College of Ophthalmologists, 7 Oct 2018 2. A. Ahmad, A.B. Mansoor, R. Mumtaz, M. Khan, S.H. Mirza, Image processing and classification in diabetic retinopathy: a review, in 2014 5th European Workshop on Visual Information Processing (EUVIP), Paris (2014), pp. 1–6. https://doi.org/10.1109/EUVIP.2014.7018362 3. C. Fiarni, E.M. Sipayung, S. Maemunah, Analysis and prediction of diabetes complication disease using data mining algorithm. Procedia Computer Sci. 161, 449–457. ISSN 1877-0509. https://doi.org/10.1016/j.procs.2019.11.1 4. N. Sneha, T. Gangil, Analysis of diabetes mellitus for early prediction using optimal features selection. J. Big Data 6, 13 (2019). https://doi.org/10.1186/s40537-019-0175-6 5. A. Dinh, S. Miertschin, A. Young, S.D. Mohanty, A data-driven approach to predicting diabetes and cardiovascular disease with machine learning, BMC Med. Inform. Decis. Making 19, 211 (2019) 6. S. Bhuravane, E. Vaghef, S. Yang, S. Hill, G. Humphrey, N. Walker, D. Squirrell, Detection of smoking status from retinal images; a convolutional neural network study (Scientific Reports, 29 April 2019) 7. D.S.W. Ting, C.Y.L. Cheung, G.S.W. Tan, S. Sivaprasad, Development and validation of a deep learning system for diabetic retinopathy and related eye diseases using retinal images from multiethnic populations with diabetes (American medical Association, 12 Dec 2017) 8. V. Gulshan, L. Peng, M. Coram, M.C. Stumpe, D. Wu, A. Narayanaswamy, S. Venugopalan, K. Widner, Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs (American medical Association, 12 Jan 2016) 9. G. Alfian, M. Syafrudin, M.F. Ijaz, M. Alex Syaekhon, N.L. Fitriyani, J. Rhee, A personalized healthcare monitoring system for diabetic patients by utilizing BLE-based sensors and real-time data processing. Sensors 18, 2183 10. S.S. Deo, S.D. Gore, D.N. Deobagkar, D.D. Deobagkar, Study of inheritance of diabetes mellitus in Western Indian population by pedigree analysis. J. Assoc. Phys. India 54, 441–444. PMID:16909690. 11. A. Shetkar, C. Kiran Mai, C. Yamini, Diabetic symptoms prediction through retinopathy, in Machine Learning Technologies and Applications (Springer, 2021), pp. 13–20 12. S. Samreen, K.M. Cherukuri, D. Venkatsai Goud, Predictive data analysis to identify heart anomalies. Int. J. Recent Technol. Eng. (IJRTE) 8(2), 2607–2611, 2277–3878 (2019) 13. C. Druva Manasa, K. Mai, Emotion and stress recognition through speech using machine learning models, in Learning and Analytics in Intelligent Systems (Springer, 2021), pp. 213–218
Chapter 43
Deep Neural Networks Model to Detection Glaucoma in Prima Phase Akhil Karnam, Himanshi Gidwani, Sachin Chirgaiya, and Deepak Sukheja
1 Introduction
Glaucoma is an eye disorder in which the optic nerve, responsible for connecting the eye with the brain, gets damaged. It is often characterized by the presence of higher pressure in the intraocular chamber (IOP), which causes damage to the optic nerve. According to the WHO, it is the world's second-largest cause of blindness. Glaucoma is popularly known as a 'silent thief of sight' because, like a silent killer, it typically shows no signs before irreversible vision loss. The first successful experimental glaucoma treatment was accomplished in 1857, although treatment had been delayed for many years because the causes of glaucoma could not be adequately identified. Even today the precise causes of glaucoma are not known; however, with early diagnosis and therapy, complete loss of eyesight may be prevented. In the 1970s, automated glaucoma diagnosis was introduced. Later, in the 1990s, the focus moved on to glaucoma detection at a very early stage using structural changes in optic disks, to diagnose glaucoma before it affects eyesight. OCT was suggested in the early 2000s to gather information about the retina's interior layers so that the disease may be identified early and treated in time to prevent loss of sight. The projected number of potential glaucoma patients globally in the coming years shows an alarming escalation. Glaucoma is
Glaucoma is identified by visualizing the internal structure of the retina in retinal images. The retina is weakened in one or both glaucomatous eyes. Glaucoma does not have any warning signs, and roughly 50% of people with glaucoma are unaware of the illness; hence, a thorough, dilated examination of the eye is essential for early diagnosis and early treatment. The eye should be examined at least once a year. The tonometry test is recommended to measure the intraocular pressure. Most forms of glaucoma are identified with a pressure higher than 20 mm Hg. The measures examined help diagnose the type of glaucoma [1].
1.1 Process of Glaucoma Detection Glaucoma is identified using several tests, such as intraocular pressure measurements and visual field examinations, as well as fundus and OCT imaging [2]. OCT provides an efficient way of visualizing and quantifying structures in the eye, namely the retinal nerve fiber layer (RNFL), which changes with disease progression. This method involves an additional procedure, quantifying the RNFL in OCT images. Usually, these techniques clean the input data in several ways, such as flipping all eye images into the same orientation (left or right) to minimize data variation and boost classifier efficiency. Manual analysis of ophthalmic images is a time-consuming process, and its efficiency often differs with the professional's expertise and capability. Unlike existing methods, a CNN automatically extracts features from fundus images, which can then be used to train a classifier that classifies the images into their respective abnormalities.
2 Literature Survey The human eye is a sensory organ and the source of sight, and the retina is its most crucial part. Severe glaucoma directly affects the retina. Tham et al. [3], in a systematic review and meta-analysis, estimated that glaucoma would affect about 75.4 million people by 2020, expanding to 110.6 million by 2040, which draws attention toward glaucoma. The main characteristic of glaucoma is that it does not show any signs or symptoms before reaching an advanced stage (Fig. 1). However, if it is detected early, the vision loss due to glaucoma can be postponed. Glaucoma is also related to the build-up of intraocular pressure (IOP) in the eyes caused by blockage of the drainage of intraocular fluid. The increased IOP damages the optic nerve, which transmits the visual sensory information from the eye to the brain. The harm reduces the perceptual capacity of the optic nerve fibers and can lead to blindness. The authors in [4] considered the failure of the RNFL as a major parameter for detecting aberrant (glaucomatous) cases.
Fig. 1 Estimated increase in glaucoma patients worldwide [3]
The features of the fundus images were extracted from the gray-level matrix so that the ONH area is separated, and the rest of the retinal nerve fiber layer is divided into ISNT-based subsectors. Researchers in [5] used support vector machines (SVM), which are memory-efficient and noted to be extremely effective in handling high-dimensional spaces. Phase information tends to be lost because the Radon transform does not retain it, and some of the information in the image is also lost by the projections used. The Radon transform further increases the mathematical complexity and introduces error; the approach reached 98.8% and 95% classification accuracy. Researchers in [6] proposed a feature representation mechanism and introduced a mix of unsupervised learning and supervised CDR regression learning applied on MFPPNet, using three dense connectivity blocks along with pyramid pooling for feature extraction. In [7, 8], the authors used a dataset of 934 images from 443 clinical samples for validation, carried out on an ORIGA dataset with 0.90 AUC, and took statistical characteristics for classification, such as mean, entropy, and third moment. They also considered smoothness, uniformity, and standard deviation. They extracted these features to perform classification and fed them to KNN classifiers; the dataset they used includes 84 images, with 95.24% accuracy. In [2], researchers suggested a device which can be implemented on hardware kits and linked directly to optical instruments to provide predictions along with the diagnostics. Using histogram equalization, they process the image, find the ROI, and ultimately estimate the cup-to-disk ratio, on which the prediction is based; no reference was made to the dataset or experimental results. Researchers in [9] achieve optic disk segmentation by red channel analysis, using vasculature observations present in the retinal region. Simple linear iterative clustering (SLIC) is suggested among various clustering techniques along with the Gabor edge detection filter, and predictions are made through the CDR. The dataset used consists of 100 images with an F-score of 96%. The parametric region used for inference is the area under the curve (AUC) with tenfold cross-validation through a random forest classifier. It is derived from color fundus images, maps of retinal nerve fiber layer (RNFL) thickness, macular maps of GCC thickness, and disk maps showing divergence in the RNFL and GCC maps. They implemented a 19-layer VGG19 CNN architecture, used a 357 OCT scan image dataset, and achieved an estimated AUC of 0.958 in cross-validation.
3 Problem Statement Glaucoma is an ailment that disproportionately affects individuals in the 40–60 year age group. Its worldwide prevalence is about 5%, and it is slowly increasing. Early detection of glaucoma is especially important because it allows prompt care to avoid significant loss of the visual field. A practical visual field test requires special equipment that is available only in hospitals and is therefore unsuitable for screening. The problem addressed here is detecting the disease condition (glaucoma) based on deep learning, using image processing on fundus and OCT images.
4 Objectives The major objective of this research article is to predict glaucoma in its prima (early) phase. To predict glaucoma in the prima phase: • Preprocess the glaucoma images and remove unwanted artifacts from the images. • Propose a CNN architecture to classify the glaucoma disease, i.e., whether it is malignant or not.
5 Proposed System and Implementation In our system, a deep learning architecture is implemented with a CNN at its core to automate glaucoma detection. This DCNN learns a hierarchical representation of images to distinguish patterns for diagnostic decisions between glaucoma and non-glaucoma. Various deep learning architectures have been proposed previously, but this model, trained using deep convolutional neural networks, is expected to be more accurate in diagnosing glaucoma than previously published models.
5.1 Process of Building DCNN Learning Model The common process used to build any machine learning model is shown in Fig. 2, and the process flow to implement convolutional neural networks to classify glaucoma and non-glaucoma images is shown in Fig. 3. Because fundus photography is performed at very different exposure levels, the dataset contains images of patients across different age groups and both genders. This diversity leads to differences in the pixel dimensions of the images, which can produce pointless variation.
Fig. 2 Process of building a machine learning model
Fig. 3 Process of building a machine learning model
To neutralize this variation, preprocessing techniques such as image cropping, color normalization, and rescaling are applied, the images are converted into a hierarchical data format, and data augmentation is performed before training. To improve the localization capacity of the network and to reduce overfitting, the images are augmented in real time. Data augmentation creates a more robust model through minor changes, thus preventing overfitting. The images are resized to 128 * 128 pixels, which allows the convolutional features to be retained and recognized.
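The preprocessing and real-time augmentation described above can be illustrated with a short Keras sketch. The directory layout (`data/fundus` with class subfolders) and the augmentation ranges are assumptions for illustration, not the authors' exact settings.

```python
# Minimal sketch of the preprocessing/augmentation step described above.
# Directory name and augmentation ranges are illustrative assumptions.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

IMG_SIZE = (128, 128)            # images are resized to 128 x 128 pixels

datagen = ImageDataGenerator(
    rescale=1.0 / 255,           # intensity/color normalization to [0, 1]
    rotation_range=15,           # small real-time augmentations
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.1,
    horizontal_flip=True,
    validation_split=0.3,        # 70/30 split used later in the paper
)

train_gen = datagen.flow_from_directory(
    "data/fundus", target_size=IMG_SIZE, batch_size=32,
    class_mode="binary", subset="training")
val_gen = datagen.flow_from_directory(
    "data/fundus", target_size=IMG_SIZE, batch_size=32,
    class_mode="binary", subset="validation")
```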
5.2 Proposed Model The key part of model building is a neural network implemented in Python with the TensorFlow library on a GPU provided by Google Colab (Fig. 4). The model needs to learn from experience how to conduct tasks such as predicting outcomes or classifying types from the various data sources provided to it, and deep learning techniques are well suited to this. The network consists of several kinds of layers, each containing different features, neurons, or nodes; each layer is connected to the previous layer and accepts its output as the input for further operations. In the proposed model, input images of 128 * 128 pixels are passed through convolution layers with various window sizes and activation functions, followed by 3 × 3 max pooling layers.
Fig. 4 Process of building a machine learning model
Convolution2D determines the required kernel with different sizes, lengths, and widths, and padding specifies the use of an additional layer of zeros around the image, if required. ReLU and softmax are used as activation functions in this model and help determine whether a neuron should activate or not. To decrease the dimensional volume of the image, max pooling extracts the dominant component from the rectified feature map. Batch normalization normalizes each layer's activations by converting the inputs to zero mean and unit variance. The Flatten function then transforms the 2D feature maps into a 1D vector. To reduce overfitting, we use Dropout to drop 50% of the neurons. In the classification stage, dense layers and softmax regression are used. With the Adam optimizer, the softmax activation function converts the output neurons into probabilities for each class.
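A minimal Keras sketch of a model in this spirit is given below; the exact numbers of filters and layers are not specified in the text, so the values used here are illustrative assumptions rather than the authors' exact architecture.

```python
# Illustrative sketch of the described DCNN; filter counts are assumptions.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(128, 128, 3)),
    layers.Conv2D(32, (5, 5), padding="same", activation="relu"),
    layers.BatchNormalization(),
    layers.MaxPooling2D(pool_size=(3, 3)),
    layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
    layers.BatchNormalization(),
    layers.MaxPooling2D(pool_size=(3, 3)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),                    # drop 50% of neurons
    layers.Dense(2, activation="softmax"),  # glaucoma vs. non-glaucoma
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

The final Dense layer would be widened (e.g., to five units) if the severity classes discussed in Sect. 5.4 are predicted instead of a binary label.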
5.3 Implementation and Results To implement the described process, we use a Windows operating system with 16 GB RAM and Python, and all the experiments are done in Google Colab. The dataset of retinal fundus images is downloaded from online sources [10, 11]. In total, 275 images are used to train and test the DCNN model in a 70/30 split; 175 images have glaucoma and 75 are normal images. The NumPy and OpenCV packages have been used during preprocessing to prepare the images for glaucoma/non-glaucoma classification. As mentioned in the proposed model, the first phase comprises the input layer: 128 × 128 pixel images are taken as input, followed by convolution layers with various window sizes and activation functions and 3 × 3 max pooling layers.
5.4 Evaluation Figure 5 demonstrates the consistency of the suggested method. The training accuracy obtained from the CNN model is 93.41%, and the validation accuracy is 90.96%. Figure 5 also shows the relation between the number of epochs and the loss of the model: the loss changes as the epoch count is gradually increased. If the number of epochs is increased too far, the neural network begins to overfit, which means it starts to learn from static noise in the data, leading to decreased real-world accuracy and incorrect results. If the number of epochs is reduced below 20, the model begins to underfit and produces a precision of less than 80%, depending on the epochs used (Fig. 6). Fig. 5 Accuracy of model against train versus validation data
Fig. 6 Confusion matrix
The rows and columns of the confusion matrix are ordered by severity with respect to the classes: class 1—'non-glaucoma', class 2—'mild glaucoma', class 3—'moderate', class 4—'severe', and class 5—'proliferative'. In other words, the first through fifth rows and columns of the confusion matrix represent class 1 through class 5, respectively. To check the accuracy of the proposed model, 187 normal (non-glaucoma) medical images are supplied, of which 173 are correctly predicted as 'no glaucoma', i.e., 92.51% accuracy. Similarly, 37 'mild-glaucoma', 100 'low-glaucoma', 20 'severe-glaucoma', and 30 'proliferative-glaucoma' images are supplied; of these, 30 'mild-glaucoma', 88 'low-glaucoma', 16 'severe-glaucoma', and 28 'proliferative-glaucoma' images are correctly predicted, giving per-class accuracies of 81.08%, 88%, 80%, and 93.33%, respectively.
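The per-class accuracies quoted above can be reproduced from a confusion matrix by dividing each diagonal entry by its row sum. The small NumPy sketch below illustrates this with the counts reported in the text; since the full off-diagonal distribution is not given in the paper, only the per-class totals are used.

```python
# Per-class accuracy = correctly predicted images / images supplied per class.
import numpy as np

supplied = np.array([187, 37, 100, 20, 30])   # images per class (from the text)
correct  = np.array([173, 30,  88, 16, 28])   # correctly predicted per class

per_class_acc = correct / supplied
for cls, acc in enumerate(per_class_acc, start=1):
    print(f"class {cls}: {acc:.2%}")
# class 1: 92.51%, class 2: 81.08%, class 3: 88.00%, class 4: 80.00%, class 5: 93.33%
```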
6 Conclusion and Future Scope The confusion matrix shows the accuracy achieved when testing the implementation of our DCNN classifier, with significant results. We used 1928 glaucoma detection images and obtained 93.41% training accuracy and 90.96% validation accuracy. In addition, automation assists in prediction, prevention, and early detection of disease-related risks. We provided a deep neural network architecture with the aim of identifying glaucoma and also its severity levels. The future aim of our work is to improve precision by training models with more high-resolution images for each class; if higher computing power is available, we can also experiment with more layers in the CNN model and with different activation functions.
References 1. J. Phu, S.K. Khuu, A. Agar, I. Domadious, A. Ng, M. Kalloniatis, Visualizing the Consistency of clinical characteristics that distinguish healthy persons, glaucoma suspect patients, and manifest glaucoma patients. Ophthalmol. Glaucoma 3(4), 274–287 (2020) 2. G.A.K. Omodaka, K. Hashimoto, S. Tsuda, Y. Shiga, N. Takada, T. Kikawa, H. Yokota, M. Akiba, Glaucoma diagnosis with machine learning based on optical coherence tomography and color fundus images. J. Healthcare Eng. (2019) 3. Y.C. Tham, X. Li, T.Y. Wong, H.A. Quigley, T. Aung, C.Y. Cheng, Global prevalence of glaucoma and projections of glaucoma burden through 2040: a systematic review and meta-analysis. Ophthalmology 121(11), 2081–90 (2014). https://doi.org/10.1016/j.ophtha.2014.05.013 (Epub 2014, PMID: 24974815) 4. A. Septiarini et al., Automated detection of retinal nerve fiber layer by texture-based analysis for glaucoma evaluation. Healthc. Inform. Res. 24(4), 335–345 (2018). https://doi.org/10.4258/ hir.2018.24.4.335
5. R. Sharma, P. Sircas, et al., Automated Glaucoma detection using center slice of higher order statistics. J. Mech. Med. Biol. 19(01), 1940011. https://doi.org/10.1142/S0219519419400116 6. R. Zhao, X. Chen, L. Xiyao, C. Zailiang, F. Guo, S. Li, Direct cup-to-disc ratio estimation for glaucoma screening via semi-supervised learning. IEEE J. Biomed. Health Inform. 7. A. Septiarini, D.M. Khairina, A.H. Kridalaksana, H. Hamdani, Automatic glaucoma detection method applying a statistical approach to fundus images. Healthc Inform. Res. 24(1), 53–60 (2018). https://doi.org/10.4258/hir.2018.24.1.53 8. G. Pavithra, G. Anushree, T.C. Manjunath, D. Lamani, Glaucoma detection using IP techniques, in 2017 International Conference on Energy, Communication, Data Analytics and Soft Computing (ICECDS), Chennai (2017), pp. 3840–3843 9. N.A. Diptu et al., Early detection of glaucoma using fuzzy logic in Bangladesh context, in 2018 International Conference on Intelligent Systems (IS), Funchal - Madeira, Portugal, pp. 87–93 (2018) 10. Retinal fundus images for glaucoma analysis: the RIGA dataset, University of Michigan—Deep Blue Data. https://doi.org/10.7302/Z23R0R2 11. A. Budai, R. Bock, A. Maier, J. Hornegger, G. Michelson, Robust vessel segmentation in fundus images. Int. J. Bio-med. Imag. (2013)
Chapter 44
Time Series Analysis Using LSTM for Elderly Care Application Chagantipati Akarsh, Sagi Harshad Varma, and P. Venkateswara Rao
1 Introduction The aging of the population and their increasing desire to live independently are giving rise to several models for elderly care. The prevailing methodologies which use sensors are precise and accurate; however, they are both expensive and intrusive, and most people do not need this level of highly accurate data. NILM and energy monitoring offer the elderly and their caregivers a more cost-efficient and non-intrusive way of checking the well-being of the elderly. Smart energy meters not only help us monitor the energy of the house, they also help us detect phasewise energy usage and device-level data such as energy consumption. Using this data, we can detect different usage patterns within the house, which may help us determine what sort of activity goes on at a specific period of time. Assuming the elderly follow a daily routine, such as sleeping patterns, washing clothes, and other household chores, we can identify various patterns. Analyzing these patterns with respect to activities of daily living can help us observe whether any significant changes in pattern occur, which may be identified as abnormalities. We can then alert the caregivers or relatives about the abnormalities so that they can take the necessary measures.
C. Akarsh · P. Venkateswara Rao VNR Vignana Jyothi Institute of Engineering and Technology, Hyderabad, India S. H. Varma (B) Department of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. B. Reddy et al. (eds.), Proceedings of Second International Conference on Advances in Computer Engineering and Communication Systems, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-7389-4_46
2 Related Works Hernández et al. [1] presented related work on non-intrusive load monitoring (NILM) by reviewing the current research on NILM and its various applications, with the aim of developing interest in this technology. Clement et al. [2] presented work which uses a semi-Markov model (SMM) to learn and detect unusual behavior by analyzing the SMM to find specific structures representing behavior, and an impulse-based technique that helps detect activities of daily living (ADLs), with an emphasis on temporal evaluation of ADLs in parallel; both techniques are based on smart meter events describing which home appliance was switched. Alcalá et al. [3] proposed using the energy consumption values provided by smart energy meters to develop possible healthcare solutions for homes, and additionally use the Dempster–Shafer theory to provide a score by comparing the electricity consumption values to the routine consumption values. Ciancetta et al. [4] proposed a system for extracting the electricity consumption; using convolutional neural networks allows simultaneous detection and classification of events without having to perform double processing. Himeur et al. [5] proposed a novel anomaly visualization method which helps understand abnormalities in behavior; to validate the proposed system, a new appliance-level electricity consumption dataset was also designed via a measurement campaign carried out at the Qatar University Energy Lab, namely the Qatar University dataset. Cui et al. [6] proposed a technique which identifies anomalous activities in homes by detecting abnormalities in electricity consumption, and applied five models on the electricity consumption dataset of different commercial and residential complexes to detect anomalies in the consumption. Furthermore, they proposed a new model which is a hybrid of polynomial regression and Gaussian distribution. Tokgöz et al. [7] presented a deep learning architecture which combines RNN, LSTM, and GRU, based on electricity consumption forecasting experiments on Turkish power load prediction. González et al. [8] proposed a new functional forecasting technique that attempts to generalize the usual seasonal ARMAX time series model to the L2 Hilbert space; the structure of the proposed model is a linear regression in which parameters perform operations on variables. Chandramitasari et al. [9] proposed a deep learning architecture combining LSTM and FFNN to carry out power forecasting, and the proposed model was then applied to analyze the time series data of a commercial company. Kim et al. [10] proposed a technique which uses an LSTM architecture that takes previous energy consumption values as input to predict the consumption values of the next month.
Fig. 1 Flow diagram of proposed system
3 Proposed System 3.1 System Architecture In the proposed method, the raw data was first extracted from the smart energy meter and preprocessed by extracting the required features, changing the time zone from GMT to IST, finding the active phase, and obtaining other latent features such as weekday/weekend. The preprocessed data was then given as input to the LSTM model to predict the electricity consumption values of the next day, and to a covariance-based method to detect the sleeping patterns (Fig. 1).
3.2 Data Collection For this experiment, three houses where elderly people live were identified, and the hourly energy consumption values of each house were recorded for 10 weeks. The dataset has been made by scraping the application programming interface (API) of the smart energy meters. The raw data consists of timestamp in milliseconds, phasewise energy values, active power, and reactive power.
3.3 Preprocessing In this step, the raw data from the API is processed into model-interpretable data. The timestamp generated in the raw dataset is in Greenwich Mean Time (GMT). To localize the timestamp, the Python library pytz is used to convert the time zone to Indian Standard Time (IST). From the raw data, the phasewise energy values, the timestamp in milliseconds, and other latent features such as weekday/weekend were extracted. The energy values obtained from the smart meter were inconsistent and had a high standard deviation. To mitigate this problem, we analyzed the energy values and concluded that the elderly use a single bedroom for sleeping; hence, the phasewise energy values were considered instead of the overall energy consumption values. The phasewise energy consumption values were then further classified into an active phase and a passive phase. The active phase is the phase which the elderly use during the nighttime, when heavy-duty appliances such as air conditioners and geysers are used, and the passive phase is the phase used during the daytime. We identified the active phase by comparing the peak value of all three phases for each hour, noting which phase it corresponds to, and taking the phase with the highest value for the greatest number of hours as the phase on which heavy-duty appliances like air conditioners run. The sum of the other two phases is then taken as the passive phase, and a new data frame is created. Doing this reduced the noise in the data significantly.
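The timezone conversion and active-phase selection described above might look roughly like the following pandas sketch. The column names (`ts_ms`, `phase_r`, `phase_y`, `phase_b`) and the raw file name are illustrative assumptions, since the exact schema of the meter API is not given, and pandas' built-in timezone handling is used here in place of a direct pytz call.

```python
# Sketch of the preprocessing step; file and column names are assumptions.
import pandas as pd

df = pd.read_json("meter_raw.json")              # raw API dump (assumed file)

# GMT millisecond timestamps -> Indian Standard Time
df["time"] = (pd.to_datetime(df["ts_ms"], unit="ms", utc=True)
                .dt.tz_convert("Asia/Kolkata"))
df["weekday"] = df["time"].dt.dayofweek < 5      # latent weekday/weekend feature

# Hourly peak per phase; the phase that peaks most often is the active phase
phases = ["phase_r", "phase_y", "phase_b"]
hourly_peaks = df.set_index("time")[phases].resample("1H").max()
active_phase = hourly_peaks.idxmax(axis=1).mode()[0]

processed = pd.DataFrame({
    "active": df[active_phase].to_numpy(),
    "passive": df[[p for p in phases if p != active_phase]].sum(axis=1).to_numpy(),
}, index=df["time"])
```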
3.4 Trend Prediction Long short-term memory (LSTM): LSTM is a type of RNN which performs well for time series, as it is capable of storing information for an indefinite duration, it is resistant to noise, and it takes seasonal and habitual changes into account while predicting the time series. To detect the trends in electricity consumption, the preprocessed data was given as input to the LSTM; the features were transformed using MinMaxScaler, and the data was split into training data (80%) and testing data (20%). The model was trained using the layers described in Fig. 2; the Adam optimizer was the optimization algorithm used, and finally the electricity consumption of the next day is predicted. Given that LSTMs operate on sequence data, adding layers provides levels of abstraction over the input observations through time. In effect, by chunking observations through time and representing the problem at different time scales, we have stacked the LSTM model with Leaky ReLU, dropouts, and dense layers wherever needed.
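A minimal sketch of such a stacked LSTM is given below. The window length, unit counts, and layer count are not specified in the text, so the values here are assumptions for illustration, and a random placeholder stands in for the hourly active-phase consumption series produced by the preprocessing step.

```python
# Illustrative stacked LSTM for next-day consumption; sizes are assumptions.
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras import layers, models

hourly = np.random.rand(24 * 70)          # placeholder for 10 weeks of hourly readings

def make_windows(series, lookback=24):
    """Turn an hourly series into (samples, lookback, 1) windows and next-hour targets."""
    X, y = [], []
    for i in range(len(series) - lookback):
        X.append(series[i:i + lookback])
        y.append(series[i + lookback])
    return np.array(X)[..., None], np.array(y)

scaled = MinMaxScaler().fit_transform(hourly.reshape(-1, 1)).ravel()
X, y = make_windows(scaled)
split = int(0.8 * len(X))                 # 80/20 train-test split

model = models.Sequential([
    layers.LSTM(64, return_sequences=True, input_shape=(X.shape[1], 1)),
    layers.LeakyReLU(),
    layers.Dropout(0.2),
    layers.LSTM(32),
    layers.Dense(16),
    layers.LeakyReLU(),
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X[:split], y[:split], validation_data=(X[split:], y[split:]), epochs=50)
```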
Fig. 2 Data collection distribution graph
3.5 Sleep Pattern Detection In this step, we detect the sleep patterns of elderly people using a statistical method known as covariance.

Covariance(P, Q) = Σ (P − µ)(Q − v) / n
where P and Q are random variables, E(P) = µ is the mean of P, E(Q) = v is the mean of Q, and n is the number of items. To detect the sleep patterns, we assume that the elderly sleep and wake up around the same time every day. The energy values are grouped into time frames of 5 h in the morning around the user's preferred waking-up time and 5 h at night around the user's preferred sleeping time. In this approach, to find the waking-up time, the covariance of the hourly energy values is calculated by taking the energy values for every 15 min as input; when the sign of the covariance for two consecutive hours changes, we consider this point as the time at which they wake up, and similarly, during the night time, such a point can be considered as the sleeping time.
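One possible reading of this sign-change rule is sketched below: within each hour, the covariance of the four 15-minute readings against their position in time is computed, and a change of sign between consecutive hours marks the estimated transition. This is a rough sketch under that assumption, with `quarter_hourly` assumed to be a pandas Series of 15-minute energy values over the 5-hour window of interest.

```python
# Sketch of the covariance sign-change rule for wake/sleep time detection.
import numpy as np
import pandas as pd

def transition_hour(quarter_hourly: pd.Series):
    hours = [g for _, g in quarter_hourly.groupby(pd.Grouper(freq="1H"))]
    cov_signs = []
    for h in hours:
        vals = h.to_numpy()
        # covariance of the energy values against their position within the hour
        cov = np.cov(np.arange(len(vals)), vals)[0, 1]
        cov_signs.append(np.sign(cov))
    for i in range(1, len(cov_signs)):
        if cov_signs[i] != cov_signs[i - 1]:   # sign change between two hours
            return hours[i].index[0]           # estimated wake-up / sleep time
    return None

# Example with synthetic 15-min data over a 5-hour morning window
idx = pd.date_range("2021-01-01 05:00", periods=20, freq="15min")
demo = pd.Series(np.r_[np.zeros(8), np.linspace(0.0, 2.0, 12)], index=idx)
print(transition_hour(demo))
```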
4 Results The extracted features from the smart energy meters were the input to the LSTM model. Figures 3, 4 and 5 show the test versus prediction values of the long short-term memory model for the different houses.
Fig. 3 Description of multiple layers of LSTM
Precision, recall, accuracy, F1 score, and support were the metrics used for evaluation of the model. The results obtained for three different houses are displayed in Table 1.
4.1 Metrics

Precision = True Positive / (True Positive + False Positive)
Recall = True Positive / (True Positive + False Negative)
Accuracy = (True Positive + True Negative) / Total
F1 Score = 2 × (Precision × Recall) / (Precision + Recall)
The LSTM model performed well in precision, recall, accuracy, and F1 score in all the houses. The second house had the highest classification accuracy of 87.6% and the average classification accuracy was 86.7%.
Fig. 4 Test versus prediction for different houses
The hourly energy values were the input to the algorithm for detecting sleep patterns. Figure 5 displays the waking-up time and sleeping time for the different houses. The most common time at which they wake up is between 7 and 7:30 a.m., and the time when they go to sleep is around 12 a.m.
5 Conclusion We have created an elderly care model to analyze the energy, power, and other usage data of households as well as individual devices, forming an approach for monitoring the daily routine and activities of elderly people and detecting any deviations. Many health problems, such as inactivity-related issues, are directly associated with these deviations, making this approach a useful tool for caregivers and relatives. This paper proposes a deep learning model which uses the long short-term memory architecture to identify the various trends in electricity consumption and to identify sleeping patterns. A classification accuracy of 86.7% is achieved through this model, which shows its efficiency; hence, it can further be used to detect abnormalities accurately.
Fig. 5 Sleeping patterns at different houses (waking-up time and sleeping time); x-axis: number of days, y-axis: time in hours
Table 1 Results obtained using LSTM algorithm with support = 227

House no.   Precision   Recall   Accuracy   F1 score
1           0.940       0.917    0.872      0.929
2           0.916       0.944    0.876      0.930
3           0.909       0.923    0.854      0.916
Avg         0.921       0.928    0.867      0.925
6 Future Scope Future possibilities include abnormality detection, integration with a mobile application to alert caretakers or family members about abnormalities or major deviations in sleep patterns, and integration with other smart devices like beacons, motion sensors, wristband
accelerometers, sleep sensors, etc., to help detect presence in each room, fall and sleeping patterns, using which the accuracy of the model can be improved.
References 1. Á. Hernández, A. Ruano, J. Ureña, M.G. Ruano, J.J. Garcia, Applications of applications of NILM techniques to energy management and assisted living. IFAC-PapersOnLine 52(11), 164–171 (2019). ISSN 2405-8963. https://doi.org/10.1016/j.ifacol.2019.09.135 2. J. Clement, J. Ploennigs, K. Kabitzsch, Detecting activities of daily living with smart meters, in Ambient Assisted Living. Advanced Technologies and Societal Change, ed. by R. Wichert, H. Klausing (Springer, Berlin, Heidelberg, 2014). https://doi.org/10.1007/978-3-642-379888_10 3. J.M. Alcalá, J. Ureña, Á. Hernández, D. Gualda, Assessing human activity in elderly people using non-intrusive load monitoring. Sensors (Basel, Switzerland) 17(2), 351 (2017). https:// doi.org/10.3390/s17020351 4. F. Ciancetta, G. Bucci, E. Fiorucci, S. Mari, A. Fioravanti, A new convolutional neural networkbased system for NILM applications. IEEE Trans. Instrum. Measur. 70, 1–12, Art no. 1501112 (2021). https://doi.org/10.1109/TIM.2020.3035193 5. Y. Himeur, A. Alsalemi, F. Bensaali et al., A novel approach for detecting anomalous energy consumption based on micro-moments and deep neural networks. Cogn Comput 12, 1381–1401 (2020). https://doi.org/10.1007/s12559-020-09764-y 6. W. Cui, H. Wang, A new anomaly detection system for school electricity consumption data. Information 8(4), 151 (2017). https://doi.org/10.3390/info8040151 7. A. Tokgöz, G. Ünal, A RNN based time series approach for forecasting Turkish electricity load, in 2018 26th Signal Processing and Communications Applications Conference (SIU), pp. 1–4 (2018). https://doi.org/10.1109/SIU.2018.8404313 8. J.P. González, A.M.S. Muñoz San Roque, E.A. Pérez, Forecasting functional time series with a new Hilbertian ARMAX model: application to electricity price forecasting. IEEE Trans. Power Syst. 33(1), 545–556 (2018). https://doi.org/10.1109/TPWRS.2017.2700287 9. W. Chandramitasari, B. Kurniawan, S. Fujimura, Building deep neural network model for short term electricity consumption forecasting, in 2018 International Symposium on Advanced Intelligent Informatics (SAIN), pp. 43–48 (2018).https://doi.org/10.1109/SAIN.2018.8673340 10. N. Kim, M. Kim, J.K. Choi, LSTM based short-term electricity consumption forecast with daily load profile sequences, in 2018 IEEE 7th Global Conference on Consumer Electronics (GCCE), pp. 136–137 (2018). https://doi.org/10.1109/GCCE.2018.8574484
Chapter 45
Brightness Contrast Using Convolution Neural Network Sagar Yeruva, Anvesh Guduri, Yedla Sai Sreshta, Thatta Sumathi, and Dedeepya Tatineni
1 Introduction Convolutional neural networks (CNNs) are neural networks used in image classification and recognition [1]. CNN has been the proposed solution for the classification of images. CNNs can successfully identify objects, faces, and traffic signals, along with powering vision in self-driving cars and robots. They are also used in smart grid applications. A CNN takes the input, processes the image, and gives the output. A CNN consists of an input layer, hidden layers, and an output layer [2]. The hidden layers generally contain a convolutional layer, a rectified linear unit (ReLU), and a fully connected layer. A CNN works by convolution, which means extracting features from the images. These features are given as the input to fully connected artificial neural networks and are then further processed through the network. Forward propagation and backward propagation are then used to train the network. This process continues repeatedly until a well-defined neural network with feature detectors and trained weights is obtained [3]. Our work highlights the importance of brightening dark and unclear images. Since we need clear and brighter images for a proper understanding of a dark picture, we propose to turn dark and unclear images into brighter and clearer ones. This can be used in many real-life situations, such as CCTV cameras, so that we can find the required person, object, etc., clearly. It can also be applied to make images clearer for better accuracy. We use technologies such as deep learning and convolutional neural networks. S. Yeruva · A. Guduri (B) · Y. S. Sreshta · T. Sumathi · D. Tatineni Department of Computer Science and Engineering, VNRVJIET, Bachupally, Hyderabad, Telangana 500090, India S. Yeruva e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. B. Reddy et al. (eds.), Proceedings of Second International Conference on Advances in Computer Engineering and Communication Systems, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-7389-4_47
The work aims at transforming dark and blurred images into illuminated images by using deep learning and computer vision algorithms.
2 State of the Art In [4], Pal and King present "Image enhancement using smoothing with fuzzy sets", which is used for gray-tone enhancement. The approach involves two enhancement blocks with a smoothing block between them to reduce fuzziness. The enhancement algorithm includes extraction of fuzzy properties and application of a fuzzy operator; operations such as defocusing, averaging, and the max–min rule are involved in smoothing. However, in this method, defocusing may lead to loss of relevant information in some cases, and there is also a chance that some edges get over-smoothed. In [5], Polesel et al. present "Image Enhancement via Adaptive Unsharp Masking", which is used for enhancing images. The proposed system uses the linear unsharp-masking technique; the adaptive filter controls the sharpening action so that low-contrast details in the input are enhanced more than high-contrast details. The presence of a linear high-pass filter makes the system extremely noise-sensitive, and it suffers from excess noise amplification. The high-contrast areas are enhanced, and the output image may contain some unpleasant overshoot artifacts. In [6], Ni et al. present "Towards Unsupervised Deep Image Enhancement with Generative Adversarial Network", an unsupervised enhancement generative adversarial network (UEGAN). The proposed model is based on a single deep generative artificial neural network which embeds attention and modulation mechanisms that help capture richer local and global features. The model is unsupervised and has unavoidable limitations: noise cannot be removed from the results generated with this method. In [7], Guo et al. present "A Pipeline Neural Network for Low-Light Image Enhancement", in which multiscale retinex and discrete wavelet transformation are blended to obtain better results; a low-light image enhancement network (LLIE_NET) and a Gaussian convolution kernel are used, and bigger datasets reduce the variance of the images. In [8], Gu et al. present "Blind Super-Resolution With Iterative Kernel Correction", which uses the concept of iterative kernel correction (IKC), since the blur kernels encountered in real applications are unknown and complicated. Since isotropic kernels produce only slight blur, they may not be useful in real-world applications. In [9], Liu et al. present "Image Super-Resolution via Attention based Back Projection Network", in which the attention-based back projection network (ABPN) is used
for image super-resolution. The back projection mechanism is further developed for super-resolution, and enhanced back projection blocks are suggested to iteratively update high- and low-resolution features. However, the computation is inefficient for practical applications, and the network is difficult to optimize. In [10], Sun and Chen present "Learned Image Downscaling for Upscaling using Content Adaptive Resampler", in which they use content adaptive resampling (CAR) to learn image downscaling while considering the process of upscaling. The content adaptive resampling kernels generated by the resampler network are applied to high-resolution images for generating pixels. It requires more time to perform the upscaling and downscaling operations and requires high bandwidth. In [11], Islam et al. present "Underwater Image Super-Resolution using Deep Residual Multipliers", which uses a deep residual network-based generative model that can be used by autonomous underwater robots for underwater image super-resolution. The model uses deep residual multipliers for single image super-resolution (SISR) of underwater imagery. In this case, gradient propagation becomes harder, and the model does not succeed in restoring the global content contrast and sharpness. In [12], Niu et al. present "Single Image Super-Resolution via a Holistic Attention Network", in which they observe that single image super-resolution is a very ill-posed, difficult, and challenging problem whose main aim is obtaining a high-resolution output from one of its low-resolution versions. A new holistic attention network is proposed, consisting of a channel spatial attention module and a layer attention module; demonstrating channel attention has been effective for preserving rich features in each layer. It suffers from unpleasant artifacts and blurring in visual quality, and some of the results with the blur-downscale degradation model are visually distorted. In [13], Fu et al. present "Twice Mixing: A Rank Learning based Quality Assessment Approach for Underwater Image Enhancement", in which two mixing ratios are randomly generated to create training examples and corresponding rankings for underwater image enhancement. A Siamese network is used to learn the pairwise comparison with a margin-ranking loss. It is a challenging task which needs to comprehensively assess diverse distortions.
2.1 Summary of State of the Art The existing state-of-the-art methods attempt to enhance images using various enhancement techniques such as gray-tone enhancement, the unsharp-masking technique, a single deep generative network, discrete wavelet transformation, and adaptive rescaling. The methods listed above use different approaches to turn images into clearer ones, such as defocusing, high-pass filters, multiscale retinex, isotropic kernels, and Siamese networks. Many of these methods attempt to brighten dark and unclear images but are not sufficiently accurate, are complicated to perform, and require higher computational resources.
3 Design Methodology Every image is transformed by I * alpha + beta, a linear transformation. The two parameters alpha and beta change the overall contrast and brightness of the picture, respectively. Consider the image pairs in Figs. 2, 3, and 4: the left one is the low-exposure image, and the right one is transformed using alpha = 2.019 and beta = 1.876. These two parameters are learned by the model. As can be seen, given suitable alpha and beta parameters, the transformation can substantially change an image. The system is implemented in Keras. To train the model, simply edit training.py with the appropriate epoch value and training data and use it to train.
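As a quick illustration of the I * alpha + beta transform described above (independent of how alpha and beta are learned), the following sketch applies fixed values with OpenCV; the file names are assumptions, and the alpha/beta values are the example values quoted in the text.

```python
# Applying the linear brightness/contrast transform I * alpha + beta.
# In the full system, alpha and beta are predicted by the learned model.
import cv2

image = cv2.imread("dark_input.jpg")        # low-exposure input (assumed path)
alpha, beta = 2.019, 1.876                  # contrast gain and brightness offset

# convertScaleAbs computes saturate_cast(alpha * I + beta) per pixel
enhanced = cv2.convertScaleAbs(image, alpha=alpha, beta=beta)
cv2.imwrite("enhanced_output.jpg", enhanced)
```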
3.1 Architectural Design Convolutional neural networks are also known as ConvNets. They were introduced by Yann LeCun in 1980. Convolutional neural networks are feed-forward artificial neural networks and consist mainly of three kinds of layers: input layers, hidden layers, and output layers. Convolutional neural networks are special kinds of neural networks used on images. They are useful for extracting features from the images that may be hidden and may or may not be visible to the human eye. They are widely used in many computer vision applications such as object recognition, object detection, object localization, and object tracking. Here, a CNN is used in the application of deblurring an image. Auto-encoders use convolutional neural networks effectively to solve this problem. Auto-encoders store the details of the input image in a format smaller than and different from the image itself. These stored details can later be used to recreate either the same image or a different image based on the input image. Auto-encoders are the fundamental concept behind image recreation as well as generating new images, and they are also used in many generative adversarial networks. An auto-encoder has three sections: the encoder, the hidden layer, and the decoder. The encoder takes the input, processes it, extracts features, and stores the data in the hidden layer. The decoder performs the opposite action: it takes the data from the hidden layer and recreates an image from it. The encoder contains only convolutional layers with different numbers of filters. These layers are used to extract features from the blurred input image and transfer them to the hidden layer. The hidden layer consists of a convolution layer and takes the feature map, i.e., the output of the encoder, as its input. The data from the hidden layer is passed on to the decoder. Since the encoder involves convolution layers, it makes sense to deconvolve to get back an image similar to the input image, in this case the deblurred image.
Fig. 1 Architectural design of machine learning model
Fig. 2 Result of implementation on an image Case-1
In the deconvolution of images, a kernel with weights, as in a convolution layer, is multiplied with the intensity of a single pixel from the feature map, and the resulting matrix replaces the pixel in the feature map. The weights of the kernels of each layer are learned during the process of training the overall model. Therefore, the decoder involves a layer called the deconvolution layer or convolution transpose layer. The convolution transpose layers in the decoder have parameters and attributes similar to those used by the encoder [14]. The following architectural design is adopted for the implementation of image enhancement using CNN, as shown in Fig. 1.
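A minimal Keras sketch of such an encoder–decoder (convolution plus convolution-transpose) structure is given below; the filter counts and depth are illustrative assumptions, not the authors' exact configuration.

```python
# Illustrative convolutional auto-encoder: Conv2D encoder, Conv2DTranspose decoder.
from tensorflow.keras import layers, models

inp = layers.Input(shape=(128, 128, 3))

# Encoder: convolution layers extract and compress features
x = layers.Conv2D(32, (3, 3), strides=2, padding="same", activation="relu")(inp)
x = layers.Conv2D(64, (3, 3), strides=2, padding="same", activation="relu")(x)

# Hidden (bottleneck) convolution layer
x = layers.Conv2D(128, (3, 3), padding="same", activation="relu")(x)

# Decoder: convolution-transpose layers rebuild the enhanced image
x = layers.Conv2DTranspose(64, (3, 3), strides=2, padding="same", activation="relu")(x)
x = layers.Conv2DTranspose(32, (3, 3), strides=2, padding="same", activation="relu")(x)
out = layers.Conv2D(3, (3, 3), padding="same", activation="sigmoid")(x)

autoencoder = models.Model(inp, out)
autoencoder.compile(optimizer="adam", loss="mse")   # MSE loss, as in Sect. 4.3
```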
3.2 Phases of Model 3.2.1
Patch Extraction
Patch extraction is the first stage or phase of the proposed model. A small region of an image, in pixels, is called a patch [15]. The mean of each patch is removed from the value of each of its pixels, and based on these weights, the patches are sorted.
Fig. 3 Result of implementation on an image Case-2
Fig. 4 Result of implementation on an image Case-3
Patches which contain high-level weights are kept by thresholding, which is useful to remove edges [16]. In thresholding, we convert a grayscale or color image into a binary one [17].
3.2.2
Nonlinear Mapping
This is the second stage or phase of the proposed model. A convolutional neural network contains a nonlinearity layer in which an activation function takes the feature map generated by the convolution layer and produces an activation map as its output. The activation function is an elementwise operation over the input volume; therefore, the dimensions of the output and input are identical [18]. The patch extraction and nonlinear mapping stages use the following processes.
Convolution Layer The main purpose of the convolution layer is to extract the input features [19].
Activation Function In this work, we use an activation function called the leaky rectified linear unit (Leaky ReLU). This function is an improved version of the ReLU activation function. With ReLU, the gradient is 0 for all inputs less than zero, which can deactivate the neurons in that region and may also cause the dying ReLU problem [20].
Reconstruction This is the final phase of the model or architecture and is used for output generation. It consists of two components, namely a convolution layer and an activation function, the sigmoid function. The sigmoid function is used mainly because its output lies between 0 and 1 and it is commonly used in models where we must predict a probability. In this CNN, the final activation applied is the sigmoid function [21].
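Taken together, the three phases above correspond to a small convolutional pipeline along the following lines; the kernel sizes and filter counts are assumptions for illustration only.

```python
# Sketch of the three phases: patch extraction -> nonlinear mapping -> reconstruction.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(128, 128, 3)),
    # Phase 1: patch extraction (convolution over local image patches)
    layers.Conv2D(64, (9, 9), padding="same"),
    layers.LeakyReLU(),
    # Phase 2: nonlinear mapping of the extracted feature maps
    layers.Conv2D(32, (1, 1), padding="same"),
    layers.LeakyReLU(),
    # Phase 3: reconstruction with a sigmoid so outputs stay in [0, 1]
    layers.Conv2D(3, (5, 5), padding="same", activation="sigmoid"),
])
model.compile(optimizer="adam", loss="mse")
```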
4 Experimentation and Results 4.1 Dataset Used The dataset used consists of 1050 blurred images, where each sample consists of three photographs together. The dataset was originally created for validating a blur-detection algorithm; it can also be used for testing image deblurring, although not for visual comparison. We use this dataset to train and test our model to produce clear output images from the unclear, blurred inputs. The system was implemented in Keras and TensorFlow, and we use the Google Colab environment for implementation.
4.2 Results and Graphs The proposed model turns blurred and low-light images into brighter, enhanced images, as shown in Figs. 2, 3, and 4. The left-side image of each figure is the blurred image, and the right side shows the enhanced image.
Fig. 5 Evaluation of experimentation through loss function
4.3 Evaluation of Results We use a loss function, mean squared error, to evaluate the accuracy of the model; it is used to calculate the error of the model during the optimization process. As the number of epochs increases, the loss decreases, as shown in Fig. 5. In this graph, the X-axis is the number of epochs and the Y-axis is the loss value. During an epoch, the loss function is calculated over each item in the dataset, so loss measures are obtained at every epoch. As shown in the graph, over 100 epochs the minimum loss we obtained is 0.004. The number of epochs is a parameter that determines how many times the learning algorithm works through the entire training dataset. More insight can be gained by plotting the validation loss along with the training loss; the validation loss of the model decreases along with the training process, as seen in the graph.
5 Conclusion In daily life, we come across captured pictures which may be blurred or unclear, which causes delays in decision making and sometimes may also lead to wrong conclusions and justifications. Image quality enhancement is an important activity that can eliminate these problems. This paper is targeted at enhancing the quality of images using CNN methods. Using these methods, we can process dark and noisy images pixelwise with computer vision and deep learning. Thus, the project attempts to transform dark and blurred images into illuminated images by using deep learning and computer vision algorithms, which helps to extract information from images. The experimentation and design methodology used in this method give good results in improving the
accuracy of the images and show good performance in terms of the calculated error of the model. This method can be used in many real-life situations and applied in various settings such as CCTV cameras, so that the required person, object, etc., can be found clearly. It can also be applied to make images clearer for better accuracy, using technologies such as deep learning and convolutional neural networks.
References 1. W. Rawat, Z. Wang, Deep convolutional neural networks for image classification: a comprehensive review. Neural Comput. 29(9), 2352–2449 (2017) 2. K. O’Shea, R. Nash, An introduction to convolutional neural networks. arXiv preprint arXiv: 1511.08458 (2015) 3. Y. H. Liu, Feature extraction and image recognition with convolutional neural networks. J. Phys. Conf. Ser. 1087(6) 4. S.K. Pal, R. King, Image enhancement using smoothing with fuzzy sets. IEEE Trans. Sys., Man, and Cyber. 11(7), 494–500 (1981) 5. A. Polesel, G. Ramponi, V. John Mathews, Image enhancement via adaptive unsharp masking. IEEE Trans. Image Process. 9(3), 505–510 (2000) 6. Z. Ni et al., Towards unsupervised deep image enhancement with generative adversarial network. IEEE Trans. Image Process. 29, 9140–9151 (2020) 7. Y. Guo et al., A pipeline neural network for low-light image enhancement. IEEE Access 7, 13737–13744 (2019) 8. Gu, J., et al. Blind super-resolution with iterative kernel correction, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2019) 9. Z.-S. Liu, et al., Image super-resolution via attention based back projection networks, in 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW) (IEEE, 2019) 10. W. Sun, Z. Chen, Learned image downscaling for upscaling using contentadaptive resampler. IEEE Trans. Image Process. 29, 4027–4040 (2020) 11. Md.J. Islam, et al., Underwater image super-resolution using deep residual multipliers, in 2020 IEEE International Conference on Robotics and Automation (ICRA) (IEEE, 2020) 12. B. Niu, et al., Single image super-resolution via a holistic attention network, in European Conference on Computer Vision (Springer, Cham, 2020) 13. Z. Fu, et al., Twice mixing: a rank learning based quality assessment approach for underwater image enhancement. arXiv preprint arXiv:2102.00670 (2021) 14. Convolutional neural network—Wikipedia 15. M. Juvonen, Patch-based image representation and restoration (2017) 16. M.H. Alkinani, M.R. El-Sakka, Patch-based models and algorithms for image denoising: a comparative review between patch-based images denoising methods for additive noise reduction. EURASIP J. Image Video Process. 2017(1), 1–27 (2017) 17. R. Atmaja, M. Murti, J. Halomoan, F.Y. Suratman, An image processing method to convert RGB image into binary. Indonesian J. Electr. Eng. Comput. Sci. 3, 377 (2016). https://doi.org/ 10.11591/ijeecs.v3.i2.pp377-382 18. https://wiki.tum.de/display/lfdv/Layers+of+a+Convolutional+Neural+Network#:~:text=A% 20non%2Dlinearity%20layer%20in,activation%20map%20as%20its%20output 19. https://medium.com/analytics-vidhya/everything-you-need-to-know-about-regularizer-eb4 77b0c82ba 20. https://www.mygreatlearning.com/blog/relu-activation-functi 21. https://medium.com/analytics-vidhya/understanding-activation-functions-and-hidden-layersin-neural-networks-4fca2b980917
Chapter 46
E-commerce Clothing Review Analysis and Model Building G. Manikiran, S. Greeshma, P. Vishnu Teja, Y. Sreehari Rao, Tanvir H. Sardar, and Moksud Alam Mallik
1 Introduction With the development of the Internet, an increasing number of people choose to purchase online. Given the shortage of information online, customers always struggle with issues such as size, quantity, colors, etc. Therefore, an overview of other customers' reviews can help us get a quick impression of the products. Understanding customer sentiment will enhance the efficiency of online shopping, and companies can analyze insights into customers' interest in their products. The best way to improve the customer experience is by listening to customers. Customer feedback comes in many different forms and languages, and manually reading all customer reviews simply would not be possible. The best way to solve this issue is to build a model that automatically analyzes customer feedback, so the retailer can easily work on improving the customer experience. Organizations are beginning to adopt Web-based listening as a way of understanding their clients and further improving their products and services. As part of this, text analysis has become an active field of research in computational linguistics and natural language processing. Perhaps the most popular problem in this field is text classification, a task which attempts to assign documents to one or more classes, either manually or computationally.
G. Manikiran · S. Greeshma · P. Vishnu Teja · Y. Sreehari Rao · T. H. Sardar Department of CSE (Data Science), Jain Group of Institutions, Bangalore, Karnataka, India M. A. Mallik (B) International Islamic University Malaysia, Kuala Lumpur, Malaysia e-mail: [email protected] VNR Vignana Jyothi Institute of Engineering & Technology, Hyderabad, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. B. Reddy et al. (eds.), Proceedings of Second International Conference on Advances in Computer Engineering and Communication Systems, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-7389-4_48
In this context, there has been great recent interest in classifying the opinions expressed in Web-based media, review sites, and discussion forums. This task is known as sentiment analysis, a computational procedure that uses statistics and natural language processing to identify and categorize the sentiments expressed in a message, particularly to determine the attitude (positive, negative, or neutral) of the writer towards a topic or a product. The retailer can also refine sales and marketing strategies or report important issues that might not otherwise be addressed. To achieve these targets, we performed univariate analysis, multivariate analysis, and text mining on the dataset features, and we developed models. This article is organized as follows: Sect. 1 provides a brief introduction to this paper, Sect. 2 gives the literature survey of proposed schemes, Sect. 3 describes the materials and methods, Sect. 4 elaborates the results and conclusion, and the future scope of this paper is given in Sect. 5.
2 Literature Survey In [1] (Raj Kumar et al. 2019), the dataset was taken from Amazon and contains reviews of laptops, cameras, mobiles, tablets, and televisions. After preprocessing to classify the reviews into positive or negative, various machine learning algorithms were applied. The paper concludes that machine learning techniques give the best results for classifying product reviews: naïve Bayes achieved an accuracy of 98.17%, while for camera reviews the support vector machine achieved an accuracy of 93.54%. In [2] (Satuluri and Belwal 2018), Amazon customer-review data was used and aspect-level sentiment analysis was attempted. Two machine learning algorithms were used for classification, SVM and naïve Bayes, and the results were compared based on precision, recall, and F1 measure. This paper gives the insight that naïve Bayes can obtain higher accuracy than the support vector machine. In [3] (Sanjay Bhargav et al. 2019), an opinion dataset was used and various machine learning algorithms were implemented using the naïve Bayes algorithm and opinion mining techniques based on natural language processing. The content-based recommender matches characteristics from a specific client profile, in which interests and preferences are stored, with the properties of a content object. If some morphological variation is found between the profile and the record, a match is made, and the document is then considered relevant [1, 4]. This paper also proposed a framework containing data collection, preprocessing, feature extraction, and attribute selection [5]. In this paper, a fuzzy product ontology mining algorithm was proposed, in which products are analyzed at a fine-grained level from online customer reviews. The algorithm can not only help an organization improve its products, but also help customers make better choices [6, 7].
Fig. 1 Architecture diagram of proposed system
an evolutionary fuzzy deep belief network with incremental rules (EFDBNI) algorithm based on fuzzy arithmetic and genetic algorithms. The results show that the EFDBNI algorithm is a significant improvement over existing approaches, and the method also achieved good results on sentiment classification problems with only a few labelled comments.
3 Materials and Methods 3.1 System Architecture See Fig. 1.
3.2 Confusion Matrix

The error matrix, commonly called a confusion matrix, is used to describe the performance of a classification model. It tabulates true positives, true negatives, false positives, and false negatives, and thereby shows how much the model confuses the classes (Fig. 2).
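As a hedged illustration (not the authors' code), such a matrix can be computed directly from true and predicted labels with scikit-learn; the label values below are placeholders, not results from our dataset.

# Minimal sketch (placeholder labels): computing a confusion matrix with scikit-learn
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # hypothetical ground-truth labels (1 = positive review)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # hypothetical model predictions
# Rows are actual classes, columns are predicted classes
print(confusion_matrix(y_true, y_pred))
# [[3 1]
#  [1 3]]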
Fig. 2 Confusion matrix
Fig. 3 AUC–ROC Curve
3.3 AUC-ROC Curve

AUC stands for area under the curve and ROC for receiver operating characteristic curve. The AUC-ROC curve is one of the most important evaluation metrics for checking any classification model's performance, and it can also be used to check and visualize performance on multiclass classification problems (Fig. 3). The curve indicates how well a model is able to distinguish between classes: the higher the AUC, the better the model separates the 'Yes' and 'No' classes. The terms used in the graph are defined as follows:

TPR (true positive rate)/recall/sensitivity = TP/(TP + FN)
Specificity = TN/(TN + FP)
FPR = 1 − Specificity = FP/(TN + FP)

Sensitivity and specificity are inversely related: as the decision threshold changes, an increase in one comes at the cost of the other, so TPR and FPR rise and fall together along the curve.
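A minimal sketch of computing and plotting these quantities with scikit-learn is shown below; the labels and scores are placeholder values, not outputs of our models.

# Minimal sketch (placeholder scores): ROC curve and AUC with scikit-learn and Matplotlib
from sklearn.metrics import roc_curve, roc_auc_score
import matplotlib.pyplot as plt

y_true = [0, 0, 1, 1, 0, 1, 1, 0]                       # hypothetical true labels
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.65, 0.55]   # hypothetical predicted probabilities

fpr, tpr, _ = roc_curve(y_true, y_score)                # FPR and TPR at each threshold
print("AUC =", roc_auc_score(y_true, y_score))

plt.plot(fpr, tpr, label="model")
plt.plot([0, 1], [0, 1], linestyle="--", label="no discrimination (AUC = 0.5)")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate (recall)")
plt.legend()
plt.show()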
4 Result and Conclusion

In general, an AUC of 0.5 suggests no discrimination, 0.7–0.8 is considered acceptable, 0.8–0.9 is considered excellent, and more than 0.9 is considered outstanding (Fig. 4; Table 1).
Fig. 4 ROC and AUC curve
Table 1 Accuracy and F1 score

S. No. | Algorithm               | Accuracy | F1 score
1      | Bernoulli Naïve Bayes   | 0.909182 | 0.9489
2      | Multinomial Naïve Bayes | 0.928355 | 0.9596
3      | Logistic Regression     | 0.924318 | 0.9573
4      | XGBoost                 | 0.899764 | 0.9455
5      | Decision Tree           | 0.862428 | 0.9220
6      | K-Means                 | 0.875714 | 0.9328
7      | Random Forest           | 0.888664 | 0.9402
8      | SVM                     | 0.920282 | 0.9555
9      | Neural Network          | 0.922805 | 0.9562
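As a hedged sketch of the kind of pipeline behind such scores (not the authors' exact code), a multinomial naïve Bayes classifier can be trained on TF-IDF features of the review text; the file name and column names are assumptions for illustration.

# Minimal sketch (assumed file/columns): TF-IDF + multinomial naive Bayes on review text
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, f1_score

df = pd.read_csv("clothing_reviews.csv").dropna(subset=["Review Text"])   # hypothetical file and columns
X_train, X_test, y_train, y_test = train_test_split(
    df["Review Text"], df["Recommended IND"], test_size=0.2, random_state=42)

vec = TfidfVectorizer(stop_words="english", max_features=5000)
clf = MultinomialNB().fit(vec.fit_transform(X_train), y_train)

pred = clf.predict(vec.transform(X_test))
print("accuracy:", accuracy_score(y_test, pred), "F1:", f1_score(y_test, pred))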
5 Conclusions

In conclusion, we have drawn some inferences from the dataset using the visualized charts, which may help the industry increase production and profit. Considering the dataset as a whole, the inferences are:

1. Age does not show much impact on choice selection.
2. The General division contributes the most reviews.
3. The Sleep class has the lowest purchases.
4. The Tops department has the largest following in the current trend.
5. The Dresses and Knits classes top the list.
6. The Dresses class needs urgent attention.
7. The Lounge class is recommended the most, but it still needs to be improved.
8. Good reviews dominate (almost 63%).
9. "Dress" is the most common word in both positive and negative reviews.
10. The multinomial naïve Bayes model is best suited for this analysis as it has good performance and accuracy.
6 Future Scope

Analyzing comments and reviews with Python and statistics is a demanding task for someone with no background in these technologies, so the current product is hard to grasp and adopt because it requires knowledge of Python, visualization, and so on. Our future work will therefore be dedicated to business executives, who could use this product as a backend behind a simple, supportive user interface that helps them visualize the data according to their own understanding. The interface may be customized in the near future depending on the requirements of the customers (business firms) and the domain in which they work, as the interfaces and attributes differ with the place of business. If a business executive needs to showcase their productivity to the world, they need to be confident
in the product they are producing. Our tool will help them do this without any knowledge of Python, visualization coding, or a technical expert: they can simply load the dataset into the application's dashboard and draw the charts and inferences with very little effort. Technology is built by experts and used by people; creating a UI and deploying the model will make it easily available to business executives, so they can access the model and obtain conclusions instantly.
References 1. D.M. Khan, T.A. Rao, F. Shahzad, The classification of customers’ sentiment using data mining approaches. Global Soc Sci Rev (GSSR) IV (IV): 198–212 (2019) 2. S. Huber, H. Wiemer, D. Schneider, S. Ihlenfeldt, Dmme: Data mining methodology for engineering applications—a holistic extension to the crisp-dm model, in 12th CIRP Conference on Intelligent Computation in Manufacturing Engineering, vol 79, pp 403–408 (2019) 3. P. Pandey, P. Muskan, N. Soni, Sentiment analysis on customer feedback data: Amazon product reviews, in 2019 International Conference on Machine Learning, Big Data, Cloud and Parallel Computing (COMITCon), pp 320–322 (2019) 4. R.S. Jagdale, V.S. Shirsat, S.N. Deshmukh, Sentiment analysison product reviews using machine learning techniques. Cogn Inf Soft Comput Adv Intell Syst Comput 768, 639–647 (2019) 5. N.M. Ali, M.M.A.E. Hamid, A. Youssif, Sentiment analysis for movies reviews dataset using deep learning models. Int J Data Min Knowl Manage Process (IJDKP) 9(2/3), 19–27 (2019) 6. Q. Sun, J. Niu, Z. Yao, H. Yan, Exploring eWOM in online customer reviews: sentiment analysis at a fine-grained level. Eng Appl Artif Intell 81, 68–78 (2019) 7. P. Yang, D. Wang, X.-L. Du, M. Wang, Evolutionary dbn for the customers’ sentiment classification with incremental rules, in Industrial Conference Data Mining ICDM 2018: Advances in Data Mining. Applications and Theoretical Aspects (pp 119–134) (2018)
Chapter 47
Face Detection and Comparison Using Deep Learning R. Vijaya Saraswathi, D. N. Vasundhara, R. Vasavi, G. Laxmi Deepthi, and K. Jaya Jones
1 Introduction

In a world with vast dark networks, committing crime has become easier than ever, and security has become one of the biggest concerns facing nations today. Face detection algorithms do not work quite like humans; they rely on computer intelligence. Humans store face data in the brain and recall visual information when needed, whereas a computer must request data from a database [1] and match it to identify a face. A computer equipped with a camera detects a human face, extracts the required facial features, and then recognizes the face by trying to match it against the faces stored in a database. Face recognition is used for two primary tasks:
1. Identification
2. Verification.
For smart surveillance, we use deep learning technology. Deep learning is able to learn without human supervision and can draw on data that is both unlabeled and unstructured. It is a form of machine learning and can be used to detect fraud and suspicious activity, among several other things.
R. Vijaya Saraswathi · D. N. Vasundhara · R. Vasavi (B) · G. Laxmi Deepthi · K. Jaya Jones Department of CSE, VNR Vignana Jyothi Institute of Engineering and Technology, Hyderabad, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. B. Reddy et al. (eds.), Proceedings of Second International Conference on Advances in Computer Engineering and Communication Systems, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-7389-4_49
1.1 Deep Learning Technology

Deep learning is a branch of artificial intelligence that mimics the way the human brain processes data for tasks such as object detection, translation, and speech recognition [2–4]. It is able to learn without human supervision and can draw on both unlabeled and unstructured data. Deep learning [5] is also a form of machine learning and can be used to detect fraud and suspicious activity, among several other things.
1.2 Face Recognition Technology

Security systems in public places and crowded areas such as airports and malls, and especially in police forces, depend on advanced options built on advanced computer programming. These programs must verify people's presence and also detect offences such as theft. The system we use to recognize faces [6] is based on a database of pictures of people, criminals, and thieves, together with pictures captured by a surveillance camera. A face recognition system is an application used to detect a digital image of a person whose data is already stored and sorted according to priority in the database; it works by comparing the digital image features extracted from the face with the acquired database.
1.3 Face Metrics

A human face has many distinct features [7, 8]. A face-measurement program works with several parameters known as nodal points; every face has approximately 80 different nodal points, and every face-measurement program uses their relative position, size, and shape. The best-known facial features measured by such a program are:
1. Eye distance
2. Eye depth
3. Structure of the cheekbone and jaw angle.
1.4 Basic Operations 1.
Face Detection: It is the most important step in face recognition. If the face recognition program was successful, then we can see the rectangle boxes around the face as shown in Fig. 1.
Fig. 1 Face detection
2. Feature Segmentation: This requires 3D head pose, facial expression, relighting, and many other features, because face detection with only basic facial features is sometimes not possible.
3. Face Recognition: From an implementation point of view, recognition is performed after detection by comparing against the datasets in the database; if a match is found, the face is recognized [9, 10].
2 Related Work

There is a large body of work on face detection and recognition; reviewing all of it is not possible in this paper, so we briefly discuss only the most relevant recent work. The approach used in this system is congruent with recent works in that it learns its representations directly from the face. We explore two different deep network architectures that have recently been used with great success in the computer vision community; both are deep convolutional networks [11, 12]. The first architecture is based on the Zeiler-Fergus model, which has multiple interleaved deep convolution layers used for filtering facial features, response normalizations (used for shortlisting necessary features), activations, and pooling layers applied after the convolution layers. The second is the Inception architecture of Szegedy et al., a winning approach from 2014 that is also used in the FaceNet algorithm. This approach uses several mixed convolution layers [13–15] with corresponding pooling layers, which reduces the number of parameters needed for face recognition by about 20% and also reduces errors in evaluating the face. The histogram of oriented gradients (HOG) algorithm [16–18] uses a holistic methodology in face recognition; it has been applied many times to computer vision tasks such as object detection, expression detection, and facial signatures, but it is mostly used to extract integral image features from the face. Another face recognition methodology [19–21], GaussianFace, was published in 2019; this algorithm is used for complex
face detection, for instance when the face is hard to determine because of camera resolution and similar factors. It first normalizes each picture to a 150 × 120 pixel image so that all images have the same configuration, uses facial landmarks such as the eyes, the corners of the mouth, and the nose to locate the face, and then creates 25 × 25 overlapping tiles of vectors in the image; these patch vector values are then used for face recognition. Another approach, FaceNet, uses eleven convolution layers and three fully connected (FC) layers [22–24]. It was trained on around 200 M face images and 8 M identities. Because of rapid loss over the datasets [25, 26], a triplet loss function was used instead of adding more convolution layers, and with this the approach achieved state-of-the-art performance for face recognition.
3 Proposed System

We now propose a system that is used to:

1. Train the face
2. Detect the face
3. Verify the face
4. Compare the face.
3.1 Face Recognition Methodologies

A face recognition system is a program capable of matching a face: it extracts a digital image from a video frame [27, 28] and compares the acquired faces with a database of faces. The following methodologies are used to ensure that we get reliable predictions for recognized faces:

1. Holistic matching
2. Model based
3. Hybrid methods
4. Feature based (structural).
Holistic matching is used when the data is complex and the face in the frame is hard to detect; it uses 3D convolutional techniques with fewer layers to remove marginal error. The model-based methodology is used to find faces that lie on an accurate axis (used for face identification algorithms). The hybrid method is a combination of the holistic and model-based approaches. We have used the feature-based methodology for this system.
Fig. 2 Feature-based methodology
Feature Based (Structural)

The feature-based approach to face recognition and detection starts without any prior knowledge of the face's structure, so at the beginning it depends mainly on biologically known features, detecting facial organs such as the eyes, nose, lips, and forehead; after detection, rectangular boxes appear around each feature as shown in Fig. 2. In the next step, it extracts digital integral-image features such as haar-like features and classifies them using the AdaBoost algorithm, which trains on a single image for detection. After detecting the images and removing the noise (unwanted data), the extracted features are used for image classification. Experiments on 128 × 128 images were successful with little noise.
3.2 Viola Jones Algorithm

This algorithm uses the feature-based methodology and was created in 2001 by Paul Viola and Michael Jones; although it is almost 20 years old, many image classification approaches still use it
because it is fast in execution and robust in nature. It performs image classification using cascades of classifiers and has its own training procedure, the AdaBoost algorithm, which trains on a single image as the final image. To detect a face, it first converts the image to grayscale, then extracts the haar-like features (which hold the values of the dark and light regions in the form of a 2D array), and then compares them with the database.
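As a hedged illustration (not the authors' exact implementation), OpenCV ships a pretrained Haar cascade that performs this Viola-Jones style of detection; the image path below is an assumption.

# Minimal sketch (assumed image path): Viola-Jones style face detection with OpenCV's Haar cascade
import cv2

img = cv2.imread("group_photo.jpg")                       # hypothetical input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)              # the algorithm works on grayscale
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:                                # draw rectangles around detected faces
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("detected.jpg", img)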
3.3 Proposed Algorithm

Some prerequisite terminology is needed to understand the training and detection algorithms. Haar-like features. These are features of the face in grayscale, as shown in Fig. 3. There are different kinds of haar-like features, as shown in Fig. 4.

Fig. 3 Haar-like features on face
Fig. 4 Types in haar-like Features
Fig. 5 Values for haar-like features
As we see in Fig. 4, there are edge features (1 and 2), line features (3) and hybrid features (4). Ideally these are pure white and black pixel images (value 0 or value 1), but in practice we have a grayscale/colour image whose pixel values range from 0 to 255. Figure 5 shows an ideal black-and-white feature; the left part of the image shows the haar-like feature of the nose, and the values are the digital integral features of that region, rating darkness on a scale from 0 to 1: completely white is 0, completely black is 1, and the values in the boxes lie somewhere in between. In the same way, we can calculate integral features for the other parts of the face such as the lips, eyebrows, eyes and skin; more than 145 features can be calculated from a single face.

Convolution layer. A convolution is the simple application of a filter to an input that results in an activation. The result is highly specific features that can be detected anywhere in the input image.

Max-pooling. Max-pooling is a pooling operation that selects the maximum element from the region of the feature map covered by the filter.

Data Collection and Preprocessing

1. Create two folders, test and train. These folders represent the test and train datasets, respectively, in which we store the face datasets.
2. The train dataset is the set of faces used during the learning process to fit the parameters.
3. The test dataset is used by the final model to predict the classifications of example faces; those predictions are compared with the examples' true classifications to assess the model's accuracy.
4. Preprocess the data to eliminate all unwanted content; in our case, remove all sections that contain no face.
5. Send the preprocessed data to the test-train split.
6. The train-test split procedure is used to estimate the performance of the VGG16 algorithm when making predictions on image data not used to train the model.
7. This step removes all faces that cannot be converted into digital image features, so that the algorithm works with adequate efficiency.

Fig. 6 VGG16 architecture
Training Data

We use the Keras VGG16 algorithm, which uses convolutional neural networks [29–31], to train on the face datasets and extract the required features to build the model and compare against it. Figure 6 shows the implementation flow of VGG16. The 16 in VGG16 refers to its having 16 layers with weights. It is a fairly large network, and it has about 130–140 facial parameters [32] to check in each image. The steps to train the face are listed below (a hedged code sketch follows the steps):

1. The ImageDataGenerator automatically labels all the data inside the criminals folder (which contains the test and train datasets) so that the data is ready to be passed to the neural network.
2. Resize all the image datasets to 224 × 224 px.
3. After preprocessing, the facial datasets (folders) are sent to the VGG16 input layers; these are the convolution layers discussed in the prerequisites and are used to filter the integral face features.
4. All the datasets from the train folder are sent as input to convolution layer 1.
5. Layer 1 is subdivided into two layers, conv1-1 and conv1-2 (Fig. 6), which are used to extract all the haar-like features of the faces from all datasets.
6. From the detected haar-like features, we calculate the probability that the extracted feature belongs to the face.
7. If the probability is greater than 0.74, the feature is recorded in an array as a particular haar-like feature of a particular face; it could be the eyes, lips, nose, and so on.
8. Finally, the data is sent to max-pooling, which checks which features have already been noted in the earlier layers so that the upcoming layers need not train on a haar-like feature that has already been extracted.
9. The process is repeated for the next four stages of convolution layers, so that finally we have five sets of arrays of haar-like features.
10. In the dense layers, the network checks which values need to be taken from the array for a particular haar-like feature to get accurate results; this is done by checking these values against the validation datasets.
11. The network runs in a loop to obtain the most accurate result, and finally we have a 2D array with the values of a particular face.
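The following is a minimal, hedged sketch of how such a training pipeline could look in a recent Keras version, using a pretrained VGG16 base with a small classification head; the folder names, ImageNet weights, and head layers are illustrative assumptions and may differ from the authors' exact setup.

# Minimal sketch (assumed folder layout): training on face folders with a VGG16 base
import keras as krs
from keras.applications.vgg16 import VGG16, preprocess_input
from keras.preprocessing.image import ImageDataGenerator

# Labels are taken from the sub-folder names inside train/ and test/ (assumed layout)
gen = ImageDataGenerator(preprocessing_function=preprocess_input)
train = gen.flow_from_directory("criminals/train", target_size=(224, 224), batch_size=32)
test = gen.flow_from_directory("criminals/test", target_size=(224, 224), batch_size=32)

base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False                       # keep the pretrained convolution blocks fixed

mdl = krs.models.Sequential([
    base,
    krs.layers.Flatten(),
    krs.layers.Dense(64, activation="relu"),
    krs.layers.Dropout(0.3),
    krs.layers.Dense(train.num_classes, activation="softmax"),
])
mdl.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
mdl.fit(train, validation_data=test, epochs=10)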
Detection and Comparison

1. As soon as the camera is started, the captured image is converted to grayscale.
2. The image is cropped to 224 × 224 px.
3. The inclination axis is set to 60°–120°.
4. The haar-like features of the face are extracted.
5. After finding the relevant haar-like features, they are converted into an integral image.
6. Once all the required digital features are found, the model for detecting the face is created.
7. Since the algorithm also handles tilted faces, the angle of the forehead axis is needed; the image is rotated according to that axis.
8. All the features are compared with the trained model.
9. If the average probability of those features is greater than 0.75, they are matched.
10. The accuracy of the detected person is noted.
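Below is a minimal sketch of this detection-and-comparison flow: capture a frame, detect the face, resize it to 224 × 224, and accept a match only when the predicted probability exceeds the 0.75 threshold. The saved model file, the Haar cascade detector, and the input scaling are illustrative assumptions rather than the authors' exact code.

# Minimal sketch (assumed files): detect a face in a camera frame and compare it with the trained model
import cv2
import numpy as np
import keras as krs

mdl = krs.models.load_model("face_model.h5")     # hypothetical file holding the trained model
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture(0)                        # start the camera
ok, frame = cap.read()
cap.release()

gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)   # step 1: convert to grayscale for detection
for (x, y, w, h) in cascade.detectMultiScale(gray, 1.1, 5):
    face = cv2.resize(frame[y:y + h, x:x + w], (224, 224))        # step 2: crop to 224 x 224 px
    probs = mdl.predict(np.expand_dims(face / 255.0, axis=0))[0]  # scaling must match training
    if probs.max() > 0.75:                                        # step 9: match above the threshold
        print("Matched identity index:", int(probs.argmax()))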
4 Results and Discussion

Figure 7 shows the training accuracy of the model, plotted with the Matplotlib library in Python. The blue line represents training accuracy and the orange line validation accuracy. The accuracy of the model increases as the number of faces (epochs) increases. There is slight congestion in the validation accuracy because, when more images are trained, there is a marginal chance that two or more haar-like features collide, which leads to a slight disturbance in validation accuracy; this is negligible compared with the difference between training and validation accuracy.
Fig. 7 Accuracy in training and validating the data
Figure 8 shows the training loss of the model, plotted with the Matplotlib library in Python. The blue line represents the training loss and the orange line the validation loss. The loss of the model decreases as the number of faces (epochs) increases, because training on more faces helps the algorithm understand which parts it needs to focus on. There is some fluctuation in the validation loss: since we use haar-like features, several faces may share more than two similar haar-like features, causing slight glitches in validation, but this is negligible compared with the overall loss, which becomes very small after training the algorithm several times. Fig. 8 Loss in training and validating the data
Table 1 Accuracy in detection

S. No. | Name of the person | No. of successful detections | No. of failed detections | Accuracy (%)
1      | Purna    | 21 | 0 | 100
2      | Suresh   | 19 | 2 | 90.4761
3      | Madhu    | 22 | 2 | 91.6666
4      | Santhosh | 15 | 1 | 93.7500
5      | Buchi    | 20 | 1 | 95.6821
6      | Swaroop  | 23 | 2 | 92.000
7      | Mark     | 9  | 1 | 90.000
8      | Jeff     | 10 | 1 | 90.0909
9      | Steve    | 21 | 2 | 91.3043
10     | Trump    | 17 | 1 | 94.4444
We have experienced a minute distraction while detecting the faces but mostly the accuracy is similar to 92.940%, which is average accuracy that is shown in Table 1. Table 2 shows the confusion matrix for the classifier which is made up of 250 predictions such as • Actual yes shows that it is the actual face needed to be predicted. • Actual no shows that it is the face not to be recognized while other faces are running. • Predicted yes shows that these are faces which are predicted correctly. • Predicted no wrong face detected accordion to the algorithm. Accuracy: This shows in overall how often the classifier (detection) is correct. Error rate: This shows in overall how often the classifier (detection) is wrong. Precision: When it predicts yes, how often is it correct? Prevalence: How often does the no condition actually occur in our sample?. Table 3 shows the performance metrics like accuracy, error rate, precision and prevalence of the proposed model. Table 2 Confusion matrix or detection
Predicted yes
Predicted no
Actual yes
60(TP)
21(FN)
81
Actual no
9(FP)
170(TN)
179
69
191
250
Table 3 Performance metrics

S. No | Metric     | TP | TN  | Actual yes | Predicted yes | Total | Formula          | Result
1     | Accuracy   | 60 | 170 | 81         | 69            | 250   | (TP + TN)/Total  | 0.92
2     | Error rate | 60 | 170 | 81         | 69            | 250   | 1 − Accuracy     | 0.08
3     | Precision  | 60 | 170 | 81         | 69            | 250   | TP/Predicted yes | 0.86
4     | Prevalence | 60 | 170 | 81         | 69            | 250   | Actual yes/Total | 0.32
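As a quick worked check of the formulas in Table 3 (not part of the authors' code), the reported values follow directly from the counts in Table 2; precision computes to roughly 0.87, close to the 0.86 reported.

# Worked check of the formulas in Table 3 using the counts from Table 2
TP, FN, FP, TN = 60, 21, 9, 170
total = TP + FN + FP + TN              # 250 predictions

accuracy = (TP + TN) / total           # 230/250 = 0.92
error_rate = 1 - accuracy              # 0.08
precision = TP / (TP + FP)             # 60/69  ~ 0.87
prevalence = (TP + FN) / total         # 81/250 ~ 0.32

print(accuracy, error_rate, precision, prevalence)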
5 Conclusion

In this paper, we propose an enhanced version of the Viola-Jones algorithm. Viola-Jones is generally used for comparing frontal, upright faces and has its own built-in training procedure, the AdaBoost algorithm; in our enhanced algorithm, we instead use a convolutional neural network, Keras VGG16, to train on the facial data. Adding a few extra details such as the face axis and video capture also makes it possible to train on tilted images, so the enhanced algorithm can detect faces from video.
6 Future Scope

First and foremost, Keras VGG16 is very time-consuming even to train a single face. Viola-Jones is usually used for image classification, and because of this it leads to excessive logging, which increases the space complexity of the system. The technique described above may be enough for a system to cope with local challenges; however, a broader multidisciplinary approach is required to meet the endless demands of users, for example, what happens if a person grows a beard or covers his face with a cap. We can also add retinal scans, because a person cannot change his eyes.
References 1. R.V. Saraswathi, L.P. Sree, K. Anuradha, Support vector based regression model to detect Sybil attacks in WSN. Int J Adv Trends Comput Sci Eng 9 (3) (May–June 2020) 2. F. Schroff, D. Kalenichenko, J. Philbin, Facenet: A unified embedding for face recognition and clustering, in Proceeding of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR) (June 2015) 3. Y. Sun, X. Wang, X. Tang, DeepID3: Face recognition with very deep neural networks. arXiv preprint arXiv:1502.00873 4. G.B. Huang, E. Learned-Miller, Labeled faces in the wild: updates and new reporting procedures. Technical Report UM-CS-2014-003 (University of Massachusetts, Amherst, May 2014)
5. G.B. Huang, M. Ramesh, T. Berg, E. Learned-Miller, Labeled faces in the wild: a database for studying face recognition in unconstrained environments. Technical Report 07-49 (University of Massachusetts, Amherst, Oct 2007) 6. M. Lin, Q. Chen, S. Yan, Network in network. CoRR, abs/1312.4400 (2013) 7. D. Lee, H. Park, C.D. Yoo, Face alignment using cascade Gaussian process regression trees, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015) 8. Y. Sun, X. Wang, X. Tang, Deeply learned face representations are sparse, selective, and robust. CoRR. abs/1412.1265 (2014) 9. Y. Taigman, M. Yang, M. Ranzato, L. Wolf, Deepface: Closing the gap to human-level performance in face verification, in IEEE Conference on CVPR (2014) 10. R. Girshick, J. Donahue, T. Darrell, J. Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, in CVPR (2014) 11. Y. LeCun, B. Boser, J.S. Denker, D. Henderson, R.E. Howard, W. Hubbard, L.D. Jackel, Backpropagation applied to handwritten zip code recognition. Neural Comput 1(4), 541–551 (1989) 12. D.E. Rumelhart, G.E. Hinton, R.J. Williams, Learning representations by back-propagating errors. Nature (1986) 13. R. Girshick, F. Iandola, T. Darrell, J. Malik, Deformable part models are convolutional neural networks, in CVPR 14. B. Hariharan, P. Arbelaez, R. Girshick, J. Malik, Simultaneous detection and segmentation, in ECCV (2014) 15. B. Hariharan, P. Arbelaez, R. Girshick, J. Malik, Hypercolumns for object segmentation and fine-grained localization, in CVPR (2015) 16. . G. Ghinea, R. Kannan, S. Kannaiyan, Gradient—Orientation—based PCA subspace for novel face recognition (2014) 17. N. Dalal, B. Triggs, Histograms of oriented gradients for human detection, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition (June 2005) 18. P. Campadelli, R. Lanzarotti, C. Savazzi, A Feature-Based Face Recognition System [Online]. Available 19. M.D. Zeiler, R. Fergus, Visualizing and understanding convolutional networks. CoRR. abs/1311.2901 (2013) 20. Z. Zhu, P. Luo, X. Wang, X. Tang, Recover canonical view faces in the wild with deep neural networks. CoRR. abs/1404.3543 (2014) 21. M.A. Turk, A.P. Pentland, Face recognition using Eigen faces, in Proceedings: 1991 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 91 (IEEE, 1991) 22. K. He, G. Gkioxari, P. Dollar, R. Girshick, Mask R-CNN, in 2017 IEEE International Conference on ComputerVision (ICCV) (IEEE) 23. B. Hariharan, P. Arbelaez, R. Girshick, J. Malik, Object instance segmentation and fine-grained localization using hypercolumns. IEEE Trans. Pattern Anal. Machine Intell. 39 (4) 24. Y. LeCun, Y. Bengio, G. Hinton, Deep learning. Nature 521 25. P.N. Belhumeur, J.P. Hespanha, D. Kriegman, Eigenfaces versus fisherfaces: recognition using class specific linear projection, in IEEE Transactions on Pattern Analysis and Machine Intelligence (1997) 26. H. Cevikalp, M. Neamtu, M. Wilkes, A. Barkana, Discriminative common vectors for face recognition, in IEEE Transactions on Pattern Analysis and Machine Intelligence (2005) 27. H. Yu, J. Yang, A direct LDA algorithm for high-dimensional data—with application to face recognition. Pattern Recogn (2001) 28. S. Nalluri, R. Vijaya Saraswathi, S. Ramasubbareddy, K. Govinda, E. Swetha, Chronic heart disease prediction using data mining techniques, in Data Engineering and Communication Technology. Advances in Intelligent Systems and Computing ed. by K. Raju, R. Senkerik, S. 
Lanka, V. Rajagopal, vol 1079 (Springer, Singapore, 2020). https://doi.org/10.1007/978-98115-1097-7_76
29. C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions (2014) 30. R.V. Saraswathi, V. Bitla, P. Radhika, T.N. Kumar, Leaf disease detection and remedy suggestion using convolutional neural networks, in 2021 5th International Conference on Computing Methodologies and Communication (ICCMC), pp. 788–794 (2021). https://doi.org/10.1109/ ICCMC51019.2021.9418013 31. M. Mounica, R. Vijaya Saraswathi, R. Vasavi, Detecting Sybil attack in wireless sensor networks using machine learning algorithms, in 2021 IOP Conference Series: Material Science Engineering, vol 1042, p 012029 32. V.S. Manvith, R.V. Saraswathi, R. Vasavi, A performance comparison of machine learning approaches on intrusion detection dataset, in 2021 Third International Conference on Intelligent Communication Technologies and Virtual Mobile Networks (ICICV), pp. 782–788 (2021). https://doi.org/10.1109/ICICV50876.2021.9388502
Chapter 48
Long-Term Temporal Land Analysis of Agricultural Land and Shifting Cultivation Detection Sejal Thakkar, Ved Suthar, Chirag Patel, Shree Sashikant Sharma, and Namra Patel
1 Introduction 1.1 Land Cover Different kinds of information about agricultural land can be obtained by looking into the land cover changes of that land. Here, we discuss different methods and results for detecting land cover changes. Furthermore, different aspects of land can be classified from the output of land cover change detection, such as shifting cultivation, agricultural land growth in a specific area, desertification of agricultural land, and deforestation of forest for repurposing as agricultural land. Recent studies have found that detecting such land cover changes can help in making predictions and decisions [1].
S. Thakkar (B) · V. Suthar · N. Patel CE Department, Indus University, Ahmedabad, India e-mail: [email protected]; [email protected] V. Suthar e-mail: [email protected] N. Patel e-mail: [email protected] C. Patel Charusat University, Charusat, India e-mail: [email protected] S. Sashikant Sharma SAC ISRO, Ahmedabad, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. B. Reddy et al. (eds.), Proceedings of Second International Conference on Advances in Computer Engineering and Communication Systems, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-7389-4_50
1.2 Shifting Cultivation

Shifting cultivation is an agricultural system that involves clearing native vegetation and crops, in some cases by burning, followed by planting the new cropland for the next few years. A fallow period then allows the vegetation to regrow, and the cycle repeats. It is widely practised in the hilly regions of the country, and a cycle can last 7–20 years. Viewed through multitemporal satellite imagery, shifting cultivators can more accurately be regarded as forest planters and managers [2]. Recent studies predict that this type of cultivation will continue beyond 2090 [3]. Currently, around 1.8 million people in north-eastern India rely on this technique [4]. The New Land Use Policy (NLUP) has also been established to phase out shifting cultivation [5].
2 Literature Review

There have been several attempts at mapping and visualizing shifting cultivation. Most of them are made without validation, since visiting farmland in the past is impossible and only a few live instances of shifting cultivation can be observed. Pulakesh Das et al. took on this challenge by using different indices, such as the burn vegetation index, the burn ratio, and the relative difference of the burn ratio, to build an algorithm that gave them 85% accuracy [6]. While such methods are good enough for mapping a huge area, they serve a specific purpose only, namely finding shifting cultivation, and they depend strongly on the data they use. Because the problem is so specific, most researchers try instead to study its spatial and temporal dynamics [7]. Very few attempts have been made at detection, so it is a rather new challenge to overcome. Riahtam et al. used LANDSAT and Resourcesat-2 imagery, classified via segmentation, to infer shifting cultivation and visualized it over the East Garo Hills of Meghalaya, India [8].
3 Challenges 3.1 Classes for Model There are two main problems here. First, there is no standard basis for classifying land cover; one person's conceptualization can differ from another's [9]. This can be addressed by using only the specific classes required for the problem. The second challenge is choosing how many classes the land cover detection should have, that is, the number of classes to be considered in the study. For vegetation mapping, we would need classes such as forest, cropland, barren land, urban area, water bodies, mountain slopes, and other such
factors that could help find the cropland better. But increasing the number of classes directly means that a greater quantity of input data is required; the data gathered by the satellite is already huge for such a small area, and gathering more would make it impossible to fit into our systems. So we reduced the number of classes by taking the quality of the data into consideration; for example, if the data does not contain many urban areas or steep mountain slopes, those classes can be combined into one, at the cost of some loss in accuracy.
3.2 Resolution of the Data

The second challenge in detecting land cover is the spatial resolution of the data. The available resolution is quite low and can serve only specific purposes: even if we make a successful prediction of the land cover, the prediction covers a large area and specific information about that area cannot be obtained. Obtaining fine details or making fine-grained predictions over an area from this data is therefore out of scope; the only thing that can be done is to look at the irregularities over temporal data to gather information.
3.3 Noise in the Data

Another challenge is irregularities in the gathered data caused by sensor failure or natural circumstances. Since satellites are made by humans, their parts can fail to generate the desired output. Such a case happened with LANDSAT-7, where the data has missing strips; this was caused on 31 May 2003 by the failure of LANDSAT-7's Scan Line Corrector (SLC). The SLC's role is to compensate for the satellite's forward movement as it orbits, and its failure resulted in zigzag ground tracks instead of mapping in straight lines [10]. Apart from that, natural circumstances such as weather, cloud cover, and mountain shadows can introduce noise that makes the data inadequate for model training [11].
4 Shifting Cultivation 4.1 Introduction Shifting cultivation is a form of cultivation in which land is farmed for a few seasons while the soil remains fertile and is then abandoned, cut down, or burned so that the remains regenerate the soil's fertility. It is generally practiced
by an individual, but sometimes it is practiced by a whole village. In India, it is typically called jhumming [12] and is also known as slash-and-burn cultivation. By burning the land, the unwanted agricultural remains and weeds are reduced to ash, which can provide fertility to the soil in the future; burning also helps kill pests and pathogens present in the soil.
4.2 Effects of Shifting Cultivation

The major disadvantage of this type of cultivation is the waste of land: once land is left fallow, it cannot be used for anything else. Generally, forest area is first converted into cropland and later drawn into shifting cultivation, which increases soil infertility and leads to soil erosion. With shortened fallow periods, there is not enough time for forest regeneration, so new growth slowly replaces the older forest. Using slash-and-burn clearance of the fields hinders the regrowth of evergreen forest similar to the old forest, which leads to catastrophic soil erosion and reduced productivity [13].
4.3 Shifting Cultivation in India

This type of cultivation is still in use in north-eastern India. According to recent studies, approximately 79% of the district of West Garo Hills appears as 'forest cover'. Of that, the land use map shows that 29% of the area in the district is either a shifting cultivation field, actively cultivated land, or wet rice farms where trees with large crown cover are rarely found [14]. Table 1 shows a few states of India.

Table 1 Shifting cultivation in north-eastern India [17]
State          | S.C. in 2005–06 (in km²) | S.C. in 2008–09 (in km²)
Andhra Pradesh | 1025.07                  | 961.04
Assam          | 160.15                   | 258.86
Manipur        | 752.10                   | 270.31
Meghalaya      | 291.87                   | 272.52
Mizoram        | 1028.53                  | 612.71
Nagaland       | 1239.09                  | 1514.95
Tripura        | 89.28                    | 33.2

Source: Wasteland's Atlas of India, 2011 [17]
5 Method 5.1 Gathering Data The first step is to collect data from Google Earth Engine, specifically from LANDSAT-7 (source for the API used to download the dataset: https://developers.google.com/earth-engine/datasets/catalog/landsat-7). LANDSAT is a joint program of the National Aeronautics and Space Administration (NASA) and the United States Geological Survey (USGS) that has been observing the Earth continuously from 1972 to the present day [15]. It offers an adequate resolution of 30 m with a short revisit interval, so data for almost all possible scenarios can be gathered. From the dataset, only scenes without cloud cover are taken, since a completely clear sky suitable for land applications is visible only 20.7% of the time over the whole world [16]. After gathering the data, it is preprocessed into a suitable, visualizable format so that anyone can visualize it. A model is then trained on the processed data, and from that model software can be built that detects areas exhibiting shifting cultivation. We also aim to make an application usable by ordinary people, who can report recent shifting-cultivation activity (Fig. 1). Another goal is to study the different methods that can be used to build such a model and compare them to check which method gets the most out of each particular case.
Fig. 1 Methodology for detection and visualization of shifting cultivation Source Owner
Fig. 2 Getting data from Google Earth Engine [1]
5.2 Patch-Based Analysis 5.2.1
The Data
We first collected raw LANDSAT-7 images, with no cloud cover, for the region from 25°N 89°E to 26°12′N 93°E. Data was collected at an interval of 6 months from 2000 to 2019 and is available to developers on Google Earth Engine. After downloading, the raw data was extracted into TIFF format, amounting to 55.9 GB. From the data, an RGB image was constructed using bands 1, 2, and 3, and an IR image was constructed using bands 4, 5, and 7. These images were then used for visualization purposes (Figs. 2, 3 and Table 2).
5.2.2
Preprocessing Done on the Data
After generating the RGB and IR images, they are divided into 100 × 100 pixel tiles at a resolution of 30 m/pixel, the highest available from the LANDSAT-7 sensors; each tile therefore covers a 3 × 3 km area. By calculating the difference between images of the same area taken 6 months apart, we can check whether there are spots that could indicate farmland that is no longer present after 6 months. The raw images are then processed so that the model can be trained on them with adequate accuracy and minimal loss. Using a water index, water bodies can be identified in the images, and using the NDVI value we can estimate the intensity of vegetation in a particular area; for example, rice, the major crop grown in Meghalaya, has an NDVI from 0.4 to 0.9 [17].
Fig. 3 Sample RGB image produced by combining R, G, and B bands
Table 2 Bands used from the available dataset on Google Earth Engine [1]

Band   | Description
Band 1 | Blue
Band 2 | Green
Band 3 | Red
Band 7 | Fake red for IR
Band 5 | Fake green for IR
Band 4 | Fake blue for IR
After calculating such parameters, we can use the mean value to reduce any bias in the data, and by using the mean of differences in the temporal data, the noise in the data can be reduced.
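As a hedged illustration of this preprocessing (not the authors' exact code), NDVI can be computed per patch from the red and near-infrared bands and differenced across the 6-month interval; the array shapes and band indices below are assumptions based on Table 2.

# Minimal sketch (assumed arrays): NDVI and temporal difference for one 100 x 100 patch
import numpy as np

def ndvi(nir, red):
    # NDVI = (NIR - Red) / (NIR + Red); for LANDSAT-7, NIR is band 4 and red is band 3
    nir = nir.astype("float32")
    red = red.astype("float32")
    return (nir - red) / (nir + red + 1e-6)        # small epsilon avoids division by zero

# patch_t0 / patch_t1: hypothetical (band, 100, 100) stacks six months apart
patch_t0 = np.random.randint(0, 255, (7, 100, 100))
patch_t1 = np.random.randint(0, 255, (7, 100, 100))

# band 4 sits at index 3 and band 3 at index 2 in this assumed stacking
diff = ndvi(patch_t1[3], patch_t1[2]) - ndvi(patch_t0[3], patch_t0[2])
print("mean vegetation change in patch:", float(diff.mean()))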
5.3 AI Model 5.3.1
Shifting Cultivation Detection Using CNN
After collecting the data and computing the required outputs, a convolutional neural network (CNN) model is trained on them. The processed data is the input, and the change in vegetation cover is the output of the model. The training data consisted of around 250,000 images; to train over this many images, batch execution was used and training was done on a GPU. The CNN model was trained on 3 km × 3 km patches of data and detects the amount of change in the vegetation of the land between two images. This regression model consists of 2 pairs of convolutional 2D and max-pooling 2D layers followed by 2 dense layers with
a dropout layer to counter overfitting. Adagrad with a 0.0005 learning rate was used for 100 epochs per batch. Twenty randomly generated batches of 750 images (500 normal, 250 with shifting cultivation) were used to train the CNN model efficiently from the huge dataset. Python code of the CNN model (flow explained above):

import keras as krs  # the listing assumes 'krs' as an alias for Keras

# The sequential model from the keras library
mdl = krs.models.Sequential()
# Convolutional 2D layer with 8 filters
mdl.add(krs.layers.Conv2D(8, (3, 3), input_shape=(100, 100, 1), padding='same'))
# Max-pooling layer to reduce the size
mdl.add(krs.layers.MaxPool2D((2, 2), padding='same'))
# Convolutional 2D layer with 5 filters
mdl.add(krs.layers.Conv2D(5, (3, 3), padding='same'))
# Max-pooling layer to reduce the size
mdl.add(krs.layers.MaxPool2D((2, 2), padding='same'))
# Flattening the image to a 1D array
mdl.add(krs.layers.Flatten())
# Dense layer with 64 nodes
mdl.add(krs.layers.Dense(64, activation='relu'))
# Dropout layer to reduce overfitting
mdl.add(krs.layers.Dropout(0.3))
# Output layer with a single regression value
mdl.add(krs.layers.Dense(1, activation='relu'))
# Compiling the model with optimizer and loss type
mdl.compile(optimizer=krs.optimizers.Adagrad(lr=0.0005),
            loss=krs.losses.mean_squared_error)

# Batch-wise processing; get_batch is the authors' helper that returns
# (x, y) with the given numbers of normal and shifting-cultivation patches
batches = 20
for batchn in range(batches):
    x, y = get_batch(500, 250)
    mdl.fit(x, y, epochs=(batchn * 100) + 100, shuffle=True,
            batch_size=5, initial_epoch=batchn * 100)

Training accuracy: 85–89%
Test set accuracy: 82–86%
Fig. 4 Visualization of changes in vegetation overlaid onto RGB output image
Once the model is trained, we can run predictions and map them back onto the RGB image from LANDSAT-7. From our research, the average size of farms in Meghalaya is 2100 m²; when a single farm is shifted, there is on average a change of about this much area. From the area-change value predicted by the CNN model, we can visualize the affected areas (Fig. 4).
5.3.2
Land Cover Detection Using DNN
The above method uses a CNN, so pixel-based analysis cannot be done with it. To perform pixel-based analysis, we therefore built another model that takes the processed value of each pixel as input and outputs the type of land. We used the normalized difference vegetation index (NDVI) value and the burn index value to classify the land cover; these index values were used to generate the feature values for the deep neural network (DNN) model. The DNN model was built with the Keras Sequential API and consists of 5 dense layers, all with the ReLU activation function except the last, which uses sigmoid for classification. We used the Adagrad optimizer with a 0.0005 learning rate, and binary cross-entropy was used to compute the loss. Training was done on 5 random batches of 50,000 pixels each (because this is a pixel-based approach, there are millions of pixels to train from) with 100 epochs per batch. Python code for the DNN model (flow explained above):
import keras as krs  # the listing assumes 'krs' as an alias for Keras

# The sequential model from the keras library
mdl = krs.models.Sequential()
# First hidden layer with 10 nodes (2 input features: NDVI and burn index)
mdl.add(krs.layers.Dense(10, input_shape=(2,), activation='relu'))
# Second hidden layer with 50 nodes
mdl.add(krs.layers.Dense(50, activation='relu'))
# Third hidden layer with 50 nodes
mdl.add(krs.layers.Dense(50, activation='relu'))
# Fourth hidden layer with 10 nodes
mdl.add(krs.layers.Dense(10, activation='relu'))
# Output layer with 5 output classes
mdl.add(krs.layers.Dense(5, activation='sigmoid'))
# Compiling the DNN model
mdl.compile(optimizer=krs.optimizers.Adagrad(lr=0.0005),
            loss=krs.losses.BinaryCrossentropy())

# Batch-wise processing; get_batch is the authors' helper that returns one batch of pixel features and labels
batches = 5
for batch in range(batches):
    x, y = get_batch()
    mdl.fit(x, y, epochs=(batch * 100) + 100, shuffle=True,
            batch_size=50000, initial_epoch=batch * 100)
Output classes:
• Barren land
• Crop/vegetation
• Forest
• Burnt vegetation
• Other
Once the land cover classification is available, providing multitemporal data lets us generate land cover change visualizations and analyse them to detect different phenomena over a patch of land across a period of time. Using this data, we built an algorithm that checks for burnt land that was previously used as cropland in order to detect shifting cultivation. We could also use this data to obtain other information such as deforestation, desertification, and other long-term changes over the land. This information could be useful, as the northern region of India contributes approximately one fourth of the total forest cover of India [18] (Figs. 5 and 6).
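A minimal sketch of such a rule is shown below, assuming integer class codes for the DNN outputs and a (time, height, width) stack of classified patches; the codes, array shapes, and random stand-in data are illustrative assumptions.

# Minimal sketch (assumed class codes): flag pixels that were cropland and later appear burnt
import numpy as np

BARREN, CROP, FOREST, BURNT, OTHER = range(5)      # hypothetical integer codes for the DNN classes

def shifting_cultivation_mask(class_stack):
    # class_stack: (time, height, width) array of per-pixel class labels over the periods
    flagged = np.zeros(class_stack.shape[1:], dtype=bool)
    for t in range(1, class_stack.shape[0]):
        was_crop = class_stack[t - 1] == CROP
        now_burnt = class_stack[t] == BURNT
        flagged |= was_crop & now_burnt            # cropland that shows up as burnt in the next period
    return flagged

stack = np.random.randint(0, 5, (8, 100, 100))     # hypothetical 8 half-yearly classified patches
print("pixels flagged:", int(shifting_cultivation_mask(stack).sum()))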
Fig. 5 Predicted vegetation output cover over the land using DNN
Fig. 6 Image used in classification of land cover (DNN)
5.4 Implementing a GUI for Easy Usage The output of the land classification model can be used with multitemporal data, and the land classification can easily be visualized with a simple line graph as shown
Fig. 7 Multitemporal analysis of land cover changes using DNN
Table 3 Output classes of the DNN model for land classification, corresponding to the graph in Fig. 7

Color  | Class
Red    | Barren
Blue   | Crop land
Green  | Forest
Cyan   | Burnt
Yellow | Other
in Fig. 7. Here we can see that there was a small amount of burnt land at the 7th period on the time scale; the labels for Fig. 7 are shown in Table 3. Since the trained model cannot be used by a regular user without strong programming knowledge, we also made a GUI Android app prototype for simple usage. Such an application could also support other features, such as reporting areas exhibiting shifting cultivation, to further improve the accuracy of the model. Recent studies also show that we have reached the saturation point of analysis with traditional methods and should look out for new methods in remote sensing [19].
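A minimal sketch of how such a line graph could be produced with Matplotlib is given below; the class codes, colours, and random stand-in data are illustrative assumptions.

# Minimal sketch (assumed class codes/colours): class fractions per period, as in Fig. 7
import numpy as np
import matplotlib.pyplot as plt

classes = {"Barren": "red", "Crop land": "blue", "Forest": "green", "Burnt": "cyan", "Other": "yellow"}
stack = np.random.randint(0, 5, (10, 100, 100))        # hypothetical classified patches over 10 periods

for idx, (name, colour) in enumerate(classes.items()):
    fraction = (stack == idx).mean(axis=(1, 2))        # share of pixels in this class per period
    plt.plot(fraction, color=colour, label=name)

plt.xlabel("Time period")
plt.ylabel("Fraction of patch")
plt.legend()
plt.show()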
6 Conclusion

Using a CNN gives accurate results and is better for visualization, but it is limited to patch-based analysis. For more flexible usage such as pixel-based analysis, we built a simple land cover detection model with a DNN, with which data over different periods of time can be used to visualize and analyse different aspects of land cover change. From this output, analysis can be made of the required land cover changes.
References 1. Landsat 7 Datasets in Earth Engine | Earth Engine Data Catalog [no date], https://developers. google.com/earth-engine/datasets/catalog/landsat-7 (Accessed on 28 Aug 2019) 2. E. Kerkho, E. Sharma, Debating shifting cultivation in the Eastern Himalayas: farmers’ innovations as lessons for policy, ICIMOD: Kathmandu, Nepal, 2006 3. A. Heinimann, O. Mertz, S. Frolking, A. Egelund Christensen, K. Hurni, F. Sedano, L. Parsons Chini, R. Sahajpal, M. Hansen, G. Hurtt, A global view of shifting cultivation: recent, current, and future extent. PLoS One, 12, e0184479 (2017) 4. P. Gong, L. Yu, C. Li, J. Wang, L. Liang, X. Li, L. Ji, Y. Bai, Y. Cheng, Z. Zhu, A new research paradigm for global land cover mapping. Ann. GIS 22, 87–102 (2016) 5. FSI. The State of Forest Report; Forest Survey of India (Ministry of Environment and Forest); Government of India: Dehradun, India, 2015 6. SWC-GOM. Shifting Cultivation. Available online: http://megsoil.gov.in/shifting_cul.html (Accessed on 28 Aug 2019) 7. NLUP. New Land Use Policy (NLUP). Available online: https://nlup.mizoram.gov.in/ (Accessed on 30 March 2019) 8. M.D. Behera, P. Tripathi, P. Das, S.K. Srivastava, P.S. Roy, C. Joshi, P.R. Behera, J. Deka, P. Kumar, M.L. Khan, Remote sensing based deforestation analysis in Mahanadi and Brahmaputra river basin in India since 1985. J. Environ. Manag, 206, 1192–1203 (2018) 9. Why do some Landsat 7 images have black stripes on them? Available online: https://www.pix alytics.com/landsat-quirks (Accessed on 30 Aug 2019) 10. L.Y. Ji, P. Gong, X.R. Geng, Y.C. Zhao, Improving the Accuracy of the Water Surface Cover Type in the 30 m FROM-GLC Product. Remote Sensing 7, 13507–13527 (2015). https://doi. org/10.3390/rs71013507 11. P. Gong, L. Yu, C. Li, J. Wang, L. Liang, X. Li, L. Ji, Y. Bai, Y. Cheng, Z. Zhu, A new research paradigm for global land cover mapping. Ann. GIS 2016, 22, 87–102. 12. M. González-Betancourt, Z.L. Mayorga-Ruíz, Normalized difference vegetation index for rice management in El Espinal. Colombia. Dyna 85(205), 47–56 (2018) 13. Jhumming, a traditional lifestyle than merely a cultivation method. Available online: www.ind iaenvironmentportal.org.in/. India Environment Portal. 2010-04-25. Retrieved 2014-05-06 14. S.A. Rahman, M.F. Rahman, T. Sunderland, Causes and consequences of shifting cultivation and its alternative in the hill tracts of eastern Bangladesh. Agroforest Syst 84, 141–155 (2012). https://doi.org/10.1007/s10457-011-9422-3 15. A.J. Kurien, S. Lele, H. Nagendra, Farms or forests? understanding and mapping shifting cultivation using the case study of west garo hills India. Land 8, 133 (2019). https://doi.org/10. 3390/land8090133 16. A.J. Comber, P.F. Fisher, R.A. Wadsworth, Land cover: to standardise or not to standardise? Comment on ‘Evolving standards in land cover characterization’ by Herold et al. J Land Use Sci., 2, 283–287 (2007) 17. D. Pulakesh, S. Mudi, M.D. Behera, S.K. Barik, D.R. Mishra, P.S. Roy, Automated mapping for long-term analysis of shifting cultivation in Northeast India. Remote Sensing, 13(6), 1066. https://doi.org/10.3390/rs13061066 18. C.C. Jakovac, L.P. Dutrieux, L. Siti, M. Peña-Claros, F. Bongers, Spatial and temporal dynamics of shifting cultivation in the middle-Amazonas river: expansion and intensification. PLoS ONE 12(7), e0181092 (2017). https://doi.org/10.1371/journal.pone.0181092 19. N.B. Riahtam, J.M. Nongkynrih, K.K, Sarma, P.L.N. Raju, A.R. Mishra, D. Lal, A.M. Kharsahnoh, D.J. 
Sahkhar, Assessment of shifting cultivation dynamics in East Garo Hills District, Meghalaya, India. IOP Conference Series: Earth and Environmental Science, vol. 169, 012104 (2018)
Chapter 49
Mitigation of COVID-19 by Means of Face Mask and Social Distancing Detection Using OpenCV and YOLOv2 G. Sahitya, C. Kaushik, Podduturi Ashish Reddy, and G. Sahith Reddy
1 Introduction

The spread of COVID-19 has given rise to a new world we humans had never foreseen; ironically, keeping apart has become an obligation for social beings. In December 2019, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), a serious infectious respiratory illness, emerged in Wuhan, China. It infected 7,711 individuals and caused 170 deaths in China before the outbreak was declared a worldwide pandemic and named COVID-19 (coronavirus disease 2019) by the World Health Organization. The virus spreads from one person to another in close contact, transmitted from an affected person via tiny exhaled droplets, and research has shown that affected people without any symptoms also transmit the disease. This property made the virus so contagious that it spread quickly throughout the world, bringing massive casualties, economic disturbance, and social challenges to humanity. Viola et al. [1] addressed the problem of locating faces in a picture: a human can do this easily, yet a computer needs exact instructions and constraints. To make the task more manageable, Viola–Jones requires full-view, frontal, upright faces; thus, to be detected, the whole face should point toward the camera and should not be tilted to either side. While these constraints might seem to diminish the algorithm's utility somewhat, the detection step is usually followed by a recognition step, so in practice these limits are quite acceptable. In videos of moving objects, tracking algorithms such as the KLT algorithm can be used to identify salient features within the detection bounding boxes and track their movement between frames.
G. Sahitya · C. Kaushik · P. Ashish Reddy (B) · G. Sahith Reddy Department of ECE, VNR Vignana Jyothi Institute of Engineering and Technology, Hyderabad, Telangana, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. B. Reddy et al. (eds.), Proceedings of Second International Conference on Advances in Computer Engineering and Communication Systems, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-7389-4_51
Chen et al. [2] employed the YOLOv2 detector for pedestrian detection. They then modified and adjusted the weights and parameters of the network, making it better suited to detecting pedestrians; after testing, their model had a fairly high detection speed compared with state-of-the-art pedestrian models. Piccardi et al. [3]: One of the most widely used techniques for detecting moving people is "background subtraction." It uses the simple idea of subtracting a reference frame from the frame currently being analysed by the processor. The reference frame is often called the "background image" and is an image of the area in the camera's range that contains no moving objects, including humans. Subtracting the two images helps identify the moving objects, in our case humans, from the environment and also separates each human. Dollár et al. [4] construct feature pyramids by exploiting the deep convolutional network's pyramidal hierarchy and inherent multiscale structure, at only slightly higher cost than previous counterparts. A top-down architecture with lateral connections is developed for building high-level semantic feature maps at all scales; this architecture shows significant improvements compared with its generic-feature counterparts.
2 Proposed Method

The suggested system is built using a transfer learning approach to optimize the performance of a deep learning model, together with TensorFlow and OpenCV for computer vision, to keep individuals in public places under surveillance with a camera and a Raspberry Pi 3B+ and to check whether people are wearing a mask. The main contributions of this system are three components: person identification, estimation of the safe distance between identified people, and face mask identification. If the distance between two individuals is under 6 m, a red box is displayed across them. The system also recognizes whether an individual is wearing a mask. Finally, an alert is sent.
2.1 Description

This system is primarily intended to detect people who are not wearing masks or not following social distancing in public places. It is implemented using neural networks, OpenCV, and hardware such as a Raspberry Pi. As the hardware requirement is minimal, the system is portable and easy to handle. It checks for people in the frame and, when more than one person is present, identifies the faces and the distance between people. If the distance between two people is less than 6 m, or if a person is found without a mask, an alert is sent to the officials. Each person in the frame is enclosed by a box, and the box's color indicates whether the person is safe (green) or not (red).
Fig. 1 Block diagram
2.2 OpenCV

OpenCV is a large and widely known open-source library for image processing and object detection and has been growing ever since its start. Its ease of use and wide availability make it the most used library among its counterparts. OpenCV allows us to process images and videos to recognize objects, faces, or even patterns humans cannot detect. The library integrates well with Python, and the numeric computation on the arrays produced by OpenCV is handled by NumPy. To identify the features of an image, we perform mathematical operations on these features in a vector space. As the entire core is written in optimized C/C++, performance is not a concern when using it from Python.
2.3 YOLO—You Only Look Once

In YOLO, an image is fed as input to a deep neural network, and the algorithm returns each object together with its coordinates in the image. Internally, the image is divided into several regions, and the network predicts a probability for each region. Depending on the prediction probabilities, the regions are assigned weights. As YOLO looks at the whole image at test time, it implicitly encodes contextual information about the objects in the frame, giving it an upper hand over classifier-based frameworks.
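As an illustration of how such a detector can be wired up, the sketch below loads a YOLO model through OpenCV's DNN module and keeps only detections of the COCO "person" class; the configuration/weight file names and the confidence threshold are assumptions for illustration, not files or values fixed by this work.

```python
import cv2
import numpy as np

# Hypothetical file names; any YOLO config/weights pair trained on COCO would do.
net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
output_layers = net.getUnconnectedOutLayersNames()

def detect_people(frame, conf_threshold=0.5):
    """Return [x, y, w, h] boxes for detections of the COCO 'person' class (id 0)."""
    h, w = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    boxes = []
    for output in net.forward(output_layers):
        for det in output:                       # det = [cx, cy, bw, bh, objectness, class scores...]
            scores = det[5:]
            class_id = int(np.argmax(scores))
            if class_id == 0 and scores[class_id] > conf_threshold:
                cx, cy, bw, bh = det[:4] * np.array([w, h, w, h])
                boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
    return boxes
```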
2.4 DSFD: Dual Shot Face Detector

The Dual Shot Face Detector (DSFD) has risen to prominence in recent years, surpassing the well-known face detection algorithms [5] in accuracy. The detector is open-source and free to use. It addresses three key areas of face detection, as follows:
(1) Feature enhance module
(2) Progressive anchor loss
(3) Improved anchor matching.
2.5 Estimating the Distance Between Objects in an Image

Estimating the Euclidean distance between objects in an image [6] starts by setting a reference object. The reference object must have the following important properties:
Property #1: We should have prior knowledge of the dimensions of the reference object.
Property #2: It should take minimal effort to identify the reference object in the image.
We utilize a coin as our reference object, having a width of 0.945 in. We also ensure that the coin is consistently the left-most item in our picture; because the reference object is recognized by its location, we can always rely on the coin being the left-most object. Our objective is to identify the coin and then utilize its known dimensions to calculate the distance between the coin and all remaining objects, as shown in Fig. 2; a sketch of this calculation follows the figure.
Fig. 2 Computing the distance between objects in an image with OpenCV
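A rough sketch of this reference-object idea: the coin's known width (0.945 in) yields a pixels-per-inch ratio, which converts pixel distances between object centroids into real-world distances. The contour-handling details (and the OpenCV 4.x return signature of findContours) are simplifying assumptions.

```python
from scipy.spatial import distance as dist
import cv2

REF_WIDTH_IN = 0.945  # known width of the reference coin, in inches

def centroid(contour):
    x, y, w, h = cv2.boundingRect(contour)
    return (x + w / 2.0, y + h / 2.0)

def measure_distances(binary_image):
    """Return distances (in inches) from the reference coin to every other object."""
    # OpenCV 4.x: findContours returns (contours, hierarchy).
    contours, _ = cv2.findContours(binary_image, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # The reference coin is assumed to be the left-most object in the frame.
    contours = sorted(contours, key=lambda c: cv2.boundingRect(c)[0])
    ref, others = contours[0], contours[1:]
    pixels_per_inch = cv2.boundingRect(ref)[2] / REF_WIDTH_IN
    ref_center = centroid(ref)
    return [dist.euclidean(ref_center, centroid(c)) / pixels_per_inch for c in others]
```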
2.6 Working

The process starts by taking a frame, either captured live by the camera or taken from an existing video or photo. The frame is first processed and enhanced in OpenCV. The OpenCV output is then given as input to YOLO, an open-source object detection system whose high speed and accuracy make it ideal for this project. YOLO takes the input image and detects the humans inside it; each detected human is indicated with a rectangular border. Next, the centers of the rectangles are used to calculate the distance between each pair of humans, and this pixel distance is compared with their heights to obtain a near-accurate distance between them in the real environment. We then need to detect faces and determine whether each person is wearing a mask. Since YOLO is not efficient at detecting faces, we employ the Dual Shot Face Detector (DSFD) for this task. We train the mask classifier with a dataset containing two categories of input, one with faces wearing masks and the other with faces not wearing masks. Once trained, we measure the accuracy and efficiency of the model; when the accuracy is sufficiently high, we use the model to detect the face inside each person rectangle and estimate whether that face is wearing a mask. Finally, the output displayed in a corner of the screen gives the number of people estimated not to be wearing a mask or not following social distancing [7, 8].
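The frame-processing loop described above can be summarized as follows. The functions detect_people, estimate_metres, has_mask, and send_alert are placeholders standing in for the YOLO detector, the distance calibration, the DSFD-based mask classifier, and the alerting mechanism; they are not functions defined by this work.

```python
import itertools
import cv2

SAFE_DISTANCE_M = 6  # distance threshold used in this work

def process_frame(frame, detect_people, estimate_metres, has_mask, send_alert):
    people = detect_people(frame)                        # [x, y, w, h] boxes from YOLO
    unsafe = set()
    # Flag every pair of people standing closer than the safe distance.
    for (i, a), (j, b) in itertools.combinations(enumerate(people), 2):
        if estimate_metres(a, b) < SAFE_DISTANCE_M:
            unsafe.update([i, j])
    for i, (x, y, w, h) in enumerate(people):
        masked = has_mask(frame[y:y + h, x:x + w])       # DSFD + mask classifier on the crop
        safe = i not in unsafe and masked
        color = (0, 255, 0) if safe else (0, 0, 255)     # green = safe, red = violation
        cv2.rectangle(frame, (x, y), (x + w, y + h), color, 2)
        if not masked:
            cv2.putText(frame, "NO MASK", (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)
        if not safe:
            send_alert(i)
    return frame
```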
2.7 Circuit Design

The circuit diagram in Fig. 3 consists of two components and an external power supply. The components are a Raspberry Pi 3B+, which acts as the controller board, and a 4 MP USB webcam. The webcam is connected directly to a USB slot of the Raspberry Pi, and power is supplied to the Raspberry Pi through a 5 V power cable. The whole circuit can be mounted on any mobile platform such as an RC car, a UAV, or a robot. A Raspberry Pi is used instead of a dedicated computer because the additional capability of a full computer is not needed, and the computing power of the Raspberry Pi is sufficient for this project. The Raspberry Pi is also cost-efficient and easy to use.
3 Results

The live camera output includes face mask detection as well as social distancing notification. For face mask detection, when a person has a mask on, a green box is shown to represent that the person is wearing a mask.
Fig. 3 Circuit design
When a person is detected not wearing a mask, the box is labeled "NO MASK" with a red outline. The mask alert also shows the percentage confidence of the mask being worn, which helps determine the mask status precisely. For social distancing, the outer box has a green outline, as shown in Fig. 4.
Fig. 4 Screenshot of live camera output with mask ON (left) and OFF (right)
Fig. 5 Screenshot of recorded video output
3.1 Recorded Video Output

As with the live camera output, people who are following the social distancing and face mask norms are marked with green boxes, and the boxes turn red as soon as people on the screen violate social distancing. The video has counters at the top showing the "total people" count along with how many are safe and unsafe, depending on the number of people violating social distancing norms, as shown in Fig. 5. In very large rooms, a camera located at a top corner cannot scan for masks on people's faces; in such rooms there will be many people, and it remains important to monitor the social distancing between them.
4 Conclusion

We suggested a system that utilizes OpenCV and the MobileNetV2 design to help maintain a safe environment and assure individual protection by continually observing dense public places. This could help reduce the spread of the virus while minimizing the manpower required to do this repetitive work physically, giving personnel the opportunity to use that time for other assistance. The system also reduces the exposure of police officers to highly populated areas. The job of surveillance becomes much easier and faster, helping to reduce the transmission of the virus, and the system therefore has the potential to save numerous people from COVID transmission. Its robustness and ease of use make it a better system than its counterparts, saving both time and effort.
References 1. P. Viola, M. Jones, Rapid object detection using a boosted cascade of simple features, in Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001 (Kauai, HI, USA, 2001), p. I
2. H. Liu, Z. Chen, Z. Li, W. Hu, An efficient pedestrian detection method based on YOLOv2. Math. Eng. Issues, 1–10 (2018) 3. M. Piccardi, Background subtraction techniques: a review, in 2004 IEEE International Conference on Systems, Man and Cybernetics (IEEE Cat. No. 04CH37583), vol. 4 (The Hague, 2004), pp. 3099–3104 4. T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, S. Belongie, Feature pyramid networks for object detection, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017), pp. 2117–2125 5. S.S. Mohamed, N.M. Tahir, R. Adnan, Background modelling and background subtraction efficiency for object detection, in 6th International Colloquium on Signal Processing and its Applications (2010) 6. A. Rosebrock, Measuring distance between objects in an image with OpenCV 7. R.J. Glass, L.M. Glass, W.E. Beyeler, H.J. Min, Targeted social distancing design for pandemic influenza, in Emerging Infectious Diseases (2006), pp. 1671–1681 8. L. Matrajt, T. Leung, Evaluating the efficacy of social distancing strategies to postpone or flatten the curve of coronavirus disease, in Emerging Infectious Diseases (2020)
Chapter 50
Abstractive Text Summarization Using T5 Architecture G. S. Ramesh, Vamsi Manyam, Vijoosh Mandula, Pavan Myana, Sathvika Macha, and Suprith Reddy
1 Introduction

As use of the internet grows, we now have enormous amounts of digital data. We receive information from many sources such as articles, news, social media, and e-mails. It becomes easier to process information if we can automatically generate text summaries as new data arrives from various sources worldwide. Extracting the important portions from a large original text without losing vital information is called text summarization. The summary must be concise, fluent, coherent, and represent the significant parts of the source data. The task of producing a short, understandable summary while keeping the vital information and overall meaning of the text is known as automatic text summarization. Many algorithms for automatic text summarization have been developed in recent years and are widely used in a variety of domains. For example, online search engines generate snippets as document previews [1]. Further examples include news websites that create shortened descriptions of news subjects, usually in the form of headlines, to make browsing easier, as well as knowledge extraction methods [2]. There are two main ways of implementing text summarization: extractive and abstractive. Extractive summarization focuses on retrieving important sentences from the original data, whereas abstractive summarization generates new sentences in the summary by understanding the text. The goal of abstractive summarization approaches is to convey the important information in a novel way. In other words, they use advanced natural language techniques to read and examine the text and develop a new and succinct text
G. S. Ramesh · Vamsi Manyam · Vijoosh Mandula · Pavan Myana · Sathvika Macha · Suprith Reddy (B) Department of CSE, VNRVJIET, Hyderabad, India G. S. Ramesh e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. B. Reddy et al. (eds.), Proceedings of Second International Conference on Advances in Computer Engineering and Communication Systems, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-7389-4_52
that delivers the vital information from the actual text given. When compared with automatic abstractive summaries, pure extractive summaries frequently produce better outcomes [3]. This is because abstractive summarization methods deal with more difficult problems such as semantic representation, interpretation, and natural language generation, rather than data-driven operations like sentence extraction. The most common methods for automatic text summarization are discussed in the paper [4] by Mehdi Allahyari, Seyedamin Pouriyeh, Mehdi Assefi, Saeid Safaei, Elizabeth D. Trippe, Juan B. Gutierrez, and Krys Kochut, who examine the various summarization approaches and discuss their effectiveness and drawbacks. This paper proposes a model for summarizing text using the T5, or Text-to-Text Transfer Transformer, architecture. It is a transformer-based model that uses a text-to-text approach. T5 performs abstractive summarization, generating new sentences from the given text. With the T5 model, all NLP tasks can be reframed into a unified, generic text-to-text format where both the input and the output are always strings.
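As a concrete illustration of this text-to-text formulation, the sketch below generates a summary with a pretrained T5 checkpoint through the Hugging Face transformers library; the checkpoint name and generation settings are illustrative assumptions rather than the exact configuration used in this work.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# "t5-small" is an assumed checkpoint; any T5 variant exposes the same text-to-text interface.
tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

article = "The quarterly report shows that revenue grew 12 percent while costs stayed flat ..."
# T5 frames every task as text-to-text, so summarization is requested with a task prefix.
inputs = tokenizer("summarize: " + article, return_tensors="pt", max_length=512, truncation=True)
summary_ids = model.generate(inputs["input_ids"], num_beams=4, max_length=60, early_stopping=True)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```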
2 Motivation

Text summarization is useful for processing the large amounts of data available online. Data grows every day, and much of it contains redundant and irrelevant information, so it can take a lot of time and effort to go through the entire text. Summaries help us retrieve the relevant and important parts of the original text, but summarization is an expensive and time-consuming process if done manually. If a system (or a person) could summarize the data for us, we could focus only on the data that is important to us instead of going through everything; this is what automatic text summarizers make possible. Text summarizers extract useful information and produce shorter versions of long documents. They increase readability, save time, and make reading easier by reducing the scope of the information. Abstractive text summarization techniques select the significant topics of the original data, produce new sentences, and generate coherent summaries.
3 Literature Review

Agus et al. [5] demonstrated single-document text summarization using term frequency–inverse document frequency (TF-IDF). In this approach, a preprocessed list of words is obtained, and the TF-IDF value of each verb and noun is calculated. Sentences are ranked according to these values, and the highest-ranked sentences are put together as the summary. Kumar et al. [6] put forward a system using the TextRank algorithm, which represents the sentences of an article as a weighted undirected graph and then uses Google's PageRank to order the sentences.
Cheng and Lapata [7] modeled extractive summarization as a sequence labeling problem with an encoder–decoder architecture. They use a CNN-LSTM encoder to transform word embeddings into a document representation whose hidden layers represent the sentences. Their model involves a neural network-based hierarchical document reader (encoder) and an attention-based content extractor. Dhakras and Shrivastava [8] used the Bag of Word embeddings LearnER (BOWLER) model, a neural network approach to extractive text summarization. They proposed an encoder–decoder model with three components: a sentence encoder, which encodes sentences into sentence vectors; a document encoder, which processes these sentence vectors using an RNN; and a summary generator, which scores sentences by their summary-worthiness and selects sentences for the summary based on their scores. Jonsson [9] used a transformer model together with an LSTM-based encoder–decoder for abstractive summarization, with ROUGE and human evaluators as evaluation metrics. In a comparison between the transformer model and Seq2Seq, the transformer achieved higher ROUGE scores, and changing the input text did not affect the transformer's accuracy much; across all the evaluated classes, the transformer surpassed the Seq2Seq baseline. The transformer also trains in reasonable time and space even for larger input and output sequences. Egonmwan and Chali [10] followed the traditional and popular encoder–decoder paradigm with a slight enhancement on the encoder side. Their intuition is that the likelihood of correctly decoding a sequence depends greatly on the quality and correctness of the encoder, so they introduced a framework that first uses a transformer to encode the source sequence and then a Seq2Seq model. They also found that the transformer and Seq2Seq components can be enhanced slightly to produce a richer encoded vector representation, and they evaluated the model on the CNN/DailyMail and Newsroom datasets. Vaswani et al. [11] proposed a simple attention-based transformer model that leaves out traditional recurrent and convolutional networks. On several translation tasks, this model set a new standard while taking significantly less time to train, and the transformer generalizes well to other tasks with both abundant and scarce training data. The transformer can be trained significantly faster than architectures based on recurrent or convolutional layers and outperformed all previous models on the text translation task.
4 Methodology

The proposed system focuses solely on the abstractive summarization method. In this process, we fine-tune the T5 model on a summarization task using datasets chosen according to our requirements. The dataset used here for summarization is the News Summary Daily Mail dataset, which consists of a large collection of news articles used for text summarization.
4.1 The Transformer: Model Architecture

Most effective neural sequence models have an encoder–decoder structure. The encoder takes as input a sequence of symbols (x1, …, xN) and maps it to a sequence of continuous representations z = (z1, …, zN). Given z, the decoder generates the corresponding output sequence (y1, …, yM) one step at a time. At every step, the model is auto-regressive, consuming the previously generated symbols as additional input when generating the next symbol. The transformer follows this overall architecture using stacked attention layers and fully connected layers in both the encoder and the decoder [12].
Encoder: The encoder consists of a stack of N = 6 identical layers. Every layer has a multihead attention mechanism followed by a fully connected feed-forward network. The encoder converts the input text into a machine-encoded representation (Fig. 1).
Decoder: The decoder also consists of a stack of the same number of identical layers (N = 6). A third sublayer is included in the decoder along with the other two sublayers; it applies multihead attention to the output of the encoder. A masked attention sublayer is used in the decoder to prevent each position from attending to subsequent positions.
Fig. 1 Encoder–decoder model
Attention: The purpose of this function is to map a query and a set of key–value pairs to an output, where the queries, keys, values, and output are all vectors [13]. The output is computed as a weighted sum of the values, where the weight assigned to each value is calculated from the query and the corresponding key. The benefit of using attention layers is that they allow the model to share information from multiple representation subspaces. The transformer model uses the multihead attention function in the following manner: in the "encoder–decoder attention" layers, the queries come from the preceding decoder layer, and the memory keys and values come from the output of the encoder. This mechanism allows every position in the decoder to attend over all positions in the input sequence. Every position in the encoder attends to all positions in the preceding encoder layer, and in the same way, the attention layers in the decoder allow each position in the decoder to attend to all positions in the decoder up to and including that position. The following flowchart depicts the training phase of our model (Fig. 2). Here, we preprocess the data according to our requirements, remove unnecessary data, and prepare the model. Then we train the model with the given training parameters and training data. After the model achieves the required accuracy, it is saved to the local system or drive. Testing is then carried out to compare the generated text against the actual summaries.
Fig. 2 Flowchart of training text summarizer model
(Flowchart steps: input the training data; preprocess it by removing unnecessary fields, applying data augmentation to the textual input with the nlpaug augmenter, and storing the cleaned data in a new file; initialize the training parameters and split the dataset into train, test, and validation sets; load the model architecture, train the model on the training set, and save all checkpoints; finally, save the model.)
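The attention function described above can be written compactly as scaled dot-product attention; the NumPy sketch below shows the single-head core (multihead attention simply repeats it over several learned subspaces). The array shapes are toy values chosen for illustration.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K: arrays of shape (len_q, d_k) and (len_k, d_k); V: (len_k, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                         # query-key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)          # softmax over the keys
    return weights @ V                                      # weighted sum of the values

# Toy example: 3 query positions attending over 4 key/value positions.
Q = np.random.randn(3, 8)
K = np.random.randn(4, 8)
V = np.random.randn(4, 16)
print(scaled_dot_product_attention(Q, K, V).shape)          # (3, 16)
```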
Table 1 Sample results using T5 model

Generated text | Actual text
Delhi CN gets 2 death threats on official e-mail | Delhi CM Arvind Kejriwal receives two death threats
Public works approves construction of flyover costing 50cr near Raj Nagar extension | PHD approves flyover in Ghaziabad's Raj Nagar extension
The evaluation is then done on the validation dataset to obtain the validation loss.
5 Discussion and Results

The experimental output of the Text-To-Text Transfer Transformer (T5) method was compared with attention-based sequence-to-sequence methods. The running loss during model training was 0.1844, and the evaluation loss on the validation dataset was 1.72. According to the experimental results, T5-based abstractive text summarization outperformed the baseline attention-based seq2seq approach on the test dataset. Sample prediction results from the test are presented in Table 1.
6 Evaluation

The model has been evaluated using a metric called ROUGE (Recall-Oriented Understudy for Gisting Evaluation), which determines summary quality. ROUGE is a set of metrics rather than a single metric; the metrics compare the model-generated summary against a reference summary. The basic metric is generalized as ROUGE-N [14], where N is the n-gram size used for evaluation. ROUGE-1 measures the overlap of unigrams between the generated and reference summaries; ROUGE-2 and ROUGE-3 use bigrams and trigrams, respectively. Gambhir and Gupta have described various text summarization methods and compared their ROUGE-1 scores [15]. ROUGE is a recall- and precision-based measure from which an F1 score is calculated. Recall is the ratio of the number of overlapping n-grams to the number of n-grams in the reference summary, and precision is the ratio of the number of overlapping n-grams to the number of n-grams in the model-generated summary.

Recall = (number of overlapping n-grams) / (number of n-grams in the reference summary)
Table 2 ROUGE comparison table

Model | ROUGE-1 | ROUGE-2 | ROUGE-L
S2S | 23.86 | 9.86 | 23.83
S2SR | 24.70 | 10.00 | 24.50
TextRank | 13.5 | 4.55 | 11.46
T5 | 42.64 | 22.30 | 40.61

The ROUGE scores of our approach, i.e., using T5, are better than those of the other techniques.
Precision = (number of overlapping n-grams) / (number of n-grams in the model-generated summary)

The F1 score represents the final ROUGE value, which is formulated as:

ROUGE (F1 score) = 2 * (precision * recall) / (precision + recall)
ROUGE-L: It is calculated based on the longest matching sequence that does not require consecutive matches but is based on sequence order of the words (Table 2).
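The ROUGE formulas above can be computed directly from n-gram overlap counts. The following is a simplified ROUGE-N sketch written from those definitions (no stemming or stopword handling), not the official scoring package.

```python
from collections import Counter

def rouge_n(generated, reference, n=1):
    """Return (precision, recall, f1) for n-gram overlap between two summaries."""
    def ngrams(text):
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    gen, ref = ngrams(generated), ngrams(reference)
    overlap = sum((gen & ref).values())                 # clipped count of overlapping n-grams
    precision = overlap / max(sum(gen.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

print(rouge_n("delhi cm gets two death threats", "delhi cm receives two death threats"))
```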
6.1 Graphs

The following graphs compare all the ROUGE metrics (ROUGE-1, ROUGE-2, and ROUGE-L) for four text summarization models: S2S, S2SR, TextRank, and our model, T5. The graphs show that the T5 model has the highest ROUGE scores, while TextRank has the lowest (Figs. 3 and 4).
Fig. 3 Curved graph of ROUGE metrics for existing models
Fig. 4 Bar graph of ROUGE metrics for existing models
7 Conclusion

A vast amount of data is available to us because of the tremendous growth in internet usage, and summarizing such large quantities of information is a tedious task for humans. In this new era of information, the problem can be addressed with tools such as automatic summarization. Automatic text summarization can be useful in various sectors such as health, education, and communication platforms, and it plays a key role in research involving natural language processing (NLP). As the self-explanatory name implies, it automatically creates a summary of the available texts. The abstractive method of summarizing is analogous to human-written summaries. At present, abstractive summarization requires high computational power and abundant data and is difficult to adapt to specific domains. A summary is a short text that describes the key aspects of the original text and ignores the rest; saving time is the main benefit of summarization. Abstractive text summarization has been found to be advantageous in information retrieval tasks, and searching time can be greatly reduced with the advent of such tools.
8 Future Scope

Future work on text summarization mainly concerns the identification of relevant features, the introduction of new features, and feature optimization, including linguistic and semantic features, in order to produce grammatically acceptable sentences. The use of stemming, lemmatization, and bag of words (BoW) for dataset preprocessing can also increase the performance of the developed model. Text summarization helps in generating new customized datasets from documents that are legal to use as well as documents from other available sources. We have found that eliminating sentences that give geographic information may distort the content, because there might be correlations between the sentences. Text summarization also helps in generating newspaper headlines,
article summaries, or any journal summary. Text summarization plays a major role in creating a bio-data from the textual details of the person.
References 1. A. Turpin, Y. Tsegay, D. Hawking, H.E. Williams, Fast generation of result snippets in web search, in Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (ACM, 2007), pp. 127–134 2. M. Allahyari, S. Pouriyeh, M. Assefi, S. Safaei, E.D. Trippe, J.B. Gutierrez, K. Kochut, A brief survey of text mining: classification, clustering and extraction techniques. ArXiv e-prints (2017). arXiv:1707.02919 3. H.P. Edmundson, New methods in automatic extracting. J. ACM (JACM) 16(2), 264–285 (1969) 4. M. Allahyari, S. Pouriyeh, M. Assefi, S. Safaei, E.D. Trippe, J.B. Gutierrez, K. Kochut, Text summarization techniques: a brief survey, in Proceedings of ArXiv (USA, 2017). arXiv:1707. 02268v3 5. H. Christian, M.P. Agus, D. Suhartono, Single document automatic text summarization using term frequency-inverse document frequency (TF-IDF). ComTech Comput. Math. Eng. Appl. 7(4), 285 (2016) 6. Tanwi, S. Ghosh, V. Kumar, Y.S. Jain, B. Avinash, Automatic text summarization using text rank. Int. Res. J. Eng. Technol. (IRJET) 7. J. Cheng, M. Lapata, Neural summarization by extracting sentences and words (2016). arXiv preprint arXiv:1603.07252 8. P. Dhakras, M. Shrivastava, BoWLer: a neural approach to extractive text summarization, in 32nd Pacific Asia Conference on Language, Information and Computation Hong Kong (2018), pp 1–3 9. F. Jonsson, Evaluation of the transformer model for abstractive text summarization, diva2:1368180 10. E. Egonmwan, Y. Chali, Transformer-based model for single documents neural summarization, in Proceedings of the 3rd Workshop on Neural Generation and Translation (WNGT 2019) 11. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A.N. Gomez, L. Kaiser, I. Polosukhin, Attention is all you need, in, 31st Conference on Neural Information Processing Systems (NIPS 2017). arXiv:1706.03762v5 12. C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, P.J. Liu, Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res. 21 (2020). arXiv:1910.10683v3 [cs.LG] 13. T. Shi, Y. Keneshloo, N. Ramakrishnan, C.K. Reddy, Neural abstractive text summarization with sequence-to-sequence models. ACM Trans. Data Sci. 1(1), 35 (2020). Article 1. https:// doi.org/10.1145/3419106 14. C.-Y. Lin, ROUGE: a package for automatic evaluation of summaries, in Proceedings of the Workshop on Text Summarization Branches Out (WAS 2004) (Association for Computational Linguistics, 2004), pp. 74–81 15. M. Gambhir, V. Gupta, Recent automatic text summarization techniques: a survey. Artif. Intell. Rev. 47, 1–66 (2017). https://doi.org/10.1007/s10462-016-9475-9
Chapter 51
Heart Failure Prediction Using Classification Methods Oruganti Shashi Priya, Kanakala Srinivas, and Sagar Yeruva
1 Introduction

Heart failure (HF) disease is the number one cause of death around the world. The main effect of this disease is due to blockage in the arteries. A heart attack can occur without a person being aware of it. A heart attack is not always obvious; common symptoms include pain in the arms and chest, shortness of breath, cold sweats, fatigue, swollen legs, and rapid heartbeat. A silent heart attack is one with no symptoms, or with minimal or unrecognized symptoms. High blood pressure, high cholesterol, diabetes, smoking, a family history of heart disease, obesity, and aging are all risk factors for silent attacks. The majority of cardiovascular diseases can be prevented by addressing risk factors such as cigarette use, unhealthy diet, obesity, physical inactivity, and excessive alcohol consumption. Heart failure cannot be diagnosed easily because its symptoms overlap with those of other diseases. Apart from a healthy lifestyle and diet control, diagnosis at an early stage is what ultimately saves lives. Despite advances in health departments, people are also unaware of the complications and symptoms associated with chronic illness. This paper analyzes the performance of classification algorithms such as KNN, SVM, the RF classifier, and MLP for heart failure prediction.
O. S. Priya (B) · K. Srinivas · S. Yeruva Department of CSE, VNR Vignana Jyothi Institute of Engineering and Technology, Bachupally, Hyderabad, India K. Srinivas e-mail: [email protected] S. Yeruva e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. B. Reddy et al. (eds.), Proceedings of Second International Conference on Advances in Computer Engineering and Communication Systems, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-7389-4_53
2 Background Work

In [1], the researchers experimented with a dataset obtained from the UCI repository. They compare various decision tree classification algorithms in order to improve performance in cardiovascular disease diagnosis, applying data mining techniques to extract hidden patterns. Algorithms such as J48, the logistic model tree, and RFC are tested. J48 has the highest accuracy at 56.7%, while the logistic model tree has the lowest at 55.77%. The authors of [2] provide an overall view of current research on predicting heart disease. The classification techniques mainly focus on heart disease prediction rather than studying various data cleaning and pruning approaches, using data mining approaches such as DT, C4.5, K-means, ID3, SVM, NB, artificial neural networks (ANN), the classification and regression trees (CART) methodology, regression, and J48. Selecting the right combination of data mining techniques and implementing it on the dataset yields a fast and effective system for heart disease management. In [3], the authors examined 15 medical parameters for prediction, including age, gender, blood pressure, cholesterol, and obesity. An MLP with backpropagation is used to build an efficient heart disease prediction system that predicts heart disease risk levels. The results show zero false negative and false positive entries, so the system predicts heart disease with 100% accuracy. In [4], the authors developed and presented a real-time patient monitoring device based on Arduino that can sense real-time parameters such as body temperature, blood pressure, humidity, and heartbeat. It is a cloud-based heart disease prediction device that uses machine learning techniques to identify impending heart disease. Algorithms such as ANN, SVM, and RFC were used, of which the support vector machine shows the highest accuracy at 97.53%. In [5], the authors developed a predictive approach to forecast the chances of heart failure for a patient admitted to hospital. The different algorithms and their accuracies are decision tree 93.19%, logistic regression 87.36%, random forest 89.14%, Naïve Bayes 87.27%, and support vector machine 92.30%. In [6], the dataset was collected from 'Framingham,' with attributes such as gender, age, education, diabetes, BP meds (whether the person is on BP medicines), and cigarettes per day. Machine learning is used to predict the risk of coronary heart disease, with random forest at 96.8%, decision tree at 92.7%, and K-nearest neighbor at 92.87%; K-nearest neighbor shows a higher execution time than decision tree and random forest. In [7], the dataset considered is a retrospective sample of males from a high-risk region of the Western Cape of South Africa (KEEL). Different algorithms such as SVM, DT, and NB were used, and the accuracy of all three models is greater than 70%. In [8], boosting was applied to each ML technique to improve prediction. Algorithms such as NB, SVM, RFC, the Hoeffding tree, and the logistic model tree were used for effective prediction, and random forest shows better results compared with all other techniques.
Fig. 1 Proposed framework for heart failure prediction
The obtained results were also compared under ensemble techniques such as boosting, bagging, and AdaBoost, of which AdaBoost is the best technique with 80.32% accuracy. In [9], the dataset was collected from the UCI repository and consists of biological parameters including blood pressure, sex, age, and cholesterol. The algorithms and their obtained accuracies are SVM (83%), DT (79%), linear regression (78%), and K-nearest neighbor (87%); KNN shows the highest accuracy of all the algorithms. In [10], the researchers used machine learning algorithms such as logistic regression, RFC, DT, and K-nearest neighbor (KNN). KNN is effective, predicting with 85.71% accuracy.
3 Proposed System

All the sources cited above have highlighted the importance of early detection of HF disease, which may help people live longer and healthier lives. The framework depicted in Fig. 1 is used in this process.
4 Methodology

4.1 Description of the Dataset

The input dataset 'Heart_Failure_Clinical_Records' was obtained from Kaggle; it consists of 12 attributes plus 1 target attribute (death event) and 300 records. There are 7 nominal and 6 numeric attributes. The attributes are age, anemia, creatinine phosphokinase, diabetes, platelets, ejection fraction, high blood pressure, serum creatinine, serum sodium, sex, smoking, time, and death event. For the outcome feature we are going to predict, 0 means no heart disease and 1 means heart disease. Table 1 displays the information for each attribute (Fig. 2).
Table 1 Accuracy obtained using various classification algorithms

S. No. | Algorithm | Accuracy (%)
1 | K-nearest neighbor | 75.0
2 | Support vector machine | 83.3
3 | Random forest | 85.0
4 | Multilayer perceptron | 71.6
Fig. 2 Description of heart failure prediction
4.2 Data Preprocessing

Data preprocessing transforms raw data into a format that is more readable and understandable. Its purpose is to clean up the dataset by eliminating duplicates, inconsistencies, missing values, and errors. A data cleaning approach is therefore used for preprocessing: the data is checked for missing values, the missing values are filled in using the mean, median, or mode, and the dataset is cleaned.
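A small pandas sketch of this cleaning step, assuming the usual Kaggle CSV file name: duplicates are dropped, missing numeric values are filled with the median, and other columns with the mode, as one possible realization of the mean/median/mode strategy described above.

```python
import pandas as pd

# Assumed file name for the Kaggle heart failure clinical records dataset.
df = pd.read_csv("heart_failure_clinical_records_dataset.csv")

# Drop exact duplicate rows, then report remaining missing values per column.
df = df.drop_duplicates()
print(df.isnull().sum())

# Fill numeric columns with the median and any other columns with the mode.
for col in df.columns:
    if df[col].isnull().any():
        if pd.api.types.is_numeric_dtype(df[col]):
            df[col] = df[col].fillna(df[col].median())
        else:
            df[col] = df[col].fillna(df[col].mode()[0])
```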
4.3 Implementation

The original dataset is split into two sections: training data and test data. We have divided the dataset into 80% training data and 20% testing data. Machine learning techniques, specifically classification methods, are then applied to the dataset.
Classifiers: Classification is a type of supervised learning that enables computers to learn from experience. A classifier learns from the input it receives and then applies the learned knowledge to categorize new observations. We apply different algorithms to build and evaluate models. In order to test the dataset, we will use the following classifiers:
1. K-nearest Neighbor
2. Support Vector Machine
3. Random Forest Classifier
4. Multilayer Perceptron.
4.3.1 K-Nearest Neighbor Classifier
KNN is based on the distance between data points, with nearby points grouped together. The user determines the number of neighboring points, referred to as neighbors, that are consulted, and this choice is very important in dataset analysis. KNN can perform both regression and classification tasks using a number (k) of neighbors, categorizing new data points based on similarity measures. We have set the number of neighbors to 4.
4.3.2 Support Vector Machine Classifier
The support vector classifier considers the data points nearest to the hyperplane, measured by their perpendicular distance to it; the hyperplane is chosen so that the total margin to these nearby points is maximized. This gives the best possible decision boundary, allowing us to categorize data points easily. The extreme points that support the placement of the hyperplane are referred to as support vectors, and the resulting ML algorithm is known as the support vector machine. A kernel is a function that views the data in a different perspective or set of dimensions so that a hyperplane fits more easily. We have used a linear kernel, with the penalty parameter C set to 2.
4.3.3 Random Forest Classifier
Random forests, or random decision forests, use an 'ensemble' learning approach to classification by building multiple decision trees from random samples of the training data. An RF is a meta-estimator that fits a large number of DT classifiers on various sub-samples of the dataset, improving predictive accuracy while avoiding overfitting. We have set the number of estimators to 200 and used the 'gini' criterion to measure split quality.
4.3.4 Multilayer Perceptron
To perform classification tasks, the MLP classifier uses a feed-forward neural network, which falls under the category of ANN and consists of an input layer, two hidden layers, and an output layer. It is a supervised learning method: an ANN built from a large number of perceptrons. Here, the 'tanh' activation function has been used. Having multiple layers and a nonlinear activation is what distinguishes an MLP from an ordinary linear perceptron.
4.3.5 Classification
We classify the data with label 0 indicating a person without the disease and label 1 indicating a person with the disease. We construct the confusion matrix and calculate accuracy, precision, recall, and F1 score. Predictions: here, we predict the survival outcome. The output labels are:
• A person with attack
• A person without attack.
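Under the settings stated above (an 80/20 split, KNN with 4 neighbors, a linear SVM with C = 2, a 200-tree random forest with the 'gini' criterion, and a tanh MLP with two hidden layers), the experiment can be sketched with scikit-learn as follows; the feature scaling step, the hidden-layer sizes, and the random seed are assumptions not specified in the text.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, classification_report

df = pd.read_csv("heart_failure_clinical_records_dataset.csv")  # assumed file name
X = df.drop(columns=["DEATH_EVENT"])
y = df["DEATH_EVENT"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Feature scaling (assumed) helps the distance-based and neural models.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

models = {
    "KNN": KNeighborsClassifier(n_neighbors=4),
    "SVM": SVC(kernel="linear", C=2),
    "Random forest": RandomForestClassifier(n_estimators=200, criterion="gini"),
    "MLP": MLPClassifier(hidden_layer_sizes=(64, 32), activation="tanh", max_iter=1000),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print(name, "accuracy:", accuracy_score(y_test, pred))
    print(classification_report(y_test, pred))  # per-class precision, recall, F1
```

The per-class precision, recall, and F1 values reported later in Table 2 correspond to the kind of output produced by classification_report in this sketch.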
5 Results

We have 12 independent attributes, of which 'age,' 'ejection_fraction,' 'serum_creatinine,' 'serum_sodium,' 'creatinine_phosphokinase,' and 'time' are the important features of the considered dataset. Data correlation: the correlation matrix shows the correlation among the features and their correlation with DEATH_EVENT (the target attribute). Six features—'age,' 'ejection fraction,' 'serum creatinine,' 'serum sodium,' 'creatinine phosphokinase,' and 'time'—appear to be the most correlated with the death event compared with the other features. Figure 3 shows the correlation matrix between the features (Figs. 4 and 5). The algorithms and their accuracies are given in Table 1, which presents a comparative study of the algorithms.

Accuracy = (Number of Correctly Predicted Values) / (Total Number of Predicted Values)    (1)
Fig. 3 Feature importance
Fig. 4 Feature correlation matrix
Fig. 5 Distribution of classes
Table 2 Comparison of performance with various metrics (0 and 1 are the predicted class values)

Metric | KNN (0) | KNN (1) | SVM (0) | SVM (1) | RF (0) | RF (1) | MLP (0) | MLP (1)
Precision | 0.75 | 0.78 | 0.88 | 0.75 | 0.88 | 0.79 | 0.72 | 0.00
Recall | 0.95 | 0.35 | 0.88 | 0.75 | 0.90 | 0.75 | 1.00 | 0.00
F1-score | 0.84 | 0.48 | 0.88 | 0.75 | 0.89 | 0.77 | 0.83 | 0.00
6 Conclusion

Heart failure is one of the most prevalent diseases. Silent attacks are commonly seen in women, and they cause a large number of deaths as a result of delayed identification. Understanding how heart failure manifests in the human body helps diagnosis centers identify its characteristics with good accuracy, so that complications such as severe pain and shortness of breath can be avoided. In this paper, we classified data using a dataset obtained from Kaggle. We experimented with the KNN, SVM, RFC, and MLP algorithms to determine the survival outcome from the available dataset, and their accuracy is also presented. Based on the experimental findings, we can conclude that machine learning algorithms such as K-nearest neighbor (KNN), support vector machine (SVM), random forest (RF), and multilayer perceptron (MLP) can successfully make this prediction. The obtained accuracies on the testing data for the target attribute death event are KNN (75.0%), SVM (83.3%), RFC (85.0%), and MLP classifier (71.6%). The random forest classifier achieves the best results in predicting the survival outcome.
References 1. J. Patel, T. Upadhyay, S. Patel, Heart disease prediction using machine learning and data mining technique. Int. J. Comput. Sci. Commun. 7, 129–137 (2016) 2. A. Hazra, S.K. Mandal, A. Gupta, A. Mukherjee, A. Mukherjee, Heart disease diagnosis and prediction using machine learning and data mining techniques: a review. Adv. Comput. Sci. Technol. 10(7), 2137–2159 (2017). ISSN 0973-6107 3. P. Singh, S. Singh, G.S. Pandi-Jain, Effective heart disease prediction system using data mining techniques. Int. J. Nanomed. 13, 121–124 (2018) 4. S. Nashif, M.R. Raihan, M.R. Islam, M.H. Imam, Heart disease detection using machine learning algorithms and a real-time cardiovascular health monitoring system. World J. Eng. Technol., 854–873 (2018) 5. F.S. Alotaibi: Implementation of machine learning model to predict heart failure disease. Int. J. Adv. Comput. Sci. Appl. (IJACSA) 10(6) (2019) 6. D. Krishnani, A. Kumari, A. Dewangan, A. Singh, N.S. Naik, Prediction of coronary heart disease using supervised machine learning algorithms, in IEEE Region 10 Conference (TENCON) (2019), pp. 367–372 7. A.H. Gonsalves, F. Thabtah, R.M.A. Mohammad, G. Singh, Prediction of coronary heart disease using machine learning: an experimental analysis, in Proceedings of the 2019 3rd International Conference on Deep Learning Technologies (2019) 8. P. Motarwar, A. Duraphe, G. Suganya, M. Premalatha, Cognitive approach for heart disease prediction using machine learning, in International Conference on Emerging Trends in Information Technology and Engineering (ic-ETITE) (2020), pp. 1–5 9. A. Singh, R. Kumar, Heart disease prediction using machine learning algorithms, in International Conference on Electrical and Electronics Engineering (ICE3) (Gorakhpur, India, 2020), pp. 452–457 10. A. Singh, Prediction of heart disease using machine learning. Int. J. Sci. Res. Comput. Sci. Eng. Inf. Technol. (2020). ISSN: 2456-3307
Chapter 52
Detection and Classification of Cerebral Hemorrhage Using Neural Networks P. Bharath Kumar Chowdary, Pathuri Jahnavi, Sudagani Sandhya Rani, Tumati Jahnavi Chowdary, and Kakollu Srija
1 Introduction One kind of stroke which causes draining around the tissues by an artery in the cerebrum is referred to as brain hemorrhage. Draining can happen in between the cerebrum and the layers that cover it. The irritation that is caused by the blood from trauma results in the increase of pressure on brain tissues. This leads to reduced percentage of oxygen from reaching the brain cells. This brain hemorrhage is recognized as a medical complication which requires speedy treatment. Early detection has the ability to rescue lives. Early diagnosis will increase the time span of a person’s life. As a result, the results of this work are noteworthy from both a data science and a medical perspective. A contrast of two neural network approaches is presented in this research. • The first approach is using ResNet. And this network uses a thirty four layer plain specification which is inspired by Visual Geometric Group-19 whereupon the shortcut link is attached to the prevailing network. The architecture is then converted into a residual network by these shortcut connections. • The second approach is to use the DENSENET, which involves training the model to react to patient features such as their CT scan images for the detection and classification of the disease. DenseNet and ResNet are rather similar. But, there are a few dissimilarities which differ them from each other. DenseNet uses concatenation operation to merge the previous layer with the next layer, whereas ResNet utilizes an additive approach to combine the previous layer with the next layer. Each of these approaches helps in identifying whether a patient has brain hemorrhage or not and classify the type of hemorrhage. In the biomedical field, machine learning has had a significant impact on the prediction and detection of cerebral P. Bharath Kumar Chowdary (B) · P. Jahnavi · S. S. Rani · T. J. Chowdary · K. Srija CSE, VNR Vignana Jyothi Institute of Engineering and Technology, Hyderabad, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. B. Reddy et al. (eds.), Proceedings of Second International Conference on Advances in Computer Engineering and Communication Systems, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-7389-4_54
Fig. 1 Brain hemorrhage
Machine learning can help facilitate the identification and prediction of diseases of concern in the medical industry, and perhaps even improve the fairness of decision-making. Machine learning algorithms can be used to answer medical questions quickly, minimizing diagnostic costs. The key purpose of this study is to forecast outcomes more accurately and reduce the cost of diagnosis in the medical field. Accordingly, we employed two classification approaches to determine whether patients had a cerebral hemorrhage, and the performance of these techniques was evaluated from different perspectives such as accuracy, precision, sensitivity, and specificity (Fig. 1).
2 Literature Review

Cerebral hemorrhage presents characteristic signs and symptoms, which may include a sudden severe headache, vision problems, loss of bodily coordination, confusion or trouble understanding, difficulty talking or slurred speech, and difficulty swallowing. Causes of cerebral hemorrhage include head injury, liver disease, bleeding disorders, and extremely high blood pressure. In a survey of brain tumor detection and classification, different techniques are compared, such as the artificial neural network, which uses a nonparametric analysis approach; the neural network architecture and the number of inputs affect the results and precision. The fuzzy measure uses a stochastic approach in which the threshold selection affects the results and precision. The decision tree also uses a nonparametric analysis approach, and the support vector machine combines nonparametric analysis with a binary classifier approach and can handle larger amounts of input data effectively; the hyperplane selection and kernel parameters affect the outcome of this method [1]. Segmentation approaches, according to H. S. Bhadauria and M. L. Dewal, can be utilized to increase accuracy. In their paper, segmentation is done using fuzzy C-means clustering and a region-based active contour approach.
In the recommended methodology, the active contour is obtained using the fuzzy membership degrees produced by fuzzy C-means clustering. The parameters controlling the region-based contour propagation are then inferred using fuzzy clustering, and the level set function increases the convergence speed of the contour propagation. Twenty CT scan images are used for hemorrhage preparation and monitoring, and a predictive performance metric is used to assess the findings. These findings suggest that the proposed method outperforms standard region-growing approaches and FCM for identifying cerebral bleeding in brain images [2]. Pre-processing of a medical image during tumor diagnosis, according to Sonali Patil and Dr. V. R. Udupi, is critical for feature extraction and segmentation algorithms to perform appropriately. The two methods presented are eliminating film artifacts and, using morphological erosion, removing the skull portions from the brain image and the ribcage portions from abdomen and thorax images. A morphological filter has the advantage of preserving the shape of sufficiently large objects, as opposed to a Gaussian filter, which blurs the image; morphological techniques are therefore suitable for image preprocessing. Images preprocessed in this manner have the undesired regions removed, considerably reducing the risk of over-segmentation during the subsequent segmentation of these images [3]. Myat Mon Kyaw suggested an automated method for detecting and classifying a stroke anomaly (hemorrhage) in a brain CT scan. Pre-segmentation steps for detecting aberrant regions in brain images are proposed in this research. Thresholding, region-growing techniques, and certain supervised and unsupervised approaches were employed for segmentation. A tracking algorithm is used to remove the skull sections in the CT image, and the image is divided into four sections. Because of the pre-segmentation, the unneeded regions do not have to be searched and clustered, which speeds up the segmentation process compared with the standard method. Other brain abnormalities, such as tumors, abscesses, and lesions, can also be detected with this method [4]. R. J. Ramteke and Khachane Monali Y. explained that their aim was to present a method to automatically classify CT scan images into two classes based on features extracted from the images and to detect abnormality automatically. Their training data contains 51 image features, and their statistical texture feature set is made up of both normal and abnormal images. They utilized the KNN classifier for image classification. No extra processing is applied to output images of the normal class, but abnormal-class images are further analyzed for ROI detection. The approach is implemented in MATLAB 2011; its drawback is that it categorizes an image into only two classes (abnormal and normal) and gives generic results without any specific label [5]. S. B. Kulkarni and Nita Kakhandaki state that their research focuses on detecting the correct location and kind of hemorrhage in MR brain images. Gradient-recalled echo MR images are used as the input data. The hemorrhagic region is then segmented using a region- and structure-specific multilevel set evolution technique.
Sharpened tetra features are extracted using a feature extraction technique based on the local tetra pattern, and the features are then optimized using an enhanced gray wolf optimization technique.
Finally, to classify the types of hemorrhage, an algorithm based on a relevance vector machine is used. The proposed framework is compared with existing methodologies on the scales of accuracy, sensitivity, specificity, and precision. However, the gray wolf optimization technique has a low convergence rate and poor local searching, which is a disadvantage of this strategy [6]. A 3D CNN with rank-based average pooling has been proposed by Yongyan Jiang, Shuihua Wang, Hong Cheng, Xiaoxia Hou, and Sidan Du. The researchers compared CNNs with different numbers of layers and with various pooling strategies in order to demonstrate the efficiency of the suggested CNN structure. They built a five-layer CNN for CMB detection and discovered that the number of layers had a substantial relationship with performance, so they compared the five-layer CNN against three-layer and nine-layer CNNs. Upon reviewing the outcomes, they noticed that the five-layer CNN had the best performance, with higher sensitivity and accuracy. They then built deeper CNNs and determined that the CNN outperformed the previously used approach; among the different depths, the five-layer CNN offered the best performance. Overall, they determined that, of all the approaches analyzed, the CNN with a five-layer structure and the rank-based pooling algorithm produced the best results [7]. Pratik Mukherjee, Weicheng Kuo, Jitendra Malik, Christian Häne, and Esther L. Yuh have proposed a model using a fully convolutional network and a convolutional neural network for the assessment of acute cerebral hemorrhage on head CT scans, and they validated the model by comparing it against radiologists' accuracy. To measure model performance, they ran the deep learning algorithm just once on the test set of 200 CT examinations, which avoided overfitting to the testing data. The method provides the probability of cerebral hemorrhage at both the pixel level and the examination level in the presence of intracranial bleeding; for each scan in the testing data of 200 CT exams, these probabilities are continuous from 0 to 1. Although some patients had two or more head computed tomography examinations, each patient appeared only once, in either the training or the testing data but not both. To determine the existence of acute cerebral bleeding on each computed tomography examination, they estimated the receiver operating characteristic (ROC) [8].
3 Dataset

The training data consists of a set of identifiers and multiple labels: one label for each of the five hemorrhage subtypes, plus an extra label for "any," whose value is 1 when at least one of the subtype labels is 1 [9]. Label is the column that indicates whether a particular hemorrhage is present in the corresponding image. For each image ID, there are six rows, one per label, of the form [Label] [img_Id] [subtype], as shown below (Fig. 2):
Fig. 2 Described data
The DICOM format is used for all of the images provided. DICOM images carry metadata, including PatientID, StudyInstanceUID, SeriesInstanceUID, and more.
Id—a unique identifier for an image; each Id is associated with a single image, which allows us to identify it uniquely.
Label—the likelihood that the indicated image is diagnosed with that subtype of brain hemorrhage.
The graphs below show the target distribution of each subtype of cerebral hemorrhage and the imbalance in the target distribution (Figs. 3 and 4).
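A small sketch of reading one of the provided DICOM files with pydicom and preparing its pixel data; the file name is a hypothetical example, and the brain-window values are a common choice assumed for illustration rather than settings specified in this work.

```python
import pydicom
import numpy as np

ds = pydicom.dcmread("ID_000012eaf.dcm")            # hypothetical image file name
print(ds.PatientID, ds.StudyInstanceUID, ds.SeriesInstanceUID)

# Convert raw pixel values to Hounsfield units, then apply an assumed brain window.
hu = ds.pixel_array * float(ds.RescaleSlope) + float(ds.RescaleIntercept)
center, width = 40, 80                               # assumed brain-window settings
img = np.clip(hu, center - width / 2, center + width / 2)
img = (img - img.min()) / (img.max() - img.min())    # normalize to [0, 1] for the network
```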
Fig. 3 Graph depicting target distribution
Fig. 4 Graph showing imbalance in target distribution
4 Proposed Methodology 4.1 Implementation Data collection is a systematic process of accumulating and storing information that helps us evaluate outcomes. Data gathering is the first and most important step and can influence our model. After data collection, the CT scan images are pre-processed. Image preprocessing involves importing an image and then analyzing and manipulating it to obtain an altered image as output. Segmentation of the images is done after pre-processing. In each image, we need to identify and separate the hemorrhage region from the rest of the skull. Since this cannot be done manually, an automatic, data-driven method is needed to distinguish these two relatively homogeneous regions, and image segmentation is used for this purpose. It partitions the image into segments based on pixel intensity. For this, we use Otsu's method [10]. Otsu's method processes the input image, finds the distribution of pixel intensities, and automatically determines a threshold value. Noise and unwanted pixels, such as the skull, which is of high intensity compared with that threshold value, are then removed. The objective of this segmentation is to remove complexity from the image so that it is more meaningful and easier to analyze. The next step is to search for a model that best suits our problem. The pre-processed data is divided into training and testing sets; the majority of the data is used for training (the training data) and the remainder is used for testing (the testing data). After the model is selected, we feed the pre-processed images to it for training.
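A minimal sketch of the Otsu-based segmentation step described above is given below; it uses OpenCV, and the input file name and the offset used to suppress the bright skull region are illustrative assumptions rather than values from the original system.

import cv2
import numpy as np

img = cv2.imread("ct_slice.png", cv2.IMREAD_GRAYSCALE)          # hypothetical file
img = cv2.GaussianBlur(img, (5, 5), 0)                           # light denoising

# Otsu's method picks a global threshold automatically from the intensity histogram.
otsu_t, mask = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# The skull is much brighter than the Otsu threshold, so suppress it and keep
# only the mid-intensity tissue in which a hemorrhage can appear.
skull = img > min(255, otsu_t + 60)                              # offset is an assumption
segmented = np.where(skull, 0, img * (mask > 0)).astype(np.uint8)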
Fig. 5 Residual block
Once the training is completed, we test the model and evaluate its performance using various evaluation metrics. 38 layers are used in both ResNet and DenseNet. The activation function used is ReLU, and the weights are updated using the Adam optimizer. Adaptive Moment Estimation (Adam) is an optimization algorithm for gradient descent. The method is very efficient for large problems involving a lot of data or parameters, and it requires little memory. ResNet The vanishing/exploding gradient is the most common problem in deep learning. To tackle such issues, residual neural networks are used. Residual neural networks (ResNet) use identity mappings and employ a method known as "skip connections." A skip connection directly adds the output of one layer to the input of a later layer. Instead of each stack of layers learning the underlying mapping directly, the structure is allowed to fit the residual mapping. If the layer's input X and output F(X) have the same dimensions, the residual block's output is Y = F(X) + X. In cases where the identity input and the layer output have different dimensions, the definition becomes Y = F(X, Wi) + Ws · X, where Wi denotes the parameters of the CNN layers and Ws is realized with a convolution that matches the input and output dimensions [11]. The main advantage of skip connections is that they make it possible to train much deeper networks than was previously feasible (Fig. 5). DenseNet DenseNet is a widely used technique in medical research. It is very similar to ResNet with some modifications: it is a densely connected convolutional network, i.e., all the layers in a DenseNet are directly connected. A typical network with G layers has G connections, whereas a DenseNet has G(G + 1)/2 direct connections. In a DenseNet, the activation maps of all previous layers are used as input for each layer, and its own activation maps are used as input for all subsequent layers. DenseNets have numerous benefits: they improve gradient propagation by connecting all layers directly with each other [12], reduce overfitting on tasks with smaller training sets, encourage feature reuse, make the network easier to train, and enhance feature propagation (Fig. 6).
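To make the skip connection concrete, a hedged Keras sketch of one residual block is shown below; the layer sizes are placeholders, and the block illustrates Y = F(X) + X (with a 1 × 1 convolution playing the role of Ws when dimensions differ) rather than the exact 38-layer model used in this work.

from tensorflow.keras import layers

def residual_block(x, filters, stride=1):
    shortcut = x
    y = layers.Conv2D(filters, 3, strides=stride, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)

    # When the identity input and the block output have different dimensions,
    # project the input with a 1x1 convolution (the Ws term).
    if stride != 1 or shortcut.shape[-1] != filters:
        shortcut = layers.Conv2D(filters, 1, strides=stride, padding="same")(shortcut)

    y = layers.Add()([y, shortcut])        # the skip connection: F(X) + X
    return layers.Activation("relu")(y)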
Fig. 6 Dense blocks
Table 1 Performance of ResNet and DenseNet

Model      Dataset             Accuracy   Sensitivity   F1-score
ResNet     RSNA intracranial   89.24      93.18         91.05
DenseNet   RSNA intracranial   83         89.23         88.45
4.2 Comparative Analysis This section presents a comparative analysis between ResNet and DenseNet. Each model is trained and tested on the RSNA intracranial dataset. ResNet achieved the higher accuracy, 89.24%. Based on the confusion matrices, ResNet also has a higher true positive rate than DenseNet. The F1-score, which summarizes both precision and recall, indicates that the ResNet model performs noticeably better than DenseNet (Table 1).
4.3 Validation We used a stratified K-fold cross-validation technique to split the dataset into training and testing sets in the ratio of 80% to 20%. We then applied the classification models, ResNet and DenseNet, to compute the results.
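A minimal sketch of this split is shown below; with five folds, each split uses 80% of the data for training and 20% for testing. The arrays X (inputs) and y (binary hemorrhage labels) are assumed to be prepared beforehand and are not defined here.

from sklearn.model_selection import StratifiedKFold

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, test_idx) in enumerate(skf.split(X, y)):
    X_train, X_test = X[train_idx], X[test_idx]     # 80% / 20% per fold
    y_train, y_test = y[train_idx], y[test_idx]
    # train ResNet / DenseNet on (X_train, y_train) and evaluate on (X_test, y_test)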
4.4 Results All of the attributes in our collected data represent the most informative indicators of patients with cerebral hemorrhage. Since labeled data was acquired, the nominal class label in the given data is also present in the test dataset. A label encoder was used to transform the nominal values in the subtype column to 0 or 1, i.e., a value of 1 indicates that the particular subtype is present in the corresponding image. The proposed system is efficient compared to existing systems, which suffer from low convergence rates and poor local search. One of the existing systems only categorizes the given data into one of two classes, abnormal or normal, whereas the proposed system returns a specific hemorrhage label (Fig. 7).
Fig. 7 ROC curves
5 Conclusion Automated frameworks for classifying medical images have received a phenomenal degree of attention recently. They have a considerable impact in recognizing the existence of cerebral hemorrhage (the binary classification problem) and, when the hemorrhage is further classified, in solving the multiclass problem as well. Although some existing algorithms detect hemorrhage efficiently and produce reasonable results, they have limitations: some find it difficult to recognize subdural and epidural hematomas, some cannot detect subarachnoid hemorrhage, some consume more power, some detect only microbleeds, and some are not suitable for large datasets and have long computation times. Some are not accurate or efficient (they involve detection faults). The main cause of these limitations is the small number of standardized procedures. Since the diagnosis of brain hemorrhage is a very complicated and sensitive task, accuracy and reliability are given high priority. The investigations presented in this work indicate that, after pre-processing the CT scans, the binary classification problem was solved accurately. Additionally, the implemented framework achieved good results on the classification problem of determining the hemorrhage type using convolutional neural networks as the classifier. The outcomes are promising, and higher levels of accuracy for the classification problem could be achieved by obtaining a much better dataset with high-resolution images taken directly from the CT scanner. Different feature extraction and feature selection algorithms could also be used to improve the framework's performance.
References
1. G. Krishnan, K. Sivan Arul Selvan, P. Betty, Survey on brain tumour detection and classification using image processing. ELK Asia Pac. J. Comput. Sci. Inf. Syst. (2016). http://doi.org/10.16962/eapjcsis/issn.2394-0441/20160930.v2i1.02
2. H.S. Bhadauria, M.L. Dewal, Intracranial hemorrhage detection using spatial fuzzy c-mean and region-based active contour on brain CT imaging. SIViP 8, 357–364 (2014). https://doi.org/10.1007/s11760-012-0298-0
3. S. Patil, V.R. Udupi, Preprocessing to be considered for MR and CT images containing tumors. IOSR J. Electr. Electron. Eng. 1(4), 54–57 (2012). https://doi.org/10.9790/1676-0145457
4. M.M. Kyaw, Pre-segmentation for the computer aided diagnosis system. Int. J. Comput. Sci. Inf. Technol. 5(1), 79–85 (2013). https://doi.org/10.5121/ijcsit.2013.5106
5. R.J. Ramteke, Y. Khachane Monali, Automatic medical image classification and abnormality detection using K-nearest neighbour. J. Adv. Comput. Res. 2(4) (2012)
6. N. Kakhandaki, S.B. Kulkarni, A novel framework for detection and classification of brain hemorrhage. Int. J. Recent Technol. Eng. (IJRTE) 7(4) (2018). ISSN: 2277-3878
7. S. Wang, Y. Jiang, X. Hou, H. Cheng, S. Du, Cerebral micro-bleed detection based on the convolution neural network with rank based average pooling. IEEE Access 5, 16576–16583 (2017). https://doi.org/10.1109/access.2017.2736558
8. W. Kuo, C. Häne, P. Mukherjee, J. Malik, E.L. Yuh, Expert-level detection of acute intracranial hemorrhage on head computed tomography using deep learning. Proc. Natl. Acad. Sci. 116(45), 22737–22745 (2019). https://doi.org/10.1073/pnas.1908021116
9. https://www.kaggle.com/c/rsna-intracranial-hemorrhage-detection/data
10. D. Liu, J. Yu, Otsu method and K-means, in 2009 Ninth International Conference on Hybrid Intelligent Systems (2009). http://doi.org/10.1109/his.2009.74
11. K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016). http://doi.org/10.1109/cvpr.2016.90
12. Z. Zhong, M. Zheng, H. Mai, J. Zhao, X. Liu, Cancer image classification based on the DenseNet model. J. Phys. Conf. Ser. 1651, 012143 (2020). https://doi.org/10.1088/1742-6596/1651/1/012143
Chapter 53
A Novel CNN-Based Classification and Prediction of COVID-19 Disease Using Deep Learning Talluri Sunil Kumar, Sarangam Kodati, Sagar Yeruva, and Talluri Susan
1 Introduction Coronavirus disease-19 (COVID-19) is an epidemic that has killed more than 800,000 people and infected more than 20 million worldwide since December 2019 [1]. On January 30, 2020, COVID-19 was declared a global health emergency by the World Health Organization (WHO). Clinical diagnosis of COVID-19 can be assisted by chest computed tomography (CT). However, there is a shortage of radiologists and clinicians, while the number of coronavirus patients is increasing rapidly. This motivates the development of automatic methods for computer-aided COVID-19 classification from chest CT images [2]. COVID-19 classification from chest images can take advantage of several machine learning methods [3]. In some cases, an effective risk prediction model supports precision-medicine strategies by tailoring clinical management to an individual patient's requirements, which increases the probability of full recovery. It can also optimize patient flow in the emergency department, thereby reducing waiting time [4].
T. Sunil Kumar (B) · S. Yeruva Department of CSE, VNR Vignana Jyothi Institute of Engineering and Technology, Hyderabad, Telangana 500090, India e-mail: [email protected] S. Yeruva e-mail: [email protected] S. Kodati Department of CSE, Teegala Krishna Reddy Engineering College, Hyderabad, Telangana 500097, India T. Susan Arizona State University, P.O. Box 872812, Tempe, AZ, USA © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. B. Reddy et al. (eds.), Proceedings of Second International Conference on Advances in Computer Engineering and Communication Systems, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-7389-4_55
A substantial amount of research has been conducted with the goal of predicting patient outcomes, drawing on different data types including radiological,
laboratory, and clinical features. Even though several authors report promising results, a recent survey of COVID-19 prognosis/risk prediction methods states that most of them are biased for one of two reasons [5]. The first is a lack of clinical follow-up data in many published studies, leading to inaccurate categories for machine learning, because severe complications frequently develop after the initial clinical encounter used for ML [6]. The second is that many studies do not use the current clinical data in their research but instead use the last available predictor measurements from the electronic health records. Many complex problems are solved by different subareas of artificial intelligence (AI) [7], which involve reasoning, learning, planning, knowledge representation, and searching. Machine learning (ML) and deep learning (DL) algorithms, which are subsets of AI, provide intelligent models for specified tasks. ML requires relatively little domain knowledge for solving problems because of its statistical models and algorithmic modeling culture, making it the main subset of AI. The next important family of models is DL, a subset of ML; DL algorithms have become widespread in recent years, but they require large amounts of data. Accordingly, ML and DL methods have been introduced in many research areas, including the medical, military, and technological sectors [8]. In the fight against COVID-19 as well, ML and DL methods based on advanced AI were introduced within a short period after the outbreak. Therefore, a machine learning-based computational system is designed in this paper that can be easily deployed and used by emergency departments for fast, early CT image classification and for estimating the risk levels of COVID-19 patients.
2 Machine Learning Techniques for COVID-19 Over the past decade, incredible progress has been achieved in the field of machine learning. Computational capacity has increased enormously as deep learning algorithms have been coupled with ML [9], so many different applications are adopting machine learning. One of its main applications is the fight against the COVID-19 pandemic, and many researchers have pursued different paths to battle COVID-19 with the help of machine learning [10]. A range of machine learning applications tackle different issues regarding the virus. For instance, COVID-19 diagnosis uses ML and DL to analyze medical images, which also protects medical staff from exposure to affected patients and provides a severity score for further treatment. Disease transmission models are constructed for accurate prediction of outbreaks, and intelligent ML- and DL-based models learn transmission lists and the effects of transmission paths from social media data and COVID-19 case data [11]. ML and DL are also used heavily in public monitoring and epidemic protection, for example in identifying patients during airport security checks and in epidemic detection [12]. Four types of machine learning applications are used against COVID-19, and these are:
• Forecasting
• Medical diagnostics
• Drug development
• Contact tracing.
Deep learning algorithms have successfully predicted the number of new infections. Traditional approaches such as ARIMA models perform worse than recurrent neural networks in time-series forecasting [13]. Researchers have predicted the number of new infections and modeled infection spread with the help of recurrent neural networks and long short-term memory networks [14]; forecasting is therefore one of the most important applications of machine learning here. Advances in medical diagnosis are made possible with the help of computer vision: machine learning models nearly match human accuracy on a number of image recognition tasks. As a result, image recognition software can detect the signs of the virus in chest X-ray images of COVID-19 patients. However, many countries still lack proper medical and testing facilities, or the procedures are too expensive, and chest X-ray imaging may then be the only practical way to detect or diagnose the virus. Previous studies have reported that COVID-19 can be detected from chest X-ray images with an accuracy of 99% using deep learning approaches. Authorities trace virus-infected people with the help of AI-based smart contact tracing, and several studies trace virus spread with different AI-based software solutions. Machine learning algorithms have also guided researchers toward new discoveries in many fields [15]. Chemical compositions are analyzed with the help of variational autoencoders, which can lead to new medicines; applying existing flu vaccines to such autoencoders may contribute to COVID-19 vaccine creation.
3 CNN-Based Prediction of COVID-19 Using Deep Learning X-ray images extracted from the COVID lung dataset are used as input to the proposed CNN. Figure 1 shows the framework of the CNN-based classification and prediction approach for COVID-19. The proposed framework consists of two modules: preprocessing and the CNN.
3.1 COVID Lung Dataset Two datasets are used to obtain the X-ray images. Dataset 1 contains 79 images each of viral and bacterial pneumonia (Dataset, 2020).
Fig. 1 Framework of CNN-based classification and prediction of COVID-19
(Flowchart of Fig. 1: COVID Lung Dataset → Pre-processing → Bootstrapping → Analyze the Dataset → Train Data / Test Data → Pattern Matching → CNN → Prediction Result)
Dataset 2 contains 78 X-ray images of COVID-19 patients and 28 images of normal people, obtained from Kaggle (2020). The performance of a deep learning model is influenced by the size of the training dataset. Deep learning models normally rely on large datasets, but the COVID-19 dataset is very small, so assessing the generality and robustness of deep learning-based models is difficult. This problem is overcome by augmenting the CNN's training data with additional X-ray images generated using the Keras ImageDataGenerator class. The image augmentation configuration is defined by this generator class provided by Keras, whose capabilities include shifts, random rotation, dimension reordering, whitening, and flips.
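A hedged sketch of such an augmentation configuration is given below; the parameter values and the directory layout are illustrative assumptions, not the exact settings used by the authors.

from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=15,        # random rotation
    width_shift_range=0.1,    # horizontal shift
    height_shift_range=0.1,   # vertical shift
    horizontal_flip=True,     # flips
    zca_whitening=False,      # whitening is available but left off here
    rescale=1.0 / 255,
)

# Stream augmented X-ray batches from a directory tree (path is hypothetical).
train_flow = augmenter.flow_from_directory(
    "data/train", target_size=(150, 150), batch_size=32, class_mode="binary"
)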
3.2 Data Preprocessing Real-life information contains large amounts of noisy and missing data, so this noise needs to be eliminated in a preprocessing step in order to make strong predictions. The framework of the proposed model is shown in Fig. 1. The collected data also contains missing values and noise; effective and accurate results are obtained by filling in the missing values and cleaning the noise. Aggregation, normalization, and smoothing are the tasks involved in transformation, by which one form of data is changed into another. Data from different sources must also be integrated, and this must be done before preprocessing. The
data becomes effective once it has been properly formatted, even if it is complex. The data is then divided into training and testing datasets and passed through the different algorithms for the best prediction of the disease.
3.3 Bootstrapping The configurable parameters are exposed in this block and are adjusted in the range 0.1–1.0.
3.4 Analyze the Dataset Confirmed cases are analyzed daily with a sliding window to detect the first and second peaks and their starting points. The parameters of the probability distributions are inferred from this information, which drives the simulator core. The intervention effectiveness score is estimated through supervised learning and is also used to characterize the studied population. The parameters are varied iteratively during model optimization, which strongly reduces the prediction error with respect to the historical data.
3.5 Pattern Matching In the pattern matching block, the images are divided into three portions: validation, testing, and training. Pattern matching determines the patterns in the image data by generating patterns from the training data and then comparing the patterns of the test data with the trained historical data. A small portion is preserved as the validation set, which is used to test the efficiency of the trained model, and the remaining portion is divided into five folds. In each iteration, one of these folds is used as test data and the other four as training data. When dividing the images, there must be no patient overlap between the validation, testing, and training sets; that is, different images of the same patient must not appear in multiple sets. The prediction of the disease is then performed using the patterns generated from the trained and tested pattern matching outcome with a machine learning technique, namely the CNN.
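One way to obtain five folds with no patient overlap is a group-wise split, sketched below under the assumption that image_paths, labels, and patient_ids are NumPy arrays prepared beforehand; this is an illustration, not the authors' exact procedure.

from sklearn.model_selection import GroupKFold

gkf = GroupKFold(n_splits=5)
for fold, (train_idx, test_idx) in enumerate(
        gkf.split(image_paths, labels, groups=patient_ids)):
    # Every image of a given patient falls on exactly one side of the split.
    assert not set(patient_ids[train_idx]) & set(patient_ids[test_idx])
    train_files, test_files = image_paths[train_idx], image_paths[test_idx]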
3.6 Convolutional Neural Network (CNN) The proposed CNN model has 38 layers in total: 6 dropout layers, 6 max pooling layers, 6 convolutional (Conv2D) layers, 8 batch normalization layers, 8 activation layers, 3 fully connected layers, and 1 flatten layer. The input shape of the CNN model is (150, 150, 3), i.e., a 150-by-150 RGB image. A 3 × 3 kernel is used in all Conv2D layers, and the number of filters is increased after every two Conv2D layers: 64 filters in the 1st and 2nd Conv2D layers to learn about the input, 128 filters in the 3rd and 4th layers, and 256 filters in the 5th and 6th layers. After every Conv2D layer, a max pooling layer with a 2 × 2 pooling size, a batch normalization layer with the axis = −1 argument, an activation layer with the ReLU function, and a dropout layer with a 20% dropout rate are used. Max pooling, batch normalization, activation, and dropout layers thus follow the final Conv2D layer with its 256 output channels. The convolutional and final pooling layers produce a three-dimensional matrix as output; the flatten layer converts this matrix into a vector, which is then used as the input to three dense layers. The binary classification of the CNN is set up as follows: the binary cross-entropy (BCE) loss function is used. Only one output is required because this is a binary classification between two given classes. With the BCE loss function, the output value is passed to a sigmoid activation function, which produces an output between 0 and 1 and allows the error between the actual and predicted classes to be measured. The "Adam" optimizer adapts the learning rate and attribute weights in order to reduce the model's loss. The model parameters were determined from initial experiments. The different CNN configurations differ in the number of convolution layers used, and this number was decided using an incremental approach: first, the CNN is tested with one convolutional layer and the results are analyzed, then a second layer is added and the results are analyzed, and so on, continuing until effective and accurate results are obtained. The final model contains six convolution layers, which the results show to be very reasonable. The results for each increment of the model are presented in the Results section.
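The following Keras sketch follows this description (six Conv2D blocks with 64/64/128/128/256/256 filters, 3 × 3 kernels, batch normalization with axis = −1, ReLU, 2 × 2 max pooling, 20% dropout, a flatten layer, dense layers, and a sigmoid output trained with binary cross-entropy and Adam); the widths of the hidden dense layers are assumptions, since the paper only states that three fully connected layers are used.

from tensorflow.keras import layers, models

model = models.Sequential()
model.add(layers.InputLayer(input_shape=(150, 150, 3)))      # 150-by-150 RGB input
for filters in (64, 64, 128, 128, 256, 256):                  # two blocks per filter size
    model.add(layers.Conv2D(filters, (3, 3), padding="same"))
    model.add(layers.BatchNormalization(axis=-1))
    model.add(layers.Activation("relu"))
    model.add(layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(layers.Dropout(0.2))

model.add(layers.Flatten())
model.add(layers.Dense(128, activation="relu"))               # assumed width
model.add(layers.Dense(64, activation="relu"))                # assumed width
model.add(layers.Dense(1, activation="sigmoid"))              # binary COVID-19 vs. normal

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])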
3.7 Predictions The intervention effectiveness scores are used to simulate the proposed approach in order to determine the relative impact of interventions in the future. This can be done for a specific country or population after the simulation parameters have been learned. The effectiveness scores yield what-if scenarios for the progression pattern of the disease over time, such as the second peak in June.
4 Results Precision, accuracy, F-score, recall, specificity, sensitivity, and AUC are the performance metrics used for evaluation in this study. Four terms, true negative (TN), true positive (TP), false negative (FN), and false positive (FP), define these performance metrics. Here, positive refers to a patient with a positive test report and negative to a patient with a negative test report. Thus, FP refers to a patient who does not have the disease but whose test report is positive, while a diseased patient with a positive report is a TP. In the same way, TN refers to a patient who does not have the disease and has a negative report, while a diseased patient with a negative report is an FN. Accordingly, sensitivity is defined as:

Sensitivity = TP / (TP + FN)

High sensitivity is required for detecting serious diseases; if the sensitivity is 100%, all diseased patients are correctly classified. Specificity is defined as follows:

Specificity = TN / (TN + FP)

Specificity and sensitivity alone do not account for the cut-off point of the test, which affects both the number of false positives and the number of false negatives. A high cut-off value raises the false negatives, making the test less sensitive but highly specific, whereas a low cut-off value raises the false positives, making the test less specific but highly sensitive. The most widely used performance metric for classifiers is accuracy, defined as follows:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Recall and precision are also widely used in analyzing classifier performance. Precision considers only the predicted positive cases, and its formula is:

Precision = TP / (TP + FP)

The F-score measures classifier performance statistically. It combines the precision and recall values of a classifier into a value between 0 and 1 that is indicative of the classifier, so that classifiers can be ranked from lowest to highest performance. The F1-score is computed as:

F1-score = 2 × (Precision × Recall) / (Precision + Recall)
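As a small worked illustration, the helper below computes these metrics from a binary confusion matrix; y_true and y_pred are assumed to be 0/1 arrays of actual and predicted classes.

from sklearn.metrics import confusion_matrix

def report(y_true, y_pred):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    sensitivity = tp / (tp + fn)                     # recall
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "precision": precision, "f1": f1}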
The proposed model is tested in three scenarios with different numbers of classes. In scenario 1, two classes are used for training and testing: normal and COVID-19. In scenario 2, three classes are used: normal, COVID-19, and viral pneumonia. In scenario 3, four classes are used: normal, COVID-19, viral pneumonia, and bacterial pneumonia. The performance of the proposed CNN approach in classifying and predicting COVID-19 disease is depicted in Fig. 2. Accuracy of roughly 90% or more is obtained for all three scenarios, i.e., for the 2-class, 3-class, and 4-class settings, and good precision and F-score values are also obtained, as shown in the figure. Table 1 presents the comparative results of the three different models, VGG16, AlexNet, and the proposed CNN, for two, three, and four classes. From the results, it is clear that the proposed model gives better results than the two other models in terms of accuracy, precision, and F1-score.
Fig. 2 Proposed approach of CNN performance analysis
Table 1 Comparative analysis of performance metrics for the classifiers

Model          Classes   Accuracy   Precision   F-score
Proposed CNN   2         0.9802     0.9851      0.9812
Proposed CNN   3         0.9123     0.9306      0.9735
Proposed CNN   4         0.8975     0.9123      0.9328
VGG16          2         0.9768     0.9806      0.9793
VGG16          3         0.8895     0.9216      0.9523
VGG16          4         0.8751     0.8924      0.9131
AlexNet        2         0.6892     0.7126      0.7654
AlexNet        3         0.8726     0.8968      0.9234
AlexNet        4         0.8274     0.8862      0.9156
Fig. 3 Training time comparison of different approaches (AlexNet, VGG16, proposed CNN)
Figure 3 compares the training times of the three models: AlexNet, VGG16, and the proposed CNN. Both AlexNet and VGG16 require more training time than the proposed CNN approach because of their more complex construction. Therefore, the proposed approach gives the best performance in predicting diseases such as coronavirus and some bacterial infections.
5 Conclusion This paper presents a convolutional neural network-based classification using deep learning that differentiates COVID-19 patients from normal people on the basis of chest X-ray images. Deep learning models normally rely on large datasets, but the COVID-19 dataset is very small, which makes it difficult to assess the generality and robustness of deep learning-based models. This problem can be overcome by augmenting the CNN's training data with additional X-ray images using the Keras ImageDataGenerator class. Different numbers of classes (two, three, and four) covering normal people, COVID-19-affected people, bacterial pneumonia, and viral pneumonia are used to evaluate the CNN's performance. The performance evaluation uses metrics such as precision, accuracy, and F-score for the different models AlexNet, VGG16, and the proposed CNN. From the results, it is clear that the proposed CNN model performs best among these models.
References
1. S.-H. Gao, Y.-H. Wu, D.-P. Fan, J. Mei, J. Xu, M.-M. Cheng, R.-G. Zhang, JCS: an explainable COVID-19 diagnosis system by joint classification and segmentation. IEEE Trans. Image Process. 30 (2021)
2. A. Gómez-Ríos, S. Tabik, I. Sevillano-García, J.L. Martín-Rodríguez, D. Charte, E. Guirado, M. Rey-Area, J. Luengo, J.L. Suarez, P. García-Villanova, M.A. Valero-González, F. Herrera, E. Olmedo-Sánchez, COVIDGR dataset and COVID-SDNet methodology for predicting COVID-19 based on chest X-ray images. IEEE J. Biomed. Health Inform. 24(12) (2020)
3. Y. Xiao, Y. Liu, Analysis and prediction of COVID-19 in Xinjiang based on machine learning, in 2020 5th International Conference on Information Science, Computer Technology and Transportation (ISCTT) (2020)
4. M.R.H. Mondal, P. Podder, Machine learning to predict COVID-19 and ICU requirement, in 2020 11th International Conference on Electrical and Computer Engineering (ICECE) (2020)
5. G.S. Choi, A.A. Reshi, F. Rustam, S. Ullah, A. Mehmood, W. Aslam, B.-W. On, COVID-19 future forecasting using supervised ML models. IEEE Access 8 (2020)
6. D. Toshniwal, P. Kumari, Real-time estimation of COVID-19 cases using machine learning and mathematical models—the case of India, in 2020 IEEE 15th International Conference on Industrial and Information Systems (ICIIS) (2020)
7. R. Shimpi, V. Parashar, M. Mishra, Development and evaluation of an AI system for early detection of Covid-19 pneumonia using X-ray (student consortium), in 2020 IEEE Sixth International Conference on Multimedia Big Data (BigMM) (2020)
8. İ. Aydin, M. Sevi, COVID-19 detection using deep learning methods, in 2020 International Conference on Data Analytics for Business and Industry: Way Towards a Sustainable Economy (ICDABI) (2020)
9. M. Ali Nasseri, A. Eslami, M.H. Sarhan, M. Maier, D. Zapp, N. Navab, C.P. Lohmann, Machine learning techniques for ophthalmic data processing: a review. IEEE J. Biomed. Health Inf. 24(12) (2020)
10. M. Alaei, M. Ghorvei, S.M. Rezaeijo, A machine learning method based on lesion segmentation for quantitative analysis of CT radiomics to detect COVID-19, in 2020 6th Iranian Conference on Signal Processing and Intelligent Systems (ICSPIS) (2020)
11. A. Chinnalagu, D. Ashok Kumar, Sentiment and emotion in social media COVID-19 conversations: SAB-LSTM approach, in 2020 9th International Conference on System Modeling and Advancement in Research Trends (SMART) (2020)
12. C. Delcea, R. John Milne, C. Ioanăş, L.-A. Cotfas, Evaluation of boarding methods adapted for social distancing when using apron buses. IEEE Access 8 (2020)
13. N.Y. Fareed, H.I. Mustafa, COVID-19 cases in Iraq; forecasting incidents using Box–Jenkins ARIMA model, in 2020 2nd Al-Noor International Conference for Science and Technology (NICST) (2020)
14. M.B. Alsabek, I. Shahin, A. Hassan, COVID-19 detection system using recurrent neural networks, in 2020 International Conference on Communications, Computing, Cybersecurity, and Informatics (CCCI) (2020)
15. M. Usman, S. Latif, W. Iqbal, S. Manzoor, G. Tyson, J. Crowcroft, J. Qadir, A. Razir, I. Castro, M.N. Kamel Boulos, A. Weller, Leveraging data science to combat COVID-19: a comprehensive review. IEEE Trans. Artif. Intell. 1(1) (2020)
Chapter 54
Automated Defect Detection in Consumer-Grade Knives Using Active Planning Keshav Kumar
1 Introduction Product quality is of paramount importance in the field of manufacturing. With the developments of the Fourth Industrial Revolution, AI-based machine vision has been instrumental in collecting information such as product counts and defect detections and types without human intervention [1]. Many manufacturing processes have become fully automated, resulting in high production volumes [2] and raising operational efficiency to optimum levels. The ability to quickly and accurately identify defects as early as possible within the manufacturing process is important to ensure smooth operation, resource efficiency, and quality products. Despite the advanced automated techniques used to manufacture complex products, many products still undergo manual visual inspection. Inspection of products is a necessary component of quality control and assurance; using skilled workers for the inspection process, however, drains valuable workers away from other important tasks. With this in mind, we examine automating the defect detection process in the consumer-grade knife domain. The contributions of this paper are twofold:
1. We analyze the effectiveness of using computer vision techniques with adaptive planning for active classification.
2. We successfully apply and test our approach to active classification in the consumer-grade knife domain with real-world data.
K. Kumar (B) Savitribai Phule Pune University, Pune, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. B. Reddy et al. (eds.), Proceedings of Second International Conference on Advances in Computer Engineering and Communication Systems, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-7389-4_56
2 Related Works Existing Methods: Thadeus Brito et al.'s work describes an innovative approach that uses a collaborative robot to support smart inspection and corrective actions for quality control systems in the manufacturing process, complemented by an intelligent system that learns and adapts its behavior according to the inspected parts [3]. Our approach: We have decided to focus on defects created during knife manufacturing. One example of this process involves quality control workers visually inspecting batches of knives for a wide variety of defects. Characterization of a defect varies between knife type, defect type, and defect severity. On average, a worker may spend 4 s (2 per side) inspecting each knife and typically achieves an accuracy upper-bounded by 80%. Our setup is built around this example. We show that active classification for defects in knives is possible using a modern vision system combined with an adaptive planning algorithm.
2.1 Computer Vision Computer vision, the automated extraction of meaningful information from images, relies on two key components for success: feature extraction and image classification. Feature extraction is necessary to deal with the extremely large data sets that comprise a single image; it allows one to cut down the number of features examined from all possible features to a smaller, transformed subset of features that yields the most pertinent information. While there are general methods currently in use across machine learning to deal with feature extraction (e.g., principal component analysis) and selection, there are specific methods used in computer vision that take advantage of the image data structure. Image classification is done using common machine learning techniques for classification. Machine learning comes in two forms: supervised and unsupervised. Unsupervised image classification is done by feeding an algorithm a large data set of images or videos and then allowing the machine to pick out the important features and classify these images; deep learning with neural networks has been effectively applied here, although it is still developing. Supervised learning for image classification requires an expert to label a set of images beforehand. The labeled images are then used to instruct or train the classifier. While more tedious, given a large enough training set, these algorithms can be very efficient and effective. Existing methods: For example, a real-time vision system for surface defect detection in steel manufacturing was successfully implemented by Jia et al. using standard supervised learning algorithms and a clever feature extraction scheme [4]. The
authors used a rough filter to detect possible edges in the steel, and then features are extracted from that according to their pixel length, the grayscale contrast of a seam to an adjacent area, the intensity differences between the two sides of the seams, and the mean and variance of the seam regions. Using these features, the images were then classified using two standard classification algorithms: K-nearest neighbors (KNNs) and a support vector machine (SVM). The authors were able to classify the images with greater than 90% accuracy at a rate of 172 images per second, fast enough to detect defects in real time for a steel rolling machine that could reach speeds of 225 MPH. Our approach: A wavelet filter, Gabor filter, and Gabor wavelet filter were successfully used to classify corrugation defects on rail tracks [5]. In particular, the authors found the Gabor filter to be most successful. The image was first convolved with the Gabor filter at four different orientations. Each of these images was then evaluated with an energy distribution function yielding the mean and variance. The mean and variance from each filter orientation were assembled to create a feature vector of size 8 for the original image. This is the filter we will be adapting for our approach.
2.2 Adaptive View Planning Informative path planning (IPP) is defined as autonomously deciding what path to take while collecting measurements, based on a probabilistic model of the quantity being studied [6]. IPP algorithms, when used in domains that exhibit the property of sub-modularity (e.g., sensor placement), yield simple greedy solutions that carry the important theoretical guarantee of a close approximation to the optimal solution (within 63% of optimal). Additionally, it can be seen that in many cases, greedy solutions are able to perform within 1% of optimal. Adaptive view planning (AVP) is a specialized case of IPP. By having an autonomous agent decide which view angles provide the most pertinent information necessary to correctly classify a specific object, AVP aims to optimize the classification success rate in the shortest amount of time or with the lowest path cost. The planning is adaptive in the sense that what is seen from one view angle impacts which view angle the agent decides to move to and look at next [6]. Existing methods: AVP has been examined in the domain of examining ship hulls for mines [6]. The authors reduced the number of needed views by 80% compared to non-adaptive methods. Our approach: The detection of defects in knives maps well to the AVP method. A worker might catch a glimmer of light reflecting off the surface of a knife that alerts them to a suspicious area. The worker will then turn the knives in such a way as to clarify whether or not the glimmer was indeed caused by a defect.
Due to this close mapping and the success of similar methods in other domains, we will be using an adaptive view planning algorithm that utilizes a greedy horizon approach to actively classify one type of defect in a consumer-grade knife.
3 Methods In practice, solving the problem of knife defect detection requires an end-to-end system comprised of a vision system, a knife manipulation system, and an accurate classification system. We focused on implementing a robust vision system and accurate classification system while simulating the knife manipulation system. For this study, the goal is to minimize entropy, making our belief of defect locations as strong as possible. We begin by discretizing the continuous view space into 9 discrete views along either the x or y rotation axis. Additionally, we discretize the knife into probability regions representing our belief of if a defect is present. Using historical data to predict the locations of defects, a simple 1-step greedy lookahead algorithm predicts which view state will provide the largest gain in entropy versus transition time. That is, it is better to take a quicker, less valuable observation than a trivially more valuable observation that is farther away on the knife. Transition time is modeled as the cost to move to a new state and take an observation. Once a view is selected, the move is performed, and an actual observation is taken. Our belief state is then updated via a Bayesian update. In order to test our algorithm, we compare against random view selection. For both algorithms, we select views until all 9 discrete views have been chosen, without repetition. For random, 1000 statistical runs are performed.
3.1 Data Generation A large, comprehensive data set is needed to train a robust classifier capable of accurately detecting surface defects in knives. To generate our data, the knives were first filmed under a variety of lighting conditions from multiple angles. Still images were then extracted to create a training set. The images selected represent many different angles and several lighting conditions. The image set contains both positive and negative examples of the defect. Next, a sliding window technique was used to generate additional training data from the image set. This process is shown in Fig. 1. This was accomplished using a custom program that allows a user to specify the location of all defect corners. From this, the sliding window is able to automatically determine if the region contains a defect. Using this information, the program is able to automatically label the training set data for use in supervised learning.
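A hedged sketch of this labeling step is given below; the window size, stride, and the rule that any overlap with the marked defect rectangle makes a window positive are illustrative assumptions, as is the file name in the usage line.

import cv2

def generate_windows(image, defect_box, win=64, stride=32):
    """defect_box = (x1, y1, x2, y2) corners selected in the labeling tool."""
    dx1, dy1, dx2, dy2 = defect_box
    h, w = image.shape[:2]
    samples = []
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            patch = image[y:y + win, x:x + win]
            # A window is labeled positive if it intersects the defect rectangle.
            overlaps = not (x + win <= dx1 or dx2 <= x or y + win <= dy1 or dy2 <= y)
            samples.append((patch, int(overlaps)))
    return samples

image = cv2.imread("knife_view_00.png", cv2.IMREAD_GRAYSCALE)      # hypothetical file
windows = generate_windows(image, defect_box=(120, 40, 180, 90))   # corners from the tool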
Fig. 1 Training data generation: (a) selected defect corners, (b) sliding window, (c) example labeled windows
Fig. 2 Computer vision algorithm flowchart
3.2 Computer Vision Algorithm The computer vision algorithm employed in this paper was first proposed for use in defect detection in railroad tracks. The algorithm uses Gabor filters and an energy distribution function to extract features from an input image. The classification is performed by an SVM classifier. A flowchart of the algorithm can be seen in Fig. 2. First, the input image is fed through a filter bank consisting of Gabor filters. Four wavelet orientations are used: 0°, 45°, 90°, and 135°. Each orientation is replicated for three different filter sizes. This yields a total of twelve filters. Next, the magnitude operator is applied to the filtered images. An example of the Gabor filter process can be seen in Fig. 3. The next step is feature extraction for input into the SVM. The features used are the mean and variance of each filtered image. This allows for an image of high resolution to be collapsed down into a manageable feature set. The final resulting feature vector consists of all means and variances. For our algorithm, the feature vector is of length 24.
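A minimal sketch of this feature extractor and classifier is shown below; the Gabor kernel sizes and parameters are assumptions chosen for illustration, while the structure (four orientations at three sizes, magnitude, then mean and variance, giving a 24-dimensional vector fed to an SVM) follows the description above. X_train and y_train are assumed to be the labeled training windows.

import cv2
import numpy as np
from sklearn.svm import SVC

def gabor_features(gray):
    feats = []
    for ksize in (9, 17, 31):                                      # three filter sizes (assumed)
        for theta in (0, np.pi / 4, np.pi / 2, 3 * np.pi / 4):      # 0, 45, 90, 135 degrees
            kern = cv2.getGaborKernel((ksize, ksize), sigma=4.0, theta=theta,
                                      lambd=10.0, gamma=0.5, psi=0)
            resp = np.abs(cv2.filter2D(gray.astype(np.float32), cv2.CV_32F, kern))
            feats.extend([resp.mean(), resp.var()])                 # energy statistics
    return np.array(feats)                                          # length 24

clf = SVC(kernel="rbf")
clf.fit(np.stack([gabor_features(w) for w in X_train]), y_train)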
3.3 Adaptive Planning Given the output of the vision system, we actively classify the knife as containing a defect or not by using a greedy horizon planner to reduce the entropy of the belief
Fig. 3 Gabor filter bank examples: (a) small Gabor filter bank, (b) large Gabor filter bank
of all the regions of the knife. We simulate the robotic manipulation of the knife by characterizing each view angle as a state. The planner then decides, given a current state, which state to view next based on what has been seen up to the given point. A movement cost is incurred from one view state to another using the angular Manhattan distance as a metric. This assumes that the camera is situated perpendicular to the knife in the x- and y-axes initially. Additionally, simulated camera view time is added to the movement cost. This simulates the time required to take and process a single image. We start with an initial belief that the knife has a defect in each region at 50% and set our confidence in the observed measurement to the experimental accuracy of the vision system.

P(Di)θ0 = 0.5    (1)

P(e|Di) ∈ {0.2, 0.8}    (2)
To make an accurate classification of the knife, the total entropy for all knife regions is minimized. This, in turn, increases the confidence in our belief for a given knife region. The total entropy of the knife is formulated in terms of discretized knife regions as:

H(θ) = Σi∈K H(P(Di)θ)    (3)
After selecting a view (whether randomly or using the greedy horizon planner), a Bayesian update is performed over all the knife regions based on what is predicted at the current view.

P(Di|e) = P(e|Di) P(Di), i ∈ K    (4)
The value in selecting the next view is captured by using a Bayesian update. Historical data can be used to predict in which knife regions defects can be reliably seen. Additionally, different view angles better highlight defects in different knife regions. This is modeled as a confidence value representing how likely a specific view is to accurately identify a defect in a given knife region. This allows us to perform our Bayesian update along a linear scale, based upon our confidence value, as given below:

P(Di)θt+1 = ci P(Di|e) + (1 − ci) P(Di)θt    (5)

where ci denotes the confidence value assigned to the current view for region i.
For this study, we assign confidences where larger confidence values represent a good view for a given region. Given these predictions of how a new view angle will update the knife region probabilities, we assign a value to moving to the next view as

MoveValue(θt+1) = (H(θt) − H(θt+1)) / MoveCost(θt, θt+1)    (6)
This equation balances the predicted information gained from the considered next view with the budget cost of moving to this view. Thus, for the single-lookahead greedy search, the next view is chosen by maximizing Eq. (6):

θt+1 = arg maxθ∈Θt MoveValue(θ)    (7)

where Θt denotes the set of candidate views not yet visited at time t.
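A hedged Python sketch of this one-step greedy selection with the confidence-weighted Bayesian update of Eqs. (1)–(7) is shown below; the number of regions, the per-view confidence table, and the movement costs are random placeholders used only to make the sketch self-contained, not values from the real system.

import numpy as np

def entropy(p):
    p = np.clip(p, 1e-9, 1 - 1e-9)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def bayes_update(prior, likelihood=0.8):
    # P(Di | e) for a positive observation with sensor accuracy 0.8, cf. Eqs. (2) and (4).
    num = likelihood * prior
    return num / (num + (1 - likelihood) * (1 - prior))

def move_value(belief, conf, cost):
    # Confidence-weighted update (Eq. (5)), then information gain per unit cost (Eq. (6)).
    updated = conf * bayes_update(belief) + (1 - conf) * belief
    return (entropy(belief).sum() - entropy(updated).sum()) / cost

belief = np.full(6, 0.5)                                   # Eq. (1): six regions assumed
views = {v: (np.random.uniform(0.2, 0.9, 6),               # per-region confidence of view v
             np.random.uniform(0.5, 2.0))                  # move + imaging cost of view v
         for v in range(9)}

unvisited = set(views)
while unvisited:
    best = max(unvisited, key=lambda v: move_value(belief, *views[v]))   # Eq. (7)
    conf, _ = views[best]
    belief = conf * bayes_update(belief) + (1 - conf) * belief           # Eqs. (4)-(5)
    unvisited.remove(best)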
4 Results 4.1 Computer Vision Algorithm Testing of the proposed vision algorithm shows good results. Results show that for training, validation, and test data sets, the defect detection rate is roughly 80%. It is important to note that while promising, these results were generated from a limited data set in terms of knife and defect examples. Increasing the number of examples, both positive and negative, in the data set will yield a more robust classifier.
4.2 Planning Algorithm Figure 4 illustrates an example (for a single random run) of how the defect probability regions over the knife change as views are taken. Note that the same number of views
Fig. 4 Probability of defect by knife region: (a) greedy selection after 0 views, (b) after 2 views, (c) after 5 views, (d) after 8 views
does not necessarily represent the same budget of time. Additionally, note that in this case, for random selection, only 7 views were obtained in the 4-s budget while greedy selection obtained 8 views. Figure 5a shows the entropy reduction over time. In this plot, each data point represents a view. The x-axis error bars represent the fact that different views require different amounts of time to reach. The y-axis error bars represent the variance in the entropy reduction. Figure 5b shows the information gained as a function of view count. This is the inverse of the entropy reduction. Note that as the view count approached full coverage, the variance is reduced, and the random selection average converges to the greedy selection value.
Fig. 5 Comparison of greedy view selection versus random view selection: (a) total entropy by inspection time, (b) total information by number of views. Error bars represent 1 standard deviation calculated over 1000 simulations
5 Discussion From Figs. 4 and 5, it can be seen that the greedy selection algorithm outperforms random selection. Note that in Fig. 4, it can be seen that while both algorithms achieve a similar solution after 4 s, the greedy selection performs in a more efficient manner. The greedy selection identifies areas of interest first and then refines its belief as more views are taken. Conversely, the random selection chooses random views and only converges to a final belief once many views have been obtained. Figure 5a shows that for a given budget of 4 s, the greedy selection algorithm is able to achieve more views than the random selection algorithm: it achieves 8 views in the time budget, only missing the final view by roughly 100 ms, while the random selection achieves a range of 6–8 views, with an average of 7, in the same time frame. (In Fig. 4, value peaks represent a higher probability of a defect being present in a given region.) From both Fig. 5a, b, it can be seen that given enough time, the random algorithm is able to achieve all views. In this case, the final entropy value converges to the same value as the greedy solution. This convergence is evident in the fact that the y-axis error bars approach zero. From our results, we theorize that as the number of possible view states increases, the performance of random selection will continue to degrade. This is due to the fact that while greedy selection will likely still prioritize many views from all angles of the knife, random selection will have a high probability of selecting very similar views that do not provide much additional information. This performance discrepancy will become increasingly evident as the number of views approaches continuous space.
6 Conclusions and Future Work In this paper, we show that current computer vision techniques are successful in classifying a single surface defect on a knife. Additionally, we show that using these vision techniques in conjunction with real-world data allows a simple greedy adaptive planner to outperform random view selection. While the results in this paper are promising, there are several avenues for future work. First, while additional parameter tuning in the vision algorithm may yield a higher detection rate, there are other avenues to explore. These avenues include the use of alternative lighting sources (e.g., near field and dark field). The use of these lighting sources may help to increase the visibility of defects and thus make them easier to classify. This, in turn, will lead to a higher defect detection rate for the classifier. Next, there is room to explore more sophisticated planning methods. These include planning with a longer horizon as well as planning over a larger number of views. Planning with a longer horizon would allow the algorithm to maximize the long-term information gain. As new information from the vision system is received, the plan can be updated. Additionally, it would be interesting to see the effects of a larger set of possible view angles. Specifically, using views with rotations along multiple axes may provide interesting results. Last, the nature of the task begs the question of whether the view planning algorithm can be improved by incorporating inspection data from human operators. The motivation behind this is that through experience, skilled human inspectors build a large knowledge base of how and where defects typically present. This allows them to inspect knives more effectively than a novice. We theorize that in most cases, human inspectors will follow a general view sequence that they have learned works well. When the inspector notices something that is a potential defect, they may adjust their inspection sequence to more closely inspect that area. This data can be incorporated into the adaptive planning algorithm by using it to help predict the information gains of the set of possible next views. That is, based on the results of the vision system and human inspection data, we may be able to boost (or reduce) the predicted information gain of certain views.
References
1. Y. Han, J. Jeong, Real-time inspection of multi-sided surface defects based on PANet model, in Computational Science and Its Applications—ICCSA 2020. Lecture Notes in Computer Science, vol. 12250, ed. by O. Gervasi et al. (Springer, Cham, 2020). http://doi.org/10.1007/978-3-030-58802-1_45
2. A. Tiwari, K. Vergidis, R. Lloyd, J. Cushen, Automated inspection using database technology within the aerospace industry. Proc. Inst. Mech. Eng. Part B J. Eng. Manuf. 222, 175–183 (2008). https://doi.org/10.1243/09544054JEM938
3. T. Brito, J. Queiroz, L. Piardi, L.A. Fernandes, J. Lima, P. Leitão, A machine learning approach for collaborative robot smart manufacturing inspection for quality control systems. Proc. Manuf. 51, 11–18 (2020). ISSN 2351-9789. http://doi.org/10.1016/j.promfg.2020.10.003
4. H. Jia, Y. Murphey, J. Shi, T.-S. Chang, An intelligent real-time vision system for surface defect detection, in Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004, vol. 3, pp. 239–242 (2004)
5. C. Mandriota, M. Nitti, N. Ancona, E. Stella, A. Distante, Filter-based feature selection for rail defect detection. Mach. Vis. Appl. 15(4), 179–185 (2004) [Online]. Available: http://doi.org/10.1007/s00138-004-0148-3
6. J. Binney, A. Krause, G. Sukhatme, Informative path planning for an autonomous underwater vehicle, in 2010 IEEE International Conference on Robotics and Automation (ICRA), pp. 4791–4796 (2010)
Chapter 55
Fake Account Detection in Social Media Using Big Data Analytics Shaik Mujeeb and Sangeeta Gupta
1 Introduction Online social media has changed the world by drastically increasing the number of social media users. The ease of communication between individuals is the main advantage of online social media [1]. In social media, the registration process is extremely simple, which attracts many users and eventually results in fake profile creation. These fake accounts are mostly created to interact with users or victims in order to carry out malicious attacks, spread false news and send spam messages. This leads to potential attacks such as fake identities or profiles and the uncontrolled spread of misinformation or false information. People today use social media as a medium through which they communicate with each other, circulate news, organize events and even run their e-commerce businesses. This rapid growth in the usage of social media attracts attackers who attempt to steal and misuse personal data and spread false news and activities. Figure 1 shows the increase of fake accounts over time in online social media. Detection of fake profiles on big data multimedia platforms like Twitter is a tedious task. The information present in Twitter consists of data such as text, audio and video generated by different social media users. Twitter's big data multimedia includes heterogeneous, geolocated and human-centric information that grows with time and contains increasingly media-rich content in huge volumes [2]. This system addresses the fake account detection problem on big data platforms like Facebook, Twitter and Instagram. One challenge is misinformation spread, where fake information propagated by fake accounts makes it difficult to identify genuine information. Rumors, scams and influence bots are some
S. Mujeeb (B) · S. Gupta, Computer Science Engineering, Chaitanya Bharathi Institute of Technology (A), Hyderabad, India
Fig. 1 Fake accounts rate
2 Literature Survey
Today, fake accounts have become a serious threat in online social networks, so an identification method is necessary to detect them. Many authors have worked in this area and have proposed methods to detect such accounts in online social media. Some of these methods are discussed below.
Pakaya and Ibrohim [4] proposed a machine learning process of "information acquisition, data cleaning, feature extraction, modeling, and evaluation. Modeling using four algorithms, Logistic Regression, ADA Boost, XGBoost" [5] and random forest. The reported accuracies of logistic regression, AdaBoost, XGBoost and random forest are 88.5%, 87.1%, 89.1% and 90.8%, respectively. Identification of pretended accounts is quite good when all inputs are provided and correct, but the predictions degrade when some inputs are missing. Comparing these algorithms, random forest performs best with a mean accuracy of 90.8%, whereas the gradient boosting algorithm in the present work reaches an accuracy of 97%, which is considerably more efficient than the random forest result.
Singh et al. [6] proposed a model to differentiate between normal and bot accounts based on different features in the recorded data, distinguishing bot accounts by the frequency and type of messages posted at particular times of the day. Real and not-so-genuine accounts are differentiated by comparing them with a dataset of fake accounts, classified as pretended accounts, and a dataset of real
accounts, classified as genuine accounts. After training, the algorithm is able to predict successfully which accounts are pretended and which are genuine.
Aydin et al. [7] proposed ML-based methods that use decision tree, logistic regression and support vector machine algorithms to identify pretended and genuine accounts. The dataset containing pretended and genuine account data is first preprocessed and then passed to the three machine learning algorithms, which finally determine the pretended accounts. The precision values are logistic regression (LR): 0.80, decision tree (DT): 0.75 and SVM: 0.74, and the recall values are LR: 0.85, DT: 0.82 and SVM: 0.64. Logistic regression performs the most efficiently compared with the other algorithms: on the evaluated metrics, the highest performance of 0.821 is obtained with LR, followed by 0.794 for DT and 0.704 for SVM.
Khaled et al. [8] proposed a hybrid classification algorithm consisting of a neural network (NN) and a support vector machine (SVM), where the NN runs on the value resulting from the SVM. The algorithm uses a small number of features yet is still able to classify about 98% of the accounts in the training dataset correctly. The performance of the classifiers on datasets of real and pretended accounts is also validated. The reported accuracies are 98.3% for SVM-NN, 92.3% for SVM and 88.2% for NN; classification accuracy is extremely low with feature sets obtained from PCA, whereas feature sets selected by correlation give high classification accuracy. The analysis shows that SVM-NN achieves the highest classification accuracy of the three classifiers, around 98%, across all feature sets compared.
3 Proposed System
In the proposed system, we use the gradient boosting algorithm to overcome the limitations of the existing system, which uses a random forest classification algorithm for the identification and detection of fake accounts. The existing algorithm performs well when it is given a complete and correct input set, but its efficiency decreases when some inputs are missing or wrong inputs are supplied. Both the gradient boosting algorithm and the random forest algorithm use values obtained from decision trees, so decision trees are the main component for obtaining values from the datasets.
For the identification of pretended accounts, some new features are introduced: commenting spam, functioning rate and influence bots. Decision trees are built from these inputs, and these trees are then passed to the gradient boosting algorithm, which produces the output. The reason for selecting
these algorithms is that they still provide efficient outputs even when some inputs are missing.
3.1 Commenting Spams
Here, we identify social media accounts used by bots for spam commenting and spreading false information. In this work, comments are treated as spam if they match the following scenarios. A bot account typically comments around the clock, far more often than a normal social media account user. The comments generated by an account are therefore examined and compared with the average number of comments made by a genuine user of online social media. If the difference is large, the account can be considered a fake or pretended account; otherwise, the comments are taken to be posted by a genuine user.
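A minimal sketch of this check, assuming per-account comment counts have already been collected; the function name, fields and threshold multiplier are illustrative assumptions rather than values taken from this work:

def is_spam_commenter(comments_per_day: float,
                      avg_genuine_comments_per_day: float,
                      multiplier: float = 10.0) -> bool:
    """Flag accounts whose commenting rate is far above that of an average genuine user."""
    # The multiplier is a hypothetical cut-off; the text only states that a large
    # difference from the genuine-user average marks the account as pretended.
    return comments_per_day > multiplier * avg_genuine_comments_per_day

# Example with hypothetical numbers: a bot posting 400 comments a day versus
# a genuine-user average of 15 comments a day.
print(is_spam_commenter(400, 15))   # True  -> likely pretended account
print(is_spam_commenter(12, 15))    # False -> consistent with a genuine user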
3.2 Functioning Rate Calculation
The interaction of social media users with posts is called the functioning rate: the percentage of followers who interact with a post. It is evaluated by observing the number of interactions from the followers and is expressed as a metric covering activities such as comments, likes and shares on posts uploaded to the platform. Most pretended accounts receive only a minimal number of likes even when they have thousands of followers, so the functioning rate plays a vital role in identifying pretended accounts. For example, if a Twitter account has thousands of followers but its posts receive no responses (likes, shares, retweets), the account can be considered fake or pretended.

Functioning rate = (total number of reactions / total number of followers) × 100
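As a worked example of the formula (illustrative numbers only, not taken from the dataset), an account with 5,000 followers and only 12 total reactions has a functioning rate of 0.24%, the kind of low value the text associates with pretended accounts:

def functioning_rate(total_reactions: int, total_followers: int) -> float:
    """Functioning rate = (total number of reactions / total number of followers) * 100."""
    if total_followers == 0:
        return 0.0
    return total_reactions / total_followers * 100

print(functioning_rate(12, 5_000))     # 0.24 -> suspiciously low engagement
print(functioning_rate(1_200, 5_000))  # 24.0 -> plausible for a genuine account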
3.3 Influence Bots
Sharing, liking and commenting are normal activities; however, if their frequency rises to a level that cannot be produced by a normal social media user, the activity becomes artificial, and bot accounts can perform such artificial activity [1]. An account that is active around the clock, i.e., 24 × 7, is likely operated by a bot. Here, we count the number of activities such as commenting, liking and sharing performed by the user since the account was created. If there is an enormous difference compared with normal accounts (a number that cannot be achieved by an average social media user), the account is considered a bot account. Insufficient account information and the verification status of the account, such as email address and mobile number, are other factors to be considered before concluding.
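A rough sketch of this activity-rate check; the 500-activities-per-day cut-off and the example figures are hypothetical, since the text only states that the count should exceed what an average user could achieve:

from datetime import datetime

def looks_like_influence_bot(total_activities: int, created_at: datetime,
                             now: datetime, max_daily_activities: float = 500.0) -> bool:
    """Flag accounts whose activity since creation exceeds a plausible human rate."""
    age_days = max((now - created_at).days, 1)  # avoid division by zero for brand-new accounts
    return total_activities / age_days > max_daily_activities

# Hypothetical examples: two million interactions in one year versus a few thousand.
print(looks_like_influence_bot(2_000_000, datetime(2020, 1, 1), datetime(2021, 1, 1)))  # True
print(looks_like_influence_bot(3_000, datetime(2020, 1, 1), datetime(2021, 1, 1)))      # False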
3.4 Identifying Fake Accounts
The data extracted from social media accounts are combined [9]. Here, we mainly focus on the functioning rate, influence bots and identification of rumors. The values of these factors are computed from the collected data and used to construct different decision trees. Finally, the gradient boosting algorithm uses these trees to identify and detect the pretended accounts.
Decision Trees. Decision trees are built from three nodes: functioning rate, influence bots and identification of rumors. The first tree takes the functioning rate as the root and the remaining two factors as child nodes; the second tree takes influence bots as the root with functioning rate and identification of rumors as children; and the third tree takes identification of rumors as the root with the remaining factors as child nodes.
Gradient Boosting Algorithm. The main principle of the gradient boosting algorithm is that it combines multiple weak learners into a strong rule; in our case, the weak learners are the decision trees described above. It is among the most effective algorithms for data classification problems. Gradient boosting works best with a large amount of training data and accurate values, but it can still predict efficiently even when some of these values are missing.
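Since the comparative analysis in Sect. 5 is run with PySpark and the output in Table 1 shows Spark ML columns (scaled_features, rawPrediction, probability), the training step could be sketched roughly as follows. Note that Spark's GBTClassifier grows its own ensemble of trees from the feature columns rather than accepting pre-built trees, and the input file name, feature column names and hyper-parameters below are assumptions, not details from the paper:

from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler, StandardScaler
from pyspark.ml.classification import GBTClassifier
from pyspark.ml.evaluation import MulticlassClassificationEvaluator

spark = SparkSession.builder.appName("FakeAccountDetection").getOrCreate()

# Hypothetical input: one row per account with the engineered features from
# Sect. 3 and a binary label (0 = genuine, 1 = fake).
df = spark.read.csv("accounts.csv", header=True, inferSchema=True)

feature_cols = ["comment_spam_rate", "functioning_rate", "bot_activity_rate"]  # assumed names
assembler = VectorAssembler(inputCols=feature_cols, outputCol="features")
scaler = StandardScaler(inputCol="features", outputCol="scaled_features")
gbt = GBTClassifier(labelCol="is_fake", featuresCol="scaled_features", maxIter=50)

train, test = df.randomSplit([0.8, 0.2], seed=42)
model = Pipeline(stages=[assembler, scaler, gbt]).fit(train)
predictions = model.transform(test)  # adds rawPrediction, probability and prediction columns

accuracy = MulticlassClassificationEvaluator(
    labelCol="is_fake", predictionCol="prediction", metricName="accuracy"
).evaluate(predictions)
print(f"Gradient boosting accuracy: {accuracy:.3f}")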
4 Experiments and Results
In this work, the datasets used in the experiments to evaluate the results and performance of the gradient boosting algorithm are collected from GitHub. The datasets contain fake and genuine user account data from the Twitter social media platform. The genuine dataset consists of data from people who volunteered to contribute to academic research on identifying fake profiles [7], and the fake accounts were taken from fastfollowerz.com.
Each predicted classification is assigned to one of four categories: true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN). The gradient boosting algorithm is applied to the datasets, and the resulting confusion matrices, with and without normalization, are shown below. Of the 808 human or genuine Twitter accounts, the algorithm predicts 798 as genuine (TP) and 10 as fake (FP); similarly, out of 883 fake accounts, it predicts 843 as fake (TN) and 40 as genuine (FN). It achieves a 97% hit rate, as shown in Fig. 4. The prediction counts of TP, FP, TN and FN for fake and genuine accounts are shown in Fig. 2, and Fig. 3 is the normalized confusion matrix, showing a 98.7% TP rate and a 95.4% TN rate. Figure 4 reports a precision of 0.99, a recall of 0.95 and an F1-score of 0.97.
Fig. 2 Confusion matrix
Fig. 3 Normalized confusion matrix
Fig. 4 Classification report
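These headline figures can be reproduced directly from the counts quoted above; a short check (the per-class rates correspond to the normalized values in Fig. 3):

# Counts quoted above: 808 genuine accounts (798 predicted genuine, 10 predicted fake)
# and 883 fake accounts (843 predicted fake, 40 predicted genuine).
genuine_correct, genuine_wrong = 798, 10
fake_correct, fake_wrong = 843, 40

total = genuine_correct + genuine_wrong + fake_correct + fake_wrong  # 1691 accounts
accuracy = (genuine_correct + fake_correct) / total                  # ~0.970 -> the 97% hit rate
genuine_rate = genuine_correct / (genuine_correct + genuine_wrong)   # ~0.987 -> 98.7% in Fig. 3
fake_rate = fake_correct / (fake_correct + fake_wrong)               # ~0.955 -> 95.4% in Fig. 3
print(round(accuracy, 3), round(genuine_rate, 3), round(fake_rate, 3))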
The true positive and false positive rates of the model can be seen in the ROC curve in Fig. 5: 0.98 and 0.04, respectively. Table 1 shows the results generated by the gradient boosting algorithm implemented using big data analytics. It lists the scaled features of the model and the model's predictions, where 0 represents a genuine account and 1 represents a fake account.
5 Comparative Analysis
In this comparative analysis, three machine learning algorithms are compared with their PySpark big data analytics counterparts, as shown in Figs. 6 and 7. Tables 2 and 3 show that the accuracy of the machine learning gradient boosting classifier is 97%, which is greater than that of the random forest classifier at 93%. This is attributed to the feature sets: the gradient boosting algorithm performs well with fewer features, whereas random forest needs all features to be present and correct.
Fig. 5 ROC curve

Table 1 Result

Scaled_features           IsGeunine   RawPrediction         Probability           Prediction
(7, [0, 2, 6], [2.047…    1.0         [−1.5435020027249…    [0.04364652142729…    1.0
(7, [0, 2, 6], [4.094…    1.0         [−1.5435020027249…    [0.04364652142729…    1.0
(7, [0, 2, 6], [4.094…    1.0         [−1.5435020027249…    [0.04364652142729…    1.0
(7, [0, 2, 6], [4.094…    1.0         [−1.5435020027249…    [0.04364652142729…    1.0
(7, [0, 2, 6], [4.094…    1.0         [−1.5435020027249…    [0.04364652142729…    1.0
(7, [0, 2, 6], [4.094…    1.0         [−1.5435020027249…    [0.04364652142729…    1.0
(7, [0, 2, 6], [4.094…    1.0         [−1.5435020027249…    [0.04364652142729…    1.0
(7, [0, 2, 6], [6.141…    1.0         [−1.5435020027249…    [0.04364652142729…    1.0
(7, [0, 2, 6], [8.188…    1.0         [−1.5435020027249…    [0.04364652142729…    1.0
(7, [0, 2, 6], [8.188…    1.0         [−1.5435020027249…    [0.04364652142729…    1.0
(7, [0, 2, 6], [8.188…    1.0         [−1.5435020027249…    [0.04364652142729…    1.0
(7, [0, 2, 6], [8.188…    1.0         [−1.5435020027249…    [0.04364652142729…    1.0
(7, [0, 2, 6], [8.188…    1.0         [−1.5435020027249…    [0.04364652142729…    1.0
(7, [0, 2, 6], [0.001…    1.0         [−1.5435020027249…    [0.04364652142729…    1.0
(7, [0, 2, 6], [0.003…    0.0         [1.54350200272498…    [0.95635347857270…    0.0
(7, [1, 2, 6], [1.246…    1.0         [−1.5435020027249…    [0.04364652142729…    1.0
(7, [2, 6], [0.06871…     1.0         [−1.5435020027249…    [0.04364652142729…    1.0
[2.04722155743281…        1.0         [−1.5435020027249…    [0.04364652142729…    1.0
[2.04722155743281…        1.0         [−1.5435020027249…    [0.04364652142729…    1.0
[2.04722155743281…        1.0         [−1.5435020027249…    [0.04364652142729…    1.0

Only showing top 20 rows
Fig. 6 Machine learning accuracy
Fig. 7 Big data analytics accuracy
Table 2 Machine learning accuracy

    Models              Accuracy
0   Gradient boosting   0.970432
1   Random forest       0.931402
2   Decision tree       0.903016
Table 3 Big data analytics accuracy

    Models              Accuracy
0   Gradient boosting   0.995241
1   Random forest       0.994051
2   Decision tree       0.988697
Decision trees obtain the lowest accuracy of 90%. Similarly, for big data analytics, the accuracy of the gradient boosting classifier is 99.5%, which is greater than that of the random forest classifier at 99.4%, with the decision tree obtaining the lowest accuracy of 98.8%; these results are considerably more efficient than those of the machine learning algorithms.
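On the machine-learning side, a comparison of the kind summarized in Table 2 could be produced along the following lines with scikit-learn; the input file, feature column names and default hyper-parameters here are illustrative assumptions, not the exact configuration used in this work:

import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Hypothetical input: engineered per-account features and a binary label (0 = genuine, 1 = fake).
df = pd.read_csv("accounts.csv")
X = df[["comment_spam_rate", "functioning_rate", "bot_activity_rate"]]
y = df["is_fake"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "Gradient boosting": GradientBoostingClassifier(),
    "Random forest": RandomForestClassifier(),
    "Decision tree": DecisionTreeClassifier(),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, round(accuracy_score(y_test, model.predict(X_test)), 6))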
6 Conclusion
In this work, an innovative method to detect fake accounts has been developed using a variation of the gradient boosting algorithm applied to decision trees built from a group of attributes. It improves overall efficiency and handles the scalability demanded by the increasing number of social media users. Manual prediction of pretended accounts, a time-consuming task requiring a large amount of human effort, is eliminated. Stable factors such as spam commenting, functioning rate and influence bots are used, which in turn increases the accuracy of identifying and detecting pretended accounts.
References
1. S. Gupta, N. Kumari, Security mechanism for twitter data using Cassandra in cloud. Int. J. Distrib. Cloud Comput. 7(2) (2019). ISSN: 2321-6840
2. S. Gupta, R. Godavarti, IoT data management using cloud computing and big data technologies. Int. J. Softw. Innov. 8(4) (2020). https://doi.org/10.4018/IJSI.2020100104
3. O. Ahmad, S. Gupta, M. Hasibuddin, Truth discovery in big data social media sensing applications. Int. J. Innov. Technol. Exploring Eng. 9(8) (2020). https://doi.org/10.35940/ijitee.H6311.069820
4. F.N. Pakaya, M.O. Ibrohim, Malicious account detection on twitter based on tweet account features using machine learning, in IEEE International Conference on Informatics and Computing (ICIC) (2019)
5. S. Khaled, N. El-Tazi, H.M.O. Mokhtar, Detecting fake accounts on social media, in IEEE International Conference on Big Data (2018)
6. N. Singh, T. Sharma, A. Thakral, T. Choudhury, Detection of fake profile in online social networks using machine learning, in International Conference on Advances in Computing and Communication Engineering (ICACCE) (2018)
7. I. Aydin, M. Sevi, M.U. Salur, Detection of fake twitter accounts with machine learning algorithms, in International Artificial Intelligence and Data Processing Symposium (IDAP) (2018)
8. S. Khaled, N. El-Tazi, H.M. Mokhtar, Detecting fake accounts on social media, in IEEE International Conference on Big Data (2018)
9. S. Gupta, R. Aluvalu, Twitter based capital market analysis using cloud statistics. Int. J. Sociotechnol. Knowl. Dev. 11(2) (2019). https://doi.org/10.4018/IJSKD.2019040104
Author Index
A Agarwal, Anjali, 255 Akarsh, Chagantipati, 471 Alapati, Darsani, 287 Ambadipudi, Sai Raghu, 397 Arvind, C. S., 153 Ashish Reddy, Podduturi, 527 Azhar, Mohamad Asyraaf, 13
G Gajavelly, Kovid, 69 Gidwani, Himanshi, 461 Govindarajan, Sathya, 287 Greeshma, S., 491 Guda, Vanitha, 217 Guduri, Anvesh, 481 Gupta, Sangeeta, 587
B Bagadi, Lavanya, 317 Balaji, S., 49 Basha, Rayapati Mabu, 199 Bharath Kumar Chowdary, P., 555 Bommala, Harikrishna, 199 Brahmananda Reddy, A., 429
H Hamsini, R., 97 Haripriya, M., 97 Hemadri, Sai Srinivasa Preetham, 397 Hemanth, P., 317
C Chaitanya, Sandeep N., 207 Cheripelli, Ramesh, 371 Chirgaiya, Sachin, 461 Chowdary, Tumati Jahnavi, 555 D Danve, Riya, 241 Das, Ajanta, 255 Das, Roshni Rupali, 255 Das, Saneev Kumar, 307 Durisetti, Srinikhil, 287 F Faizabadi, Ahmed Rimaz, 21
I Imoh, Nsikak, 131 Iqbal, Asma, 409
J Jahnavi, Milar, 397, 429 Jahnavi, Pathuri, 555 Jaya Jones, K., 499
K Kalaiarasi, M., 141 Kalpana Khandale, B., 35 Karadi, Prathyusha, 207 Karnam, Akhil, 461 Karthik, Narra, 347 Kaushik, C., 527
Kavya, Talluri, 109 Khalid, 21 Kiranmayee, B. V., 335, 385, 397 Kiran, V., 317 Kodati, Sarangam, 565 Kotha, Keerthana, 287 Kotha, Rithik, 217 Kousar Nikath, A., 69 Krishna, Tammali Sai, 359 Krishna, Yenreddy Vamshi, 207 Kumar, Keshav, 191, 575 Kumar, Siva P. V., 359 Kurapati, Surendra, 207
L Laxmi Deepthi, G., 499
M Madhunala, Shiva, 385 Mai, Kiran C., 451 Mallik, Moksud Alam, 13, 21, 491 Mandula, Vijoosh, 535 Manikiran, G., 491 Manjunatha Chari, K., 409 Mittra, Yash, 191 Mohan, M., 49 Movva, Rajitha Bhargavi, 117 Mujeeb, Shaik, 587 Murty, P. L. N., 89
N Nagaveni, V., 153 Naini, Swetha, 79 Namrata Mahender, C., 35
P Pachika, Shivani, 265, 275 Padmanabham, J., 89 Padmasree, Alagam, 109 Pampari, Roshan, 397 Panda, Shivani, 317 Patel, Chirag, 513 Patel, Namra, 513 Patil, Ketaki, 241 Patil, Ruchira, 241 Patil, Shivraj, 241 Patra, Prashanta Kumar, 299 Pavan Myana, 535 Pendyala, Vishnu, 1 Pillalamarri, Praveen, 317
Pooja, Hattarki, 171 Poornima, Murkute, 57 Pothanna, N., 359 Pranitha, Laxmi R., 419 Prasannanjaneyulu, A. N. K., 371 Priya, Oruganti Shashi, 545 Pulipati, Venkateswara Rao, 57 Pushyami, Bhagavathula, 347 Putchala, Sreshta R., 217
R Rajarao, B., 199 Raju, Bal M., 325 Raj, Vinay, 439 Rakshit, Sandip, 131 Ramadevi, Yellasiri, 217 Ramesh Chandra, G., 287 Ramesh, G. S., 535 Rani, Sudagani Sandhya, 555 Rao, Rajeswara R., 183 Rao, Tirumala S. N., 183 Reddy Anumasula, Rakshith, 69 Rohith, Ankathi, 207
S Sahithi, P., 97 Sahith Reddy, G., 527 Sahitya, G., 527 Sai Venunath, P., 231 Sameeksha, G., 429 Sandeep Chaitanya, N., 117 Sandhya, K., 429 Sangeetha, K., 199 Sanjana, Bonthala, 347 Santhoshi, Kukkadapu, 109 Sardar, Tanvir H., 491 Sashikant Sharma, Shree, 513 Sathvika Macha, 535 Sethi, Lingaraj, 299 Shetkar, Ambika, 451 Shruthi, U., 153 Sireesha, C., 231 Sneha, Balannolla, 335 Soma, Shridevi, 171 Sreehari Rao, Y., 491 Sreshta, Yedla Sai, 481 Srija, Kakollu, 555 Srinivas, Kanakala, 545 Srinivasa Kumar, T., 89 Srinivasa Reddy, Konda, 109, 439 Sri Surya, N., 231
Stalin Babu, G., 183 Sujatha, C. N., 97, 347 Sukheja, Deepak, 461 Sumathi, Thatta, 481 Sunil Kumar, Talluri, 57, 565 Sunil, G. L., 153 Sunitha, Lingam, 325 Suprith Reddy, 535 Surana, Rohan, 1 Suresh, Chalumuru, 335, 397 Susan, Talluri, 565 Susmitha, A. R., 307 Susmitha, M., 419 Suthar, Ved, 513
T Tamizhazhagan, V., 49 Tatineni, Dedeepya, 481 Thakkar, Sejal, 513 Thandekkattu, Salu George, 141
U Udaya Bhaskar, T. V. S., 89 Upender, Kaveti, 359
V Vaddeboyina Sri Manvith, 385 Vadnala, Sai Keerthi, 287 Vaishali Kadam, P., 35 Vaishnavi, P., 429 Vajjhala, Narasimha Rao, 131 Vamsi Manyam, 535 Varma, Sagi Harshad, 471 Varshney, Aakash, 1 Vasavi, R., 69, 499 Vasundhara, D. N., 79, 499 Venkata Sailaja, N., 79 Venkateswara Rao, P., 471 Vijaya Saraswathi, R., 69, 499 Vishnu Teja, P., 491
Y Yamini, C., 451 Yeruva, Sagar, 79, 117, 481, 545, 565 Yuan, Xiaobu, 265, 275
Z Zulkurnain, Nurul Fariza, 13