Lecture Notes in Networks and Systems Volume 480
Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland

Advisory Editors
Fernando Gomide, Department of Computer Engineering and Automation—DCA, School of Electrical and Computer Engineering—FEEC, University of Campinas—UNICAMP, São Paulo, Brazil
Okyay Kaynak, Department of Electrical and Electronic Engineering, Bogazici University, Istanbul, Turkey
Derong Liu, Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, USA; Institute of Automation, Chinese Academy of Sciences, Beijing, China
Witold Pedrycz, Department of Electrical and Computer Engineering, University of Alberta, Alberta, Canada; Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Marios M. Polycarpou, Department of Electrical and Computer Engineering, KIOS Research Center for Intelligent Systems and Networks, University of Cyprus, Nicosia, Cyprus
Imre J. Rudas, Óbuda University, Budapest, Hungary
Jun Wang, Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
The series “Lecture Notes in Networks and Systems” publishes the latest developments in Networks and Systems—quickly, informally and with high quality. Original research reported in proceedings and post-proceedings represents the core of LNNS. Volumes published in LNNS embrace all aspects and subfields of, as well as new challenges in, Networks and Systems. The series contains proceedings and edited volumes in systems and networks, spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor Networks, Control Systems, Energy Systems, Automotive Systems, Biological Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems, Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems, Robotics, Social Systems, Economic Systems and other. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution and exposure which enable both a wide and rapid dissemination of research output. The series covers the theory, applications, and perspectives on the state of the art and future developments relevant to systems and networks, decision making, control, complex processes and related areas, as embedded in the fields of interdisciplinary and applied sciences, engineering, computer science, physics, economics, social, and life sciences, as well as the paradigms and methodologies behind them. Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science. For proposals from Asia please contact Aninda Bose ([email protected]).
More information about this series at https://link.springer.com/bookseries/15179
Asit Kumar Das · Janmenjoy Nayak · Bighnaraj Naik · S. Vimal · Danilo Pelusi
Editors

Computational Intelligence in Pattern Recognition
Proceedings of CIPR 2022
Editors

Asit Kumar Das, Department of Computer Science and Technology, Indian Institute of Engineering Science and Technology, Shibpur, West Bengal, India
Janmenjoy Nayak, P. G. Department of Computer Science, Maharaja Sriram Chandra Bhanja Deo (MSCB) University, Baripada, Mayurbhanj, Odisha, India
Bighnaraj Naik, Department of Computer Application, Veer Surendra Sai University of Technology, Sambalpur, Odisha, India
S. Vimal, Department of Artificial Intelligence and Data Science, Ramco Institute of Technology, Rajapalayam, Tamil Nadu, India
Danilo Pelusi, Communication Sciences, University of Teramo, Teramo, Italy
ISSN 2367-3370  ISSN 2367-3389 (electronic)
Lecture Notes in Networks and Systems
ISBN 978-981-19-3088-1  ISBN 978-981-19-3089-8 (eBook)
https://doi.org/10.1007/978-981-19-3089-8

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
Preface
Computational intelligence (CI) is a rapidly advancing field that encompasses a range of current technologies, including fuzzy logic and advanced fuzzy systems, higher-order neural networks, evolutionary intelligence, swarm and memetic computing, deep learning, hybrid models, social reasoning, and artificial hormone networks. CI plays a significant role in building effective intelligent systems, from games to cognitive developmental frameworks. Over the past few years, the remarkable growth of data has created a fundamental need for intelligent computing across many areas of science and technology, and progress in science and technology is essential to raising the inner strength of a nation. Today, deep learning (DL), neural networks (NN), fuzzy logic (FL), genetic algorithms (GA), and many other CI strategies have become attractive and natural choices for researchers. Combined with other intelligent methods, these techniques are well suited to building efficient models of processes for pattern recognition problems. It is our pleasure to welcome you to the 4th International Conference on Computational Intelligence in Pattern Recognition (CIPR), held at IIEST Shibpur, West Bengal, India, on April 23–24, 2022. A principal objective of the conference is to bring academic researchers, engineers, and industry experts together to share and exchange their experiences and research results, and to discuss the practical difficulties encountered and the solutions adopted. The conference is directed toward the dissemination of research results and experiences by leading academics and researchers on all aspects of pattern recognition and intelligent computing. 
It also provides a major interdisciplinary platform for scientists, practitioners, and educators to present and examine the most recent advances, trends, concerns, and practical challenges encountered, along with the solutions adopted, in the field of pattern recognition. These proceedings aim to provide rapid dissemination of significant results and high-level ideas in the latest domains of intelligent computing, deep learning, soft computing, and related areas. It is a great honor for us to present the proceedings of CIPR 2022 to the authors and delegates of the event. We trust that you will find them useful, stimulating, and inspiring. CIPR 2022 promises to be both invigorating and
educational, with a magnificent array of keynote and invited speakers from all over the world. CIPR seeks to provide a platform to examine the issues, difficulties, opportunities, and discoveries of computational intelligence and pattern recognition research. The ever-changing scope and rapid advance of intelligent techniques raise new issues and questions, creating a genuine need to share inspiring ideas and to draw serious attention to this vast research field. We aim to offer an engaging venue for pattern recognition, and the support received and the enthusiasm observed have genuinely surpassed our expectations. In this fourth edition, a diverse range of contributions was invited on applications of computational intelligence to, among other areas, text and video identification, opinion analysis, and advanced image processing. During the first year of the COVID-19 pandemic, a large share of the world's research output in 2020 was devoted to the coronavirus, and scientists and researchers have carried out a wide range of studies using COVID data analysis. We have therefore highlighted several COVID-related papers to help readers keep up with the flood of coronavirus research. This edition of CIPR contains articles classified according to the major and minor thematic areas of the conference. The volume covers a wide range of applications of computational intelligence: prediction, stock market analysis, real-time video analysis, text recognition, language recognition, COVID-19 report assessment, fingerprint analysis, patient analysis and monitoring, concrete crack recognition, cancer analysis and detection, student lifestyle query categorization, and more. 
Every accepted paper was double-blind reviewed by the relevant subject specialists, and the editors enjoyed working in collaboration with the international advisory, program, and technical committee members. CIPR 2022 comprises 61 selected high-quality papers that were submitted to the conference and peer reviewed by the committee and international reviewers. The meeting became a platform for sharing knowledge among the research communities of many countries. The accepted papers (both research and review) have been arranged to reflect the cutting-edge focus of computational intelligence techniques in pattern recognition, and we expect the authors' analyses and conclusions to add value to it. First and foremost are the authors, whose contributions have made the conference a great success. The CIPR conference is the achievement of a large number of people, and everyone should feel proud of the outcome. CIPR 2022 presents illuminating contributions for research scholars throughout the world on novel and inventive methods, state-of-the-art procedures, and applications. This conference could never have been organized without the help and strong support of the committee members of CIPR 2022. On behalf of the organizing committee, we wish to express our genuine appreciation to the keynote and panel speakers. From the call for papers to the final chapters, all our colleagues gave their contributions generously, a hopeful sign of great teamwork. We extend our deep sense of gratitude to all the prospective authors, organizing committee members, international
and national advisory board members, technical committee members, and reviewers for their genuine help in making this event a success. We are also grateful for the help and cooperation of the Springer technical team in the timely production of this volume. Finally, we wish you success in your presentations and networking; your strong support is essential to the success of this conference. Wishing you a fruitful and enjoyable CIPR 2022!

Asit Kumar Das
Janmenjoy Nayak
Bighnaraj Naik
S. Vimal
Danilo Pelusi
Acknowledgment
It is our great pleasure to present this volume of selected papers on computational intelligence in pattern recognition (CIPR). After three successful editions of CIPR, the fourth edition aimed at even higher-quality contributions to computational intelligence-based research and development. This edition attracted researchers and academicians from throughout the world to present their articles at this venue, further raising the standing of the CIPR 2022 meeting as a forum for research findings and for sharing knowledge between national and international experts. The program comprises invited sessions, technical workshops, and discussions with eminent speakers covering a wide range of subjects in scientific and social research. This rich program gives all participants the opportunity to meet and interact with one another. We trust your involvement in CIPR 2022 will be a productive and lasting one; with your help and cooperation, the meeting will continue its success for a long time to come. The key proposals and centrality of the CIPR conference have attracted more than 200 academicians, specialists, and analysts throughout the world, encouraging us to favor higher-quality papers and to strengthen the standing of the CIPR meeting for original research findings, the exchange of ideas, and the sharing of information with both national and international collaborators across the fields of data analysis and pattern recognition. Hearty thanks to everyone who presented their research at CIPR. The overall organizing committee wishes to acknowledge the help and encouragement we received from our institutions and from the many others who arranged this event. We take this opportunity to thank our authors, whose significant research findings made this occasion excellent, as well as the invited speakers, presenters, and audiences. 
Likewise, we want to extend our appreciation to the reviewers for their decisive help and for sharing their time and expertise so generously. We are particularly grateful to the organizing team of CIPR at IIEST Shibpur for their tremendous support in every form to make this global event a success. We wish to express our sincere appreciation to our dear Director for his consistent support and guidance during the conduct of the CIPR 2022 conference. We would also like to thank our Deans for their
unwavering support; they were there for us whenever needed. Alongside these people, we wish to thank our department staff members, who contributed enormously to the organization and progress of the conference. We have been fortunate to work with brilliant national and international advisory, technical, and program committee members. From the first day of this event, the members of the program and technical committees were committed to quality outcomes, and their suggestions made it possible to select good-quality articles from all the submitted papers. We would like to pass our heartfelt thanks to the reviewers: they worked hard in reviewing papers and making valuable suggestions for the authors to improve their work. We also offer our thanks to the external reviewers for providing additional assistance in the review process, and to the authors for contributing their research results to the conference. We are grateful to the editorial members of Springer for their work on these proceedings and for the timely and elegant publication of this volume, and to the members of the program and organizing committees for all their hard work in preparing the conference proceedings throughout the event. Last but not least, the CIPR conference and proceedings are the achievement of a massive collection of individuals, and all should be proud of the result.
Organization
Chief Patron

Parthasarathi Chakrabarti (Director), Indian Institute of Engineering Science and Technology, Shibpur, Howrah, West Bengal, India

Patrons

Sudip Kumar Roy (Dean, Academics), IIEST Shibpur
Hafizur Rahaman (Dean, Research and Consultancy), IIEST Shibpur
Prasanta Kumar Nandi (Dean, Faculty Welfare), IIEST Shibpur
Debabrata Mazumder (Dean, Student Welfare), IIEST Shibpur
Honorary Advisory Chairs

Lakshmi C. Jain, University of Canberra, Australia
Michael Pecht (Chair Professor and Director), University of Maryland, College Park, USA
V. E. Balas, Aurel Vlaicu University of Arad, Romania
Pabitra Mitra, Indian Institute of Technology, Kharagpur, West Bengal, India
Ashish Ghosh, Indian Statistical Institute, Kolkata, West Bengal, India
Honorary General Chairs

David Al-Dabass, Nottingham Trent University, UK
Jaya Sil, Indian Institute of Engineering Science and Technology, Shibpur, Howrah, West Bengal, India
Susanta Chakraborty, Indian Institute of Engineering Science and Technology, Shibpur, Howrah, West Bengal, India
Paramartha Dutta, Visva-Bharati, Santiniketan, West Bengal, India
General Chairs

Asit Kumar Das, Indian Institute of Engineering Science and Technology, Shibpur, Howrah, West Bengal, India
Saroj K. Meher, Indian Statistical Institute, Bangalore Center, India
Weiping Ding, Nantong University, Jiangsu, China
Tanmay De, National Institute of Technology, Durgapur, West Bengal, India
Program Chairs

Janmenjoy Nayak, Maharaja Sriram Chandra Bhanja Deo (MSCB) University, Baripada, Mayurbhanj, Odisha, India
Danilo Pelusi, University of Teramo, Coste Sant'Agostino Campus, Teramo, Italy
S. Vimal, Ramco Institute of Technology, Tamil Nadu, India
Bighnaraj Naik, Veer Surendra Sai University of Technology, Burla, Odisha, India
Co-program Chairs

Apurba Sarkar, Indian Institute of Engineering Science and Technology, Shibpur, Howrah, West Bengal, India
Surajeet Ghosh, Indian Institute of Engineering Science and Technology, Shibpur, Howrah, West Bengal, India
Santanu Phadikar, Maulana Abul Kalam Azad University of Technology, West Bengal, India
Soumi Dutta, IEM, West Bengal, India
Organizing Chairs

Samit Biswas, Indian Institute of Engineering Science and Technology, Shibpur, Howrah, West Bengal, India
Ashish Kumar Layek, Indian Institute of Engineering Science and Technology, Shibpur, Howrah, West Bengal, India
Tamal Pal, Indian Institute of Engineering Science and Technology, Shibpur, Howrah, West Bengal, India
International Advisory Committee

Pabitra Mitra, IIT Kharagpur, India
Florin Popentiu Vladicescu, University of Oradea, Romania
Arijit Sur, IIT Guwahati, India
Charlie (Seungmin) Rho, Chung-Ang University, Seoul, Korea
Shaikh A. Fattah, Bangladesh University of Engineering and Technology, Bangladesh
Claude Delpha, Université Paris Saclay, France
Sheng-Lung Peng, National Dong Hwa University, Hualien, Taiwan
Jong Hyuk Park, Seoul National University of Science and Technology, Korea
Mohammad S. Khan, East Tennessee State University, USA
Susanta Chakraborty, IIEST Shibpur, Howrah, W.B., India
Rubén González Crespo, Universidad Internacional de La Rioja, Spain
Swagatam Das, Indian Statistical Institute, Kolkata, India
Raffaele Mascella, University of Teramo, Italy
Sulata Mitra, IIEST Shibpur, W.B., India
Prabhat Kumar, National Institute of Technology Patna, India
Debdatta Sinha, University of Calcutta, India
K. C. Santosh, University of South Dakota, USA
Luca Tallini, University of Teramo, Italy
Haffizur Rahman, IIEST Shibpur, W.B., India
Atanu Bhattacharjee, Homi Bhabha National Institute, Section of Biostatistics, India
Shahid Mumtaz, Instituto de Telecomunicações, Aveiro, Portugal
Monojit Mitra, IIEST Shibpur, W.B., India
Yong Deng, Institute of Fundamental and Frontier Science, Chengdu, China
Rashmi Gupta, Netaji Subhas University of Technology, East Campus, India
Amir H. Gandomi, University of Technology Sydney, Australia
Mita Nasipuri, Jadavpur University, West Bengal, India
Xiao-Zhi Gao, University of Eastern Finland, Kuopio, Finland
Santi Prasad Maity, IIEST Shibpur, W.B., India
Paramartha Dutta, Visva-Bharati University, W.B., India
Alireza Souri, Islamic Azad University, Sardroud, Iran
Robert Bestak, Czech Technical University in Prague, Czech Republic
Qin Xin, University of the Faroe Islands, Denmark
Govindarajan Kannan, Indiana University Bloomington, USA
Manju Khari, JNU, New Delhi, India
Naveen Chilamkurti, La Trobe University, Melbourne, Australia
Gajendra K. Vishwakarma, IIT (ISM) Dhanbad, India
Joy Iong-Zong Chen, Da-Yeh University, Taiwan
Amitava Chatterjee, Jadavpur University, West Bengal, India
Subramaniam Ganesan, Oakland University, USA
D. P. Mohapatra, NIT Rourkela, India
Ahmed A. Elngar, Beni-Suef University, Egypt
Damien Sauveron, Université de Limoges, France
Ali Kashif Bashir, Manchester Metropolitan University, UK
B. Annappa, NIT Surathkal, Karnataka, India
Victor Hugo C. de Albuquerque, University of Fortaleza, Brazil
Chandan Kumar Chanda, IIEST Shibpur, W.B., India
Dac-Nhuong Le, Haiphong University, Haiphong, Vietnam
Mamoun Alazab, Charles Darwin University, Australia
H. S. Behera, VSSUT, Burla, Odisha, India
Daniel Burgos, International University of La Rioja (UNIR), Spain
Seifedine Kadry, Beirut Arab University, Lebanon
Y. Harold Robinson, VIT University, India
J. K. Mandal, University of Kalyani, West Bengal, India
Xuan Liu, Future Network Research Center, Southeast University, China
M. Kaliappan, Ramco Institute of Technology, India
Swapnoneel Roy, University of North Florida, USA
Technical Committee

Joy Iong-Zong Chen, Da-Yeh University, Taiwan
S. K. Hafizul Islam, IIIT Kalyani, India
Uttam Ghosh, Vanderbilt University, USA
Ananya Barui, Center of Healthcare Science and Technology, IIEST Shibpur, India
Ahmed Elngar, Faculty of Computers and Artificial Intelligence, Beni-Suef University, Egypt
P. Subbulakshmi, VIT University, Chennai, India
U. D. Prasan, Aditya Institute of Technology and Management, Tekkali, A.P., India
Tanmay De, National Institute of Technology, Durgapur, West Bengal, India
Dac-Nhuong Le, Haiphong University, Haiphong, Vietnam
G. T. Chandra Sekhar, Sri Sivani College of Engineering, Srikakulam, Andhra Pradesh, India
Noor Zaman, Taylor's University, Malaysia
Rajendrani Mukherjee, University of Engineering and Management, Kolkata, West Bengal, India
Irfan Mehmood, University of Bradford, UK
L. Ganesan, Ramco Institute of Technology, India
Gaurav Dhiman, Government Bikram College of Commerce, Patiala, India
Arif Sari, Girne American University, UK
Ram Sarkar, Jadavpur University, W.B., India
Pradeepa, Sastra University, India
Xiao-Zhi Gao, University of Eastern Finland, Kuopio, Finland
Soumya Ranjan Nayak, Amity University, Noida, UP, India
Khan Muhammad, Sejong University, Seoul, Korea
Vijay Bhaskar Semwal, MANIT Bhopal, India
Hoang Viet Long, People's Police University of Technology and Logistics, Bac Ninh, Vietnam
Suparna Biswas (Saha), MAKAUT, W.B., India
Surajeet Ghosh, IIEST, Shibpur, Howrah, India
Carla M. A. Pinto, ISEP, Instituto Superior de Engenharia do Porto, Portugal
Nibaran Das, Jadavpur University, W.B., India
J. C. Bansal, South Asian University, New Delhi, India
Ramani Kannan, Universiti Teknologi PETRONAS, Malaysia
Samit Biswas, IIEST, Shibpur, Howrah, India
Nevine Makram Labib, Sadat Academy for Management Sciences, Egypt
J. V. Anchitaalagammai, Velammal College of Engineering and Technology, India
Imon Mukherjee, Indian Institute of Information Technology, Kalyani, West Bengal, India
Sarat Chandra Nayak, CMR College of Engineering and Technology, Hyderabad, India
A. Suresh, SRM University, Chennai, India
A. R. Routray, F. M. University, Odisha, India
Alex Khang, Information Technology, SEFIX, Vietnam
Chitrangada Das Mukhopadhyay, Center of Healthcare Science and Technology, IIEST, Shibpur
Santosh Sahoo, CVR College of Engineering, Hyderabad, India
Sudhakar Ilango, VIT University, Andhra Pradesh, India
Oishila Bandyopadhyay, Indian Institute of Information Technology, Kalyani, West Bengal, India
Thinagaran Perumal, Universiti Putra Malaysia, Malaysia
V. Jackins, National Engineering College, India
B. Acharya, National Institute of Technology Raipur, India
Golden Julie, Anna University Tirunelveli, India
Jeyabalaraja D., Velammal Engineering College, India
Vijay Kumar, University College of Engineering Tindivanam, India
Dac-Nhuong Le, Faculty of Information Technology, Haiphong University, Vietnam
L. Jerart Julus, National Engineering College, India
Ronnie Figueiredo, Universidade da Beira Interior (UBI), Portugal
Publicity Chairs

Sunanda Das, Department of Computer Science and Engineering, Jain University, Bangalore, India
Debdutta Pal, Department of Computer Science and Engineering, Brainware University, Barasat, West Bengal, India
Ghazaala Yasmin, Department of Computer Science and Engineering, St. Thomas College of Engineering and Technology, Kolkata, India
Chandan Giri, Department of Information Technology, Indian Institute of Engineering Science and Technology, Shibpur, Howrah, W.B., India
Convenors

Malay Kule, Indian Institute of Engineering Science and Technology, Shibpur, Howrah, West Bengal, India
Co-convenors

Nirnay Ghosh, Indian Institute of Engineering Science and Technology, Shibpur, Howrah, West Bengal, India
Publication Chairs

Soumen Kumar Pati, Maulana Abul Kalam Azad University of Technology, West Bengal, India
Shampa Sengupta, MCKV Institute of Engineering, Howrah, West Bengal, India
Ranjit Ghoshal, St. Thomas College of Engineering and Technology, Kolkata, India
Finance Chairs

Asit Kumar Das, Indian Institute of Engineering Science and Technology, Shibpur, Howrah, West Bengal, India
Apurba Sarkar, Indian Institute of Engineering Science and Technology, Shibpur, Howrah, West Bengal, India
Malay Kule, Indian Institute of Engineering Science and Technology, Shibpur, Howrah, West Bengal, India
CIPR Reviewers

K. Sumathi, The American College (Autonomous), Madurai
K. V. Uma, Thiagarajar College of Engineering, Madurai
P. V. Siva Kumar, VNR VJIET, Hyderabad
Sourav Das, Future Institute of Technology, Kolkata
Poly Ghosh, Primeasia University, Banani, Dhaka, Bangladesh
Dilip Kumar Dalei, Scientist, Defence Research and Development Organization (DRDO), Bengaluru
Bhaveshkumar C. Dharmani, Lovely Professional University (LPU), Punjab
Pooja M. R., Vidyavardhaka College of Engineering, Mysuru
N. R. Deepak, HKBK College of Engineering, Bangalore
Vijay Anant Athavale, Panipat Institute of Engineering and Technology, Haryana
S. Sundeep, Vignana Bharathi Institute of Technology, Hyderabad
P. M. K. Prasad, GVP College of Engineering for Women, Visakhapatnam
Hiral M. Patel, Sankalchand Patel College of Engineering, Gujarat
Ripal D. Ranpara, Atmiya University, Gujarat
El Mostafa Bourhim, EMISYS: Energetic, Mechanic and Industrial Systems, Engineering 3S Research Center, Mohammadia School of Engineers, Mohammed V University, Industrial Engineering Department, Rabat, Morocco
Mainak Bandyopadhyay, KIIT Deemed to be University, Bhubaneswar
S. Rama Sree, Aditya Engineering College, Surampalem
A. Azhagu JaiSudhan Pazhani, Ramco Institute of Technology, Tamil Nadu
B. Kameswar Rao, Aditya Institute of Technology and Management, Tekkali
Karun Kumar Reddy, Lankapalli Bullayya College of Engineering, Visakhapatnam, Andhra Pradesh
Soumya Ranjan Nayak, Amity University, Noida
Byomokesh Das, Aditya Institute of Technology and Management, Tekkali
Bighnaraj Naik, Veer Surendra Sai University of Technology, Burla
Himansu Das, KIIT University, Bhubaneswar
Manohar Mishra, SOA University, Bhubaneswar
S. Vimal, Ramco Institute of Technology, Tamil Nadu
Sarat Ch. Nayak, CMR College of Engineering and Technology, Hyderabad, India
Asanta R. Routray, F. M. University, Balasore, Odisha
Ram Barik, Vikash Institute of Technology, Bargarh
Sourav Kumar Bhoi, Parala Maharaja Engineering College, Berhampur
P. Suresh Kumar, Aditya Institute of Technology and Management, Tekkali
Uma Ghugar, GITAM University, Visakhapatnam
Motahar Reza, GITAM Deemed to be University, Hyderabad, India
G. T. Chandra Sekhar, Sri Sivani Institute of Technology, Srikakulam
H. Swapnarekha, Aditya Institute of Technology and Management, Tekkali
Asit Kumar Das, IIEST, Shibpur
Sunanda Das, Jain University, Bangalore
B. V. S. Acharyulu, Lendi Institute of Engineering and Technology, Andhra Pradesh
V. Jackins, National Engineering College, Tamil Nadu
Kaliappan, Ramco Institute of Technology, Tamil Nadu
Sumathi, Sri Ramakrishna Institute of Technology, Tamil Nadu
Pradeepa S., SAASTRA University, Tamil Nadu
Sudhakar Ilango, VIT-AP, Amaravati, Andhra Pradesh
Vignesh Saravanan, Ramco Institute of Technology, Tamil Nadu
Jaisudhan, Ramco Institute of Technology, Tamil Nadu
Y. Harold Robinson, Vellore Institute of Technology, Vellore
Subhasree Mohapatra, ITER, SOA University, Bhubaneswar, Odisha
Sharmila Subudhi, Maharaja Sriram Chandra Bhanja Deo University, Baripada, Odisha
Meenaskhi Memoria, Uttaranchal University, Uttarakhand
Madhurima Hooda, University of Swansea, UK
Abhishek Sethy, GMR Institute of Technology, Andhra Pradesh
Sasanka Gantayat, GMR Institute of Technology, Andhra Pradesh
Sankhadeep Chaterjee, IEM, Kolkata
Ranit Kumar Dey, IIEST, Shibpur
Diptendu Sinha Roy, NIT, Meghalaya
Rabindra K. Barik, Kalinga Institute of Industrial Technology, Bhubaneswar
Alok Chakrabarty, NIT, Meghalaya
Akhilendra Pratap Singh, NIT, Meghalaya
Sanjoy Pratihar, IIIT, Kalyani
K. Hemant Reddy, GITAM University, Visakhapatnam, AP
Buddhadeb Pradhan, NIT, Jamshedpur
Jatindra Kumar Dash, SRM University, Andhra Pradesh
Tapas Kumar Mishra, SRM University, Andhra Pradesh
Contents
COVID-19 Detection Using Deep Learning: A Comparative Study of Segmentation Algorithms . . . . . 1
Pranchal Sihare, Azeem Ullah Khan, Poritosh Bardhan, and B. K. Tripathy
Time Series Analysis on Covid 19 Summarized Twitter Data Using Modified TextRank . . . . . 11
Ajit Kumar Das, Kushagra Chitkara, and Apurba Sarkar
Low-Computation IoT System Framework for Face Recognition Using Deep Learning Algorithm . . . . . 24
Jayanta Paul, Rajat Subhra Bhowmick, and Jaya Sil
Vehicle Number Plate Recognition System . . . . . 36
Ibidun Christiana Obagbuwa, Vincent Mohale Zibi, and Mishi Makade
An Approach to Medical Diagnosis Using Smart Chatbot . . . . . 43
Shreya Verma, Mansi Singh, Ishita Tiwari, and B. K. Tripathy
Performance Analysis of Hybrid Filter Using PI and PI-Fuzzy Based UVTG Technique . . . . . 57
B. Pavankumar and Saroj Pradhan
Transmission of Aggregated Data in LOADng-Based IoT Networks . . . . . 67
Sayeda Suaiba Anwar and Asaduzzaman
Deep Learning Based Facial Mask Detection Using Mobilenetv2 . . . . . 77
Arijit Goswami, Biswarup Bhattacharjee, Rahul Debnath, Ankita Sikder, and Sudipta Basu Pal
A Novel Approach to Detect Power Theft in a Distribution System Using Machine Learning and Artificial Intelligence . . . . . 90
Abhinandan De, Somesh Lahiri Chakravarty, Sayan Kar, Abhijnan Maiti, and Sanchari Chatterjee
Adversarial Surround Localization and Robust Obstacle Detection with Point Cloud Mapping . . . . . 100
Rapti Chaudhuri and Suman Deb

Perceptive Analysis of Chronic Kidney Disease Data Through Conceptual Visualization . . . . . 110
P. Antony Seba and J. V. Bibal Benifa

Islanding Detection in Microgrid Using Decision Tree Pattern Classifier . . . . . 123
Shyamal Das and Abhinandan De

Identification of Lung Cancer Nodules from CT Images Using 2D Convolutional Neural Networks . . . . . 133
Sutrisna Anjoy, Paramita De, and Sekhar Mandal

A Pixel Dependent Adaptive Gamma Correction Based Image Enhancement Technique . . . . . 141
Satyajit Panigrahi, Abhinandan Roul, and Rajashree Dash

Summarization of Comic Videos . . . . . 151
Tanushree Das, Arpita Dutta, and Samit Biswas

TextUnet: Text Segmentation Using U-net . . . . . 163
Awritrojit Banerjee, Ranjit Ghoshal, and Arijit Ghosal

A Survey on Prediction of Heart Disease Using Machine Intelligence Techniques . . . . . 173
Farzana Begum and J. Arul Valan

Predictive Analysis of Child's Mental Health/Psychology During the COVID-19 Pandemic . . . . . 183
Sandipan Saha, Sandip Murmu, Surajit Manna, Bappaditya Chowdhury, and Nibaran Das

Impact of Security in Blockchain Based Real Time Applications . . . . . 193
Ankush Kumar Gaur and J. Arul Valan

An Evaluative Review on Various Tele-Health Systems Proposed in COVID Phase . . . . . 201
Tanima Bhowmik, Rohan Mojumder, Dibyendu Ghosh, and Indrajit Banerjee

Efficient Scheduling Algorithm Based on Duty-Cycle for e-Health Monitoring System . . . . . 211
Tanima Bhowmik, Rohan Mojumder, Dibyendu Ghosh, and Indrajit Banerjee

Image Splicing Detection Using Feature Based Machine Learning Methods and Deep Learning Mechanisms . . . . . 221
Debjit Das and Ruchira Naskar
Contents
xxiii
Audio Driven Artificial Video Face Synthesis Using GAN and Machine Learning Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233 Arnab Kumar Das and Ruchira Naskar Design of an Elevator Traffic System Using MATLAB Simulation . . . . 245 Ibidun Christiana Obagbuwa and Morapedi Tshepang Duncan A Simple Strategy for Handling ‘NOT’ Can Improve the Performance of Sentiment Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255 Ranit Kumar Dey and Asit Kumar Das Rule Based Classification Using Particle Swarm Optimization for Heart Disease Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 268 Udita Basu, Shraya Majumdar, Shreyasee Dutta, Soumyajit Mullick, Sagnik Ganguly, and Priyanka Das A Deep Learning Based Approach to Measure Confidence for Virtual Interviews . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278 Ravi Kumar Rungta, Parth Jaiswal, and B. K. Tripathy A Commercial Banking Industry Resilience in the Case of Pandemic: An Impact Analysis Through ANOVA . . . . . . . . . . . . . . . . . . . . . . . . . . 292 Sweta Mishra, Shikta Singh, and Debabrata Singh Fractal Analysis of RGB Color Images . . . . . . . . . . . . . . . . . . . . . . . . . 304 Sukanta Kumar Das, Jibitesh Mishra, and Soumya Ranjan Nayak Deep Features for COVID-19 Detection: Performance Evaluation on Multiple Classifiers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313 Asifuzzaman Lasker, Mridul Ghosh, Sk Md Obaidullah, Chandan Chakraborty, and Kaushik Roy Issues, Challenges, and Possibilities in IoT and Cloud Computing . . . . . 326 Vinay Kumar Mishra, Rajeev Tripathi, Raj Gaurang Tiwari, Alok Misra, and Sandeep Kumar Yadav Agricultural Image Augmentation with Generative Adversarial Networks GANs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
335 Sayan De, Ishita Bhakta, Santanu Phadikar, and Koushik Majumder Thermal Image Augmentation with Generative Adversarial Network for Agricultural Disease Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345 Ishita Bhakta, Santanu Phadikar, and Koushik Majumder Learning Temporal Mobility Patterns to Improve QoS in Mobile Wireless Communications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 355 Satyaki Roy, Bilas Chandra, Anubhav Anand, Preetam Ghosh, and Nirnay Ghosh
xxiv
Contents
Automatic Question Generation from Video . . . . . . . . . . . . . . . . . . . . . 366 T. Janani Priya, K. P. Sabari Priya, L. Raxxelyn Jenneyl, and K. V. Uma Features Selection for Vessel Extraction Inspired by Survival of the Fittest Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 373 Sumit Mukherjee, Ranjit Ghoshal, and Bibhas Chandra Dhara A Proposed Federated Learning Model for Vaccination Tweets . . . . . . . 383 Medha Singh, Madhulika, and Shefali Bansal A New Reversible Data Hiding Scheme by Altering Interpolated Pixels Exploiting Neighbor Mean Interpolation (NMI) . . . . . . . . . . . . . . . . . . . 393 Manasi Jana, Shubhankar Joardar, and Biswapati Jana A Proposed Fuzzy Logic Model for Waste Water Treatment Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403 Kanishk Srivastava, Chirag Handa, and Madhulika Deep Learning Based Identification of Three Exotic Carps . . . . . . . . . . 416 Arnab Banerjee, Roopsia Chakraborty, Samarendra Behra, Nagesh Talagunda Srinivasan, Debotosh Bhattacharjee, and Nibaran Das Single Image Fog Removal Using WLS Smoothing Filter Combining CLAHE with DWT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 427 Soumya Chakraborty, Biswapati Jana, and Sharmistha Jana Smart Surveillance Video Monitoring for Home Intruder Detection Using Deep Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 437 Rishav Chakraborty, Shubham Kumar, Subhadip De, and Oishila Bandyopadhyay is-Entropy: A Novel Uncertainty Measure for Image Segmentation . . . . 448 Bhaveshkumar Choithram Dharmani AGC Based Market Modeling of Deregulated Power System Employing Electric Vehicles and Battery Energy Storage System . . . . . 458 Debdeep Saha, Rajesh Panda, and Bipul Kumar Talukdar Acute Lymphocytic Leukemia Classification Using Color and Geometry Based Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
469 Sourav Chandra Mandal, Oishila Bandyopadhyay, and Sanjoy Pratihar Thermal Strain Resolution Improvement in Brillouin OTDR Based DTS System Using LWT-MPSO Technique . . . . . . . . . . . . . . . . . . . . . . 479 Ramji Tangudu and P. Sirish Kumar Wrapper Based Feature Selection Approach Using Black Widow Optimization Algorithm for Data Classification . . . . . . . . . . . . . . . . . . . 487 Himanshu Dutta, Mahendra Kumar Gourisaria, and Himansu Das
Contents
xxv
Multi-objective Optimization for Complex Trajectory Tracking of 6-DOF Robotic Arm Manipulators . . . . . . . . . . . . . . . . . . . . . . . . . . 497 Bivash Chakraborty, Rajarshi Mukhopadhyay, and Paramita Chattopadhyay MANDS: Malicious Node Detection System for Sinkhole Attack in WSN Using DRI and Cross Check Method . . . . . . . . . . . . . . . . . . . . 511 Minakshi Sahu, Nilambar Sethi, Susant Kumar Das, and Umashankar Ghugar An Intelligent Framework Towards Managing Big Data in Internet of Healthcare Things . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 520 Sujit Bebortta and Sumanta Kumar Singh Deep Learning Approach for Anamoly Detection in CAN Bus Network: An Intelligent LSTM-Based Intrusion Detection System . . . . . 531 Ch. Ravi Kishore, D. Chandrasekhar Rao, and H. S. Behera Predictive Geospatial Crime Data Analysis and Their Association with Demographic Features Through Machine Learning Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 545 Anupam Mukherjee and Anupam Ghosh Design of an Image Transmission System Employing a Hybridization of Bit-Plane Slicing, Run-Length Encoding and Vector Quantization Based Visual Cryptography Scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . . 558 Surya Sarathi Das and Kaushik Das Sharma Gravitational Search Optimized Light Gradient Boosting Machine for Identification of Malicious Access in IoT Network . . . . . . . . . . . . . . 570 Geetanjali Bhoi, Bighnaraj Naik, Etuari Oram, and S. Vimal A Hybrid Semi-supervised Learning with Nature-Inspired Optimization for Intrusion Detection System in IoT Environment . . . . . 580 Dukka Karun Kumar Reddy, Janmenjoy Nayak, and H. S. Behera Secure Sharing of Medical Images Using Watermarking Technique . . . 592 Priyanka Priyadarshini, Alina Dash, and Kshiramani Naik An Impact Study on Covid-19 with Sustainable Sports Tourism: Intelligent Solutions, Issues and Future Challenges . . . . . . . . . . . . 
. . . . 605 Saumendra Das, Janmenjoy Nayak, and Sharmila Subudhi Deep Learning Based Framework for Breast Cancer Mammography Classification Using Resnet50 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 625 Pandit Byomakesha Dash, H. S. Behera, and Manas Ranjan Senapati
xxvi
Contents
A Game Theoretic Group Coordination Strategy for Multi Robot Navigation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 634 Buddhadeb Pradhan, Jnyana Ranjan Mohanty, Rabindra Kumar Barik, and Diptendu Sinha Roy Moth Flame Optimization Algorithm Optimized Modified TID Controller for Automatic Generation Control of Multi Area Power System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 644 K. Alfoni Jose, G. Raam Dheep, and Tulasichandra Sekhar Gorripotu Identification of Malicious Access in IoT Network by Using Artificial Physics Optimized Light Gradient Boosting Machine . . . . . . . . . . . . . . 653 Etuari Oram, Bighnaraj Naik, Manas Ranjan Senapati, and Geetanjali Bhoi Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 663
About the Editors
Asit Kumar Das is working as Professor in the Department of Computer Science and Technology, Indian Institute of Engineering Science and Technology (IIEST), Shibpur, Howrah, West Bengal, India. He has published almost 150 research papers in various international journals and conferences, 1 book and 5 book chapters. He has edited 6 books and 6 special issues in various journals. He has worked as a member of the editorial and reviewer boards of various international journals and conferences, and has delivered invited talks on his research at many workshops and conferences at various institutes in India. He acts as General Chair, Program Chair, and Advisory Member of committees of many international conferences. His research interests include data mining and pattern recognition in various fields including bioinformatics, social networks, and text, audio, and video mining. He has guided ten Ph.D. scholars and is currently guiding six Ph.D. scholars.

Janmenjoy Nayak is working as Assistant Professor, P. G. Department of Computer Science, Maharaja Sriram Chandra BhanjaDeo University, Baripada, Odisha, India. He has published more than 170 research papers in various reputed peer-reviewed referred journals, international conferences, and book chapters. A two-time Gold Medalist in Computer Science, he has been awarded the INSPIRE Research Fellowship from the Department of Science and Technology, Govt. of India (at both JRF and SRF levels), and the Best Researcher Award from Jawaharlal Nehru Technological University, Kakinada, Andhra Pradesh, for AY 2018–19, among many other awards. He has edited more than 20 books and 14 special issues on topics including data science, machine learning, and soft computing with reputed international publishers such as Springer, Elsevier, and Inderscience. His areas of interest include data mining, nature-inspired algorithms, and soft computing.
Bighnaraj Naik is Assistant Professor in the Department of Computer Applications, Veer Surendra Sai University of Technology, Burla, Odisha, India. He received his Doctoral degree from the Department of Computer Science Engineering and Information Technology, Veer Surendra Sai University of Technology, Burla, Odisha, India; his Master degree from SOA University, Bhubaneswar, Odisha, India; and his Bachelor degree from the National Institute of Science and Technology, Berhampur, Odisha, India. He has published more than 150 research papers in various reputed peer-reviewed international conferences, referred journals, and book chapters. He has more than ten years of teaching experience in the field of Computer Science and Information Technology. His areas of interest include data mining, soft computing, etc. Currently, he is guiding four Ph.D. scholars and six master students.

S. Vimal is working as Associate Professor in the Department of Artificial Intelligence and Data Science, Ramco Institute of Technology, Tamil Nadu, India. He received his Ph.D. degree in cognitive radio networking and security techniques using AI from Anna University, Chennai, Tamil Nadu. His areas of interest include game modeling, artificial intelligence, cognitive radio networks, and network security. He has published around 70 papers and has hosted 21 special issues in IEEE, Elsevier, Springer, and CMC Tech Science journals.

Danilo Pelusi received his Ph.D. degree in Computational Astrophysics from the University of Teramo, Italy. Presently, he holds the position of Associate Professor at the Faculty of Communication Sciences, University of Teramo. He served as Associate Editor of IEEE Transactions on Emerging Topics in Computational Intelligence, IEEE Access, International Journal of Machine Learning and Cybernetics (Springer) and Array (Elsevier). He also served as Guest Editor for Elsevier, Springer, and Inderscience journals, as Program Member of many conferences, and as Editorial Board Member of many journals.
He served as Reviewer in reputed journals such as IEEE Transactions on Fuzzy Systems and IEEE Transactions on Neural Networks and Learning Systems. His research interests include intelligent computing, communication systems, fuzzy logic, neural networks, information theory, and evolutionary algorithms.
COVID-19 Detection Using Deep Learning: A Comparative Study of Segmentation Algorithms

Pranchal Sihare, Azeem Ullah Khan, Poritosh Bardhan, and B. K. Tripathy(B)
School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu 632014, India
[email protected]
Abstract. Lung abnormality is a prevalent condition that affects people of all ages, and it can be caused by a variety of factors. The lung illness caused by SARS-CoV-2 has recently spread across the globe, and the World Health Organization (WHO) has declared it a pandemic owing to the speed of its spread. Covid-19 mainly attacks the lungs of those infected, resulting in mortality from ARDS and pneumonia in extreme instances. Disorders of internal body organs are harder to observe, making diagnosis more complex and time-consuming. The source, location and severity of any such illness are determined by a pulmonologist on the basis of a good number of tests taken in laboratories, or even elsewhere after the hospitalization of a patient, so considerable time passes between carrying out these tests and predicting COVID-19. The purpose of this work is to propose a model based on a CNN and to find the best-fit segmentation algorithm to apply to chest X-ray scans in order to predict the test result. Most importantly, the result is instantaneous.

Keywords: Covid-19 · CNN · X-ray scans · Segmentation
1 Introduction

Lung illnesses have become one of the most common ailments affecting humans. Lung diseases fall into three types: airway illnesses, circulatory disorders, and tissue diseases. Airway disorders disrupt the passage through the airways of many gases, including oxygen. Circulatory disorders hinder the normal flow of blood in the lungs. Tissue inflammation is responsible for illness in the lung tissue, which affects the expansion capability of the lungs. The prevalence of COVID-19 has accelerated the transmission of lung disease. COVID-19 badly affects the respiratory tract; as a result, a lesion layer is formed over the lungs, which curtails their normal functionality. Droplets expelled during coughing or sneezing are responsible for the spread of COVID-19 from a patient to other people; however, similar droplets expelled while breathing are not considered airborne [5]. Transfer via a fomite (a contaminated surface) may be another source of spreading: if a mucous membrane of the body, such as the eyes, mouth or nose, is touched after touching a fomite, there is a chance of transfer of the pathogen. Keeping this in mind, frequent and proper washing of hands is advised. The state of illness supports contamination even though symptoms might be present beforehand. Although seventy-two hours is the duration of surface survival of COVID-19 [21], there may be a gap of two days at the minimum and fourteen days at the maximum between contamination and the showing of symptoms.

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. A. K. Das et al. (Eds.): CIPR 2022, LNNS 480, pp. 1–10, 2022. https://doi.org/10.1007/978-981-19-3089-8_1

Reverse transcription polymerase chain reaction (RT-PCR) is the widely used testing process, performed on a nasopharyngeal swab. Two types of COVID-19 testing kits are available: antigen tests detect infected people, whereas antibody tests detect antibodies in the blood of someone who has already been infected with the virus. Tests that employ a polymerase chain reaction to diagnose COVID-19 illness are referred to as PCR tests. The primary obstacles in rapid COVID-19 detection are: (a) traditional PCR test kits take longer to diagnose the illness; (b) probes, primers, and physical equipment (swabs, containers, etc.) take longer to produce. Governments throughout the world are unable to control the spread of COVID-19 in the population due to an inadequate number of COVID-19 detection kits per million people.

Neural network models in general [20], and deep learning techniques in particular, are used in many applications nowadays [19]. In fact, artificial intelligence research has been enhanced by the impact of deep neural learning [14, 17]. Deep neural network (DNN) techniques have been applied effectively in the fields of image processing [15], audio signal classification [13] and healthcare [18]. Convolutional neural networks (CNNs) are efficiently used in computer vision and image processing [16]. In this paper we use a CNN model as the DNN model for classification. CNNs receive images and a classifier is trained using them.
As the name suggests, a mathematical operation called the convolution is used. Because components of the input are processed in parallel, the processing is faster. Several studies have been performed so far on this topic. An accuracy of 96% was obtained by the CNN model proposed in [3]. It was observed in the same paper that accuracy could be improved by using the process of augmentation. Another such study [4] outperformed some of the existing models in terms of predictive values (both positive and negative), precision, responsivity and accuracy. In our work, we first compare the efficiency of different segmentation algorithms as far as accuracy is concerned. The ultimate aim is to assist pulmonologists in the process of identification and in deciding on the therapeutic procedure. Since the pandemic is widespread among the masses, our tool will accelerate the detection of the disease, making the job of medical analysts easier and reducing the workload on medical experts.
2 Literature Review

Using a Kaggle dataset containing patient X-ray photos, an AI-based method for the detection of patients suffering from COVID-19 was proposed by Alazab et al. [3]. A similar study based on VGG16 was also done taking chest X-ray images. It was observed that COVID-19 detection improves when data augmentation is performed, both in the individual metrics that make up the F-measure and in the F-measure as a whole. When 1000 photos of augmented data were considered, 99% accuracy was obtained, an improvement over the 95% accuracy obtained when 128 images of normal data were considered.

A deep learning algorithm for COVID-19 detection using CT scan images is proposed by Zhao et al. [4]. The COVIDx CT-2A and COVIDx CT-2B datasets are used to test a transfer-learning-based strategy. A modified version of ResNet, a form of CNN, is employed in the ResNet-v2 model. The models were taught to distinguish between three types of CT scan images: novel coronavirus pneumonia, common pneumonia, and a healthy individual. In terms of accuracy, responsivity, precision, positive predictive value, and negative predictive value, the findings were found to be superior to certain current benchmark models.

Li et al. [6] proposed and discussed the efficiency of an algorithm based on DNNs applied to CT-scanned images of COVID-19 patients. Existence of the virus was detected based upon computed false-negative (FN) values. Here, the data of 10 patients was used to build a database for testing. The RT-PCR test revealed that out of the ten cases only two possessed the virus, an FN rate of 20%.

In [7], an evaluation system for the pandemic was created. Such a tool is found to be useful for persons associated with healthcare in identifying victims and determining the extent to which treatments supporting respiration are needed. The study used a method to divide patients into severity groups based on the WHO's classifications, namely severe, moderate/mild, and severe/mild. Ventilators and oxygen support are required for patients in the advanced stage; other patients with lower complications do not require oxygen for their treatment. Approximately 13,500 patients suffering from the pandemic were considered and the dataset was formed through the process of augmentation.
The correct identification was done for 93.6% of the cases, with the percentages of patients' under-evaluation and over-evaluation being 0.8 and 5.7 respectively.

In [8], an approach using a patch-based CNN was presented. For the diagnosis of COVID-19, the approach uses a modest number of training parameters. A statistical-analysis-inspired study on chest X-rays was carried out. It could be concluded from this study that use of data normalization at the preprocessing stage leads to an increase in segmentation accuracy. A new saliency map based upon Grad-CAM and probability was suggested to handle the local patch-based technique. It was concluded that the role of pre-processing is very important in confirming the performance of cross-database segmentation.

In [9], a specific DNN model, ResNet, was used for identification of COVID-19 from X-ray images. It has two components: one for identification of COVID-19 and the other for the detection of anomalies. The COVID-19 score improved by using the results of the anomaly-detection function. One hundred X-ray images scanned from 70 COVID-19 patients were used, in combination with 1043 images obtained from patients not suffering from the disease. The analysis shows specificity and sensitivity percentages of 70.7 and 96, and the AUC was 0.952.

A technique based on deep learning, termed deep transfer learning, was proposed by Apostolopoulos et al. [10] to automatically detect individuals with coronavirus infection. The study uses X-ray images from patients suffering from the virus (50 images obtained from a shared GitHub repository) and also from some who are not suffering from it (50 images from a Kaggle repository). It was observed that ResNet50 is by far the best among the three models considered.

A 3D DNN method was proposed by Chen et al. [11], which uses the existence of segmented lesions to predict the presence or absence of the virus and is built upon a UNet++-based segmentation model. CT-scanned images from 51 COVID-19 patients and 55 patients with different types of diseases, obtained from a Wuhan University lab, were used for the study. The outcome was to distinguish patients suffering from pneumonia from those with non-pneumonia diseases, and it was accurate up to 95%.

Barstugan et al. [12] employed machine learning methods to detect coronavirus earlier in CT scan images in another study. The data came from a Turkish government medical body and includes information from 53 coronavirus cases and 150 CT scans. Cropping the patch areas of the photographs resulted in four unique subsets of patches. The extracted features were classified using an SVM classifier, with the best classification results coming from GLSZM feature extraction strategies, which gave 99.68% accuracy.

The research papers above have used different techniques for Covid-19 detection, and CNN models prove to be the best in this regard. But the accuracies of these CNN models need to be further improved to be deemed reliable enough to be put to use in real-world applications. One way to do this is by pre-processing the X-ray scan images of the patients. Here different segmentation algorithms may prove helpful to pre-process the images, which may in turn improve the accuracy of the CNN model.
3 Proposed Methodology

Our work consists of five modules, which are described as follows.

3.1 Data Collection and Dataset Formation

We have focused upon images obtained from chest X-rays. The collection was from Kaggle (2000 images) [2] and GitHub (196 images) [1]. The Kaggle dataset has more than 2000 images, but we have extracted 196 images randomly to maintain class balance. This is essential because our dataset needs to reflect the real world. Also, if the dataset does not have class balance, then the minority class may suffer, which will in fact lead to anomalies in the predictions.

3.2 Data Augmentation

Data augmentation is applied to the images before feeding them to the neural network. Data augmentation is specifically effective for improving the performance of DNN models, since the datasets used with DNNs must be diverse and large to achieve greater accuracy; enlargement of the images, flipping them and rescaling are therefore essential. The rescaling factor is 1/255 (since pixel values up to 255 would be difficult for the model to process), the shear intensity is 0.2 and the range of enlargement is [0.8, 1.2]. Flipping is done along the horizontal axis (since vertical flipping would invert the lung scans, causing anomalies in the results).
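As a concrete illustration, the augmentation above can be sketched with NumPy alone. In practice a Keras-style image generator would normally perform these steps; the stand-alone sketch below covers only the 1/255 rescaling, the random horizontal flip and the [0.8, 1.2] enlargement (shear is omitted for brevity), and the nearest-neighbour resize is an assumption of this sketch:

```python
import numpy as np

def augment(img, rng):
    """Apply the augmentations described above to one grayscale image.

    img : 2-D uint8 array (a chest X-ray); returns a float array in [0, 1].
    """
    x = img.astype(np.float32) / 255.0           # rescale by 1/255
    if rng.random() < 0.5:                       # random horizontal flip only
        x = x[:, ::-1]                           # (vertical flip would invert the lungs)
    zoom = rng.uniform(0.8, 1.2)                 # enlargement range [0.8, 1.2]
    h, w = x.shape
    zh, zw = int(h * zoom), int(w * zoom)
    # nearest-neighbour resize, then center-crop or zero-pad back to (h, w)
    rows = (np.arange(zh) * h // zh).clip(0, h - 1)
    cols = (np.arange(zw) * w // zw).clip(0, w - 1)
    z = x[rows][:, cols]
    top, left = abs(zh - h) // 2, abs(zw - w) // 2
    if zoom >= 1.0:
        out = z[top:top + h, left:left + w]      # crop the enlarged image
    else:
        out = np.zeros((h, w), dtype=np.float32)
        out[top:top + zh, left:left + zw] = z    # pad the shrunken image
    return out

rng = np.random.default_rng(0)
batch = [augment(np.random.randint(0, 256, (64, 64), dtype=np.uint8), rng)
         for _ in range(4)]
```

Each augmented image keeps the original spatial size, so batches can be stacked and fed to the network directly.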
3.3 Data Segmentation

Segmentation helps to transform the image into something more meaningful and clearer. In our model we apply different segmentation algorithms to our dataset images. The segmented images are then stored separately in order to integrate them with our CNN model one by one and compare the results with those of the original model. The different segmentation algorithms used are thresholding, Otsu and K-means clustering.

3.3.1 Threshold Segmentation Algorithm

This is an OpenCV method that involves assigning pixel values in relation to a threshold value. In thresholding, each pixel value is compared to the threshold value. If the pixel value is less than the threshold it is set to 0, otherwise it is set to a maximum value (generally 255). Thresholding is a widely used segmentation technique for distinguishing between foreground and background objects. A threshold is a value that has two zones on either side, one below the threshold and the other above it. The effect of thresholding can be seen in Fig. 1(a) and Fig. 1(b).
Fig. 1. (a) Original Image. (b) Threshold segmented image
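The pixel-wise comparison just described can be written directly with NumPy. The value 127 below is a hypothetical mid-range threshold; in OpenCV the same operation is `cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)`:

```python
import numpy as np

THRESHOLD = 127  # hypothetical mid-range threshold for illustration

def threshold_segment(img, t=THRESHOLD):
    """Binary thresholding: pixels below t become 0, all others 255."""
    return np.where(img < t, 0, 255).astype(np.uint8)

img = np.array([[10, 200],
                [127, 126]], dtype=np.uint8)
seg = threshold_segment(img)   # → [[0, 255], [255, 0]]
```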
3.3.2 Otsu Segmentation Algorithm

In Otsu thresholding, the threshold value is calculated automatically rather than being chosen. A bimodal picture (one with two distinct classes of intensity values) is taken into account; the histogram of such an image has two peaks. A general strategy would therefore be to pick a threshold value that lies halfway between the two histogram peaks. The effect of Otsu thresholding can be seen in Fig. 2(a) and Fig. 2(b).

3.3.3 K-means Clustering

The clustering algorithm K-means is also used. Clustering algorithms are unsupervised, meaning they don't need labelled data. K-means is used to distinguish between distinct classes or clusters of data depending on how similar the data points are: similarities of elements from the same group are higher than similarities between elements of different groups. The original image in Fig. 3(a) looks as in Fig. 3(b) after segmentation.
Fig. 2. (a) Original Image. (b) Otsu segmented image
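Otsu's automatic threshold selection can be sketched as an exhaustive search for the threshold that maximises the between-class variance of the two resulting pixel groups. This NumPy sketch mirrors what `cv2.threshold(..., cv2.THRESH_OTSU)` computes internally and is illustrative, not the library implementation:

```python
import numpy as np

def otsu_threshold(img):
    """Return the threshold maximising between-class variance (Otsu)."""
    hist = np.bincount(img.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()                       # intensity probabilities
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = p[:t].sum(), p[t:].sum()       # class weights
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (np.arange(t) * p[:t]).sum() / w0        # class means
        mu1 = (np.arange(t, 256) * p[t:]).sum() / w1
        var = w0 * w1 * (mu0 - mu1) ** 2        # between-class variance
        if var > best_var:
            best_t, best_var = t, var
    return best_t

# A clearly bimodal image: intensity peaks around 50 and 200
img = np.array([50] * 100 + [200] * 100, dtype=np.uint8).reshape(20, 10)
t = otsu_threshold(img)          # lands between the two peaks
binary = np.where(img < t, 0, 255)
```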
Fig. 3. (a) Original image. (b) K-means segmented image
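For K-means segmentation of a grayscale scan, each pixel intensity is a one-dimensional data point; clustering the intensities and replacing each pixel by its cluster centroid flattens the image into k regions. A minimal NumPy sketch, assuming grayscale input (in practice `cv2.kmeans` or scikit-learn's `KMeans` would be used):

```python
import numpy as np

def kmeans_segment(img, k=2, iters=20, seed=0):
    """Cluster pixel intensities with plain k-means and replace each
    pixel by its cluster centroid."""
    rng = np.random.default_rng(seed)
    x = img.reshape(-1, 1).astype(np.float64)
    centers = x[rng.choice(len(x), size=k, replace=False)]   # random init
    for _ in range(iters):
        labels = np.argmin(np.abs(x - centers.T), axis=1)    # nearest centre
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean()           # recompute centre
    return centers[labels].reshape(img.shape)

img = np.array([[10, 12, 240],
                [11, 238, 242]], dtype=np.uint8)
seg = kmeans_segment(img, k=2)   # two flat regions: dark ≈ 11, bright ≈ 240
```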
3.4 Model Creation

In this paper a sequential CNN model is used; its summary is provided in Fig. 4. Each layer has exactly one input tensor and one output tensor, and the layers are stacked in a sequential approach. Our model consists of several types of layers. The conv2d layer applies a convolution kernel to its input to construct a tensor of outputs; images are convolved with kernels for sharpening, blurring, edge detection, embossing and related operations. By lowering the number of pixels in the output from the preceding convolutional layer, the max pooling 2d layer reduces the dimensionality of the pictures. Dropout is a training strategy in which randomly chosen neurons are rejected, i.e. "dropped out" at random. This implies that on the forward pass their contribution to the activation of downstream neurons is temporarily eliminated, and on the backward pass any weight changes are not applied to those neurons. The dense layer is a typical fully connected neural network layer, and the most commonly used one; it applies a weighted transformation to its input and returns the result. The flatten layer is used to transform multi-dimensional input into a single-dimensional output. The last dense layer outputs a single value, which is our class label, as shown in Fig. 4. This determines whether the patient is infected by COVID-19 or not: if the result is 0 then the patient is infected, and 1 if the patient is not.
Fig. 4. CNN model summary
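The dimensionality reduction performed by the max pooling 2d and flatten layers described above can be illustrated with a small NumPy sketch (illustrative only; the model itself uses the corresponding Keras layers):

```python
import numpy as np

def max_pool_2d(x, size=2):
    """2x2 max pooling: keep the largest value in each non-overlapping
    window, halving the height and width passed to the next layer."""
    h, w = x.shape
    x = x[: h - h % size, : w - w % size]        # trim odd edges
    return x.reshape(h // size, size, w // size, size).max(axis=(1, 3))

fmap = np.array([[1, 3, 2, 0],
                 [4, 2, 1, 5],
                 [0, 1, 6, 2],
                 [3, 2, 1, 4]], dtype=np.float32)   # a toy 4x4 feature map
pooled = max_pool_2d(fmap)        # shape (2, 2): [[4, 5], [3, 6]]
flat = pooled.reshape(-1)         # flatten: 1-D vector fed to dense layers
```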
3.5 Model Training and Testing

Our dataset is partitioned into training and testing data in a 4:1 ratio, i.e., the training and testing sets contain 312 and 80 X-ray images respectively. The model was trained and tested with a batch size of 32; the number of epochs was 10 and the number of steps in each epoch was 8. As the number of epochs increases, the weights are modified so that they give more accurate results. This flow is observed for non-segmented as well as segmented images, and the predictions are evaluated on the basis of the accuracy metric. The accuracy scores are calculated using Eq. (1):

Accuracy = (TP + TN) / (TP + FP + TN + FN)    (1)
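Eq. (1) translates directly into code. The confusion counts below are hypothetical values for an 80-image test set, chosen only to show the arithmetic (they are not the paper's actual confusion matrix):

```python
def accuracy(tp, tn, fp, fn):
    """Eq. (1): fraction of correct predictions among all predictions."""
    return (tp + tn) / (tp + fp + tn + fn)

# Hypothetical confusion counts for an 80-image test set
acc = accuracy(tp=40, tn=39, fp=1, fn=0)   # → 0.9875, i.e. 98.75%
```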
4 Workflow Diagram

We first start by collecting the X-ray images from the Kaggle and GitHub repositories. We next partition the images into training and testing data: there are 392 images in total, out of which 80 are kept for testing and the remainder are used for training. We then apply data augmentation: the images are rescaled to a particular dimension, resized and flipped horizontally. Next, we apply the segmentation algorithms to the images and store the results separately. Then we create our CNN model by adding multiple layers to it, such as the convolution 2d layer, dense layer, flatten layer and max pool 2d layer. We then train the CNN model on all the sets of training data, apply the model to the testing data, and evaluate all the performance parameters for the different models. We can now compare the evaluation parameters for all the models and see which segmentation algorithm performs the best. The workflow diagram of the complete process is shown in Fig. 5.
Fig. 5. Workflow diagram
5 Experimental Results

The accuracies of the different models are computed and presented in Table 1. The accuracies are calculated by dividing the sum of the true positive and true negative values by the total number of values, as provided in Eq. (1). The graph in Fig. 6 shows the training and testing accuracy for all types of data graphically. Table 2 shows the comparison of the accuracy of our model with already existing models.

Table 1. Comparison of accuracy and loss

Data                     | Training Accuracy | Testing Accuracy | Training Loss | Testing Loss
Original Data            | 94.55%            | 98.75%           | 0.164         | 0.238
Threshold Segmented Data | 96.47%            | 98.9%            | 0.108         | 0.243
Otsu Segmented Data      | 95.19%            | 97.5%            | 0.244         | 0.317
K-means Segmented Data   | 95.51%            | 98.8%            | 0.248         | 0.129
As we can see, Thresholding gives the best accuracy among the segmentation models.
Fig. 6. Comparative accuracy graph
Table 2. Comparison with existing models

Models                     | Accuracy
Proposed model             | 98.9%
[7] Lee et al.             | 93.6%
[10] Apostolopoulos et al. | 98%
[11] Chen et al.           | 95%
6 Conclusions and Future Work It is found that when segmentation algorithms are applied to the X-ray scan images, the accuracy of a CNN model increases. We established this using thresholding, Otsu segmentation and K-means clustering. We achieved an accuracy of 98.9% on the test data and 96.47% on the training data, against 98.75% and 94.55% respectively for non-segmented data. As we are able to conclude that certain segmentation algorithms increase the performance of CNN models for COVID-19 prediction, the implications of this can be very vast. Other segmentation algorithms can be applied and their performance monitored. Furthermore, different CNN models and a larger dataset can help in achieving better performance in predicting COVID-19.
References
1. Joseph, P.C.: Covid Chest X-Ray Dataset. https://github.com/ieee8023/covid-chestxray-dataset (1 Oct 2020). Last accessed 1 Jan 2022
2. Paul, M.: Chest X-Ray Images (Pneumonia). https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia. Last accessed 27 Dec 2021
3. Alazab, M., Awajan, A., Mesleh, A., Abraham, A., Jatana, V., Alhyari, S.: COVID-19 prediction and detection using deep learning. Int. J. Comp. Info. Sys. Indus. Manage. Appl. 12(1), 168–181 (2020)
4. Zhao, W., Jiang, W., Qiu, X.: Deep learning for COVID-19 detection based on CT images. Sci. Rep. 11(1), 1–12 (2021)
5. Kraft, M.: Approach to the patient with respiratory disease. In: Goldman's Cecil Medicine: Twenty Fourth Edition, vol. 1, pp. 512–516. Elsevier Inc. (2011). https://doi.org/10.1016/B978-1-4377-1604-7.00083-X
6. Li, D., et al.: False-negative results of real-time reverse-transcriptase polymerase chain reaction for severe acute respiratory syndrome coronavirus 2: role of deep-learning-based CT diagnosis and insights from two cases. Korean J. Radiol. 21(4), 505–508 (2020)
7. Wallis, L.A.: COVID-19 severity scoring tool for low resourced settings. African Journal of Emergency Medicine: Revue africaine de la medecine d'urgence (2 Apr 2020). https://doi.org/10.1016/j.afjem.2020.03.002
8. Oh, Y., Park, S., Ye, J.C.: Deep learning COVID-19 features on CXR using limited training data sets. IEEE Trans. Med. Imaging 39(8), 2688–2700 (2020)
9. Zhang, J., Yutong, X., Yi, L., Chunhua, S., Yong, X.: COVID-19 screening on chest X-ray images using deep learning based anomaly detection. arXiv preprint arXiv:2003.12338 (2020). https://doi.org/10.48550/arXiv.2003.12338
10. Apostolopoulos, I.D., Mpesiana, T.A.: Covid-19: automatic detection from X-ray images utilizing transfer learning with convolutional neural networks. Phys. Eng. Sci. Med. 43(2), 635–640 (2020)
11. Chen, J., et al.: Deep learning-based model for detecting 2019 novel coronavirus pneumonia on high-resolution computed tomography. Sci. Rep. 10(1), 1–11 (2020)
12. Barstugan, M., Umut, O., Saban, O.: Coronavirus (COVID-19) classification using CT images by machine learning methods. arXiv preprint arXiv:2003.09424 (2020). https://doi.org/10.48550/arXiv.2003.09424
13. Bose, A., Tripathy, B.K.: 6 Deep learning for audio signal classification. In: Deep Learning, pp. 105–136. De Gruyter (2020)
14. Adate, A., Arya, D., Shaha, A., Tripathy, B.K.: 4 Impact of deep neural learning on artificial intelligence research. In: Deep Learning, pp. 69–84. De Gruyter (2020)
15. Adate, A., Tripathy, B.K.: 3 Deep learning techniques for image processing. In: Machine Learning for Big Data Analysis, pp. 69–90. De Gruyter (2018)
16. Maheshwari, K., Shaha, A., Arya, D., Rajasekaran, R., Tripathy, B.K.: Convolutional neural networks: a bottom-up approach. In: Deep Learning, pp. 21–50. De Gruyter (2020)
17. Adate, A., Tripathy, B.K.: A survey on deep learning methodologies of recent applications. In: Deep Learning in Data Analytics, pp. 145–170. Springer, Cham (2022)
18. Kaul, D., Raju, H., Tripathy, B.K.: Deep learning in healthcare. In: Deep Learning in Data Analytics, pp. 97–115. Springer, Cham (2022)
19. Bhattacharyya, S., Snasel, V., Ella Hassanien, A., Saha, S., Tripathy, B.K.: Deep Learning: Research and Applications. De Gruyter, Berlin, Boston (2020). https://doi.org/10.1515/9783110670905
20. Tripathy, B.K., Anuradha, J.: Soft Computing: Advances and Applications. Cengage Learning Publishers, New Delhi (2015)
21. Zac, K.: How to clean and disinfect your phone. https://www.canstarblue.com.au/phone/howto-clean-and-disinfect-your-phone//. (30 Apr 2020). Last accessed 30 Dec 2021
Time Series Analysis on Covid 19 Summarized Twitter Data Using Modified TextRank

Ajit Kumar Das1(B), Kushagra Chitkara2, and Apurba Sarkar1

1 Department of Computer Science and Technology, Indian Institute of Engineering Science and Technology, Shibpur, Howrah 711103, West Bengal, India
[email protected], [email protected]
2 Department of Electrical Engineering, Indian Institute of Technology, Kharagpur, Kharagpur 721302, West Bengal, India
Abstract. While interpreting people's sentiments using Covid 19 Twitter data, we discovered that people's sentiments are distributed across multiple dimensions. Six significant themes were chosen, namely administration, disease, healthcare, location, precaution and citizens, because these topics receive a lot of attention on social media. In this paper, we used the modified text rank extractive summarization approach on Covid 19 Twitter data to reduce the data volume without sacrificing quality. The keywords are chosen from the pre-processed data set where the frequency of the words exceeds a pre-determined value decided through several trial runs. The goal is to extract the largest possible number of word sets from the original tweets. These keywords have been grouped into the six categories listed above. To understand the trend of the topics, all the keywords belonging to a specific topic are searched in the summarised file of a given day, and the count of that topic is increased on every successful match of the search. The graphs for the counts of all the themes for each day were then plotted. To identify patterns, seven-day moving average graphs are drawn for each topic.
Keywords: Covid 19 · Twitter data · Time series trend analysis

1 Introduction
Starting from December 2019, there have been millions of tweets per day related to Covid 19. In India, the spread of Covid 19 started from March 2020. Sentiment analysis typically talks about positive, negative and neutral sentiment on a subject. However, sentiment on Covid 19 data can be analyzed with respect to different dimensions. We have selected six important dimensions, namely administration, disease, healthcare, location, precaution and citizens, for sentiment analysis, as there is a great deal of discussion on these topics in social media. As the volume of tweets per day is in the millions, we have utilized the graph based modified text rank summarization method to reduce the tweet volume without losing the opinions of the writers. We have collected tweets from April 2020 to September 2020. Thus we get 183 summary files, one per day. These daily summary files are used as input to our method for topic based time series trend analysis.

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. A. K. Das et al. (Eds.): CIPR 2022, LNNS 480, pp. 11–23, 2022. https://doi.org/10.1007/978-981-19-3089-8_2

However, to decide the list of keywords for each of the topics, we have taken the original tweet files as input. These original datasets contain a great deal of irrelevant and redundant information, which is removed using various text pre-processing techniques. The Twitter dataset contains noisy information, such as URLs, dots, emoticons, symbols & pictographs, transport & map symbols, flags, and so on. The tweets are also cleaned of hashtags ('#'), retweets ("RT"), the '@' symbol used for tagging, etc. In addition to these symbols, the emoticons and emojis are removed to get cleaned tweets. The stopwords are removed so that only the words providing meaningful information are kept in a sentence. After cleaning, each processed file is stored in a separate directory. These processed files are used as input to identify keywords. The target is to select an optimum number of word sets from the original tweets. The words whose frequency is more than a pre-defined value, set experimentally by multiple trial runs, are selected as keywords. We have also selected some Hindi words written in the English alphabet to include the opinions of the Hindi-speaking people of India. We have then classified these keywords into the above six topics. To understand the trend of these topics against time, we have used the count of keywords as the metric for each topic. On the summarised file of a particular day, if a keyword is found which belongs to a specific bucket or topic, the count of that topic increases by one. We have then plotted line graphs against the date to understand the trend of these topics.
However, to account for the fact that on some days there may not be a sufficient volume of tweets, we divided the counts obtained in the previous step by the number of tweets for that particular day to normalise them. Clearer trends were observed as a result. For better readability, we also took seven-day moving averages of the trends.
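The normalisation and smoothing steps described above can be sketched as follows; the daily counts and tweet volumes below are made-up illustrative numbers:

```python
def normalise(counts, tweet_volumes):
    """Divide each day's topic count by that day's tweet volume."""
    return [c / v for c, v in zip(counts, tweet_volumes)]

def moving_average(series, window=7):
    """Trailing moving average used to smooth the daily trend."""
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]

daily_counts = [12, 15, 9, 20, 18, 14, 17, 25]            # hypothetical keyword hits
daily_tweets = [300, 350, 200, 500, 450, 350, 400, 600]   # hypothetical volumes
norm = normalise(daily_counts, daily_tweets)
trend = moving_average(norm, window=7)
print(len(trend))  # 2 smoothed points for 8 days with a 7-day window
```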
2 Literature Survey
Text summarization approaches are categorized into two types, namely extractive summarization [20] and abstractive summarization [10]. Text summarization is also classified as indicative [9] and informative [16]. Informative summarization techniques are used for making generic [7] and query-oriented [17] summaries. Single-document summarization [11] takes sentences from the document itself, whereas multi-document summarization [6] makes a summary by fusing sentences from different documents. Topic-oriented summarization [8] is based on the user's topic of interest and the information extracted from the given document related to some specific topic. In this paper we have used the graph based modified text rank summarization and hence will not concentrate on the summarization literature; rather, we concentrate on the Twitter literature. The rising popularity of social media platforms such as Facebook, Twitter, Linkedin, and Instagram, each with its unique set of features and applications,
is having a significant impact on our societies. For example, Facebook is a social network in which everyone in the network has a mutual relationship with someone else in the network. On the other hand, not everyone in the Twitter network has a reciprocating relationship with others. Twitter is a social media platform, launched in 2006, that allows users to post and receive 140-character messages known as "tweets". There are different types of Twitter data, such as user profile data, which is static, and tweet messages, which are dynamic. Tweets can be textual, images, videos, URLs, or spam tweets. Most studies do not usually consider spam tweets, and here we have concentrated on textual tweets. Twitter is a large forum for presenting and exchanging various ideas, thoughts, and opinions. People comment, compliment, discuss, fight, and insist regardless of where they come from, what religious beliefs they hold, whether they are rich or poor, educated or ignorant. In contrast to Facebook, where users may restrict the privacy of their profiles, Twitter allows unregistered users to read and monitor the majority of tweets. The huge amount of data offered by Twitter, such as tweet messages, user profile information, and the number of followers in the network, is extremely useful for data analysis [3]. The public timeline, which displays all of the users' tweets from across the world, is a massive real-time information stream with over one million messages per hour. As a result, tweets can be used to search through social media data and locate messages that are related to one another. Opinion mining [1], also known as sentiment analysis [15], is the process of determining the sentiment that the writer wishes to convey in his or her message. The sentiment is usually represented by the text polarity, i.e. whether the message has a positive, negative, or neutral sentiment. So, for any topic, Twitter may be used to grasp the sentiment of the population at large.
For analysing feelings from text, a machine learning approach can be utilised. Using machine learning methods, sentiment analysis has been performed on Twitter posts about electronic devices such as cell phones and computers. It is possible to determine the effect of domain information on sentiment classification by undertaking sentiment analysis in a specific domain. Sentiment analysis offers numerous possibilities for developing new applications. In the industrial sphere, sentiment analysis has a significant impact; for example, government agencies and large corporations want to know what people think about their products and their market value. The goal of sentiment analysis is to extract a person's mood, behaviour, and opinion from text. Sentiment analysis is widely employed in a number of fields, including finance, economics, defence, and politics. Unstructured and structured data can be found on social networking sites; unstructured data makes up over 80% [14] of the data on the internet. To find out what people think on social media, sentiment analysis techniques are applied. Sentiment analysis consists of four stages: tweet retrieval, tweet pre-processing, classification, and evaluation. The general steps for Twitter sentiment analysis are shown in Fig. 1.
Fig. 1. Twitter sentiment analysis steps
2.1 Input

The subject matter or issue for which we want to do sentiment analysis is the input. We start by choosing a subject, then gather tweets pertaining to that issue, and finally perform sentiment analysis on those tweets. In this paper our subject is Covid 19.

2.2 Tweets Retrieval

Tweets are retrieved in this step, and they can be in any format: unstructured, structured or semi-structured. We can collect tweets using different programming languages, such as Python or R (a programming language and software environment for data analysis), or a Java API.

2.3 Pre-processing

In this step data is filtered by removing irrelevant, inconsistent, and noisy data. The majority of studies have focused on software such as R. When it comes to processing Twitter data, R has several limitations and is inefficient when working with big amounts of data. A hybrid big data platform, such as Apache Hadoop (an open source Java framework for processing and querying enormous volumes of data on large clusters of commodity hardware), is typically used to address this challenge. Hadoop can also handle structured and semi-structured data, such as XML/JSON files. Hadoop's strength is in storing and processing vast amounts of data, whereas R's strength is in analysing data that has already been processed.

2.4 Sentiment Detection

There are different sentiment classification algorithms to find the polarity of a given subject. In supervised learning, Naive Bayes, SVM and maximum entropy are widely used algorithms, whereas in unsupervised learning, lexicon based, corpus based and dictionary based approaches are used to perform sentiment analysis. In this paper we have used topic modelling by selecting keywords for each topic, as explained in the next section.

2.5 Evaluation

The output is assessed to decide whether we should accept it or not, and the results are then shown as a bar graph, pie chart, or line graph. In this paper we have done time series analysis for each topic, using the count of keywords for each topic as the metric for plotting line graphs against the date to understand the trend of that topic.

To perform sentiment analysis on COVID-19 Twitter data, Vijay et al. [18] used TextBlob to find the polarity of scraped tweets and NLTK to get word frequencies. Each state is analyzed month-wise separately: first the state-wise analysis is done, and then the frequencies of positive, negative, and neutral tweets are calculated. In the paper "Sentiment Analysis of Covid-19 Tweets using Evolutionary Classification-Based LSTM Model" [4], dense word cloud representation, word popularity detection using N-grams and sentiment modeling using a sequential LSTM are done. To detect sentiment dynamics and clusters of Twitter users, Ahmed et al. [2] have proposed a model to identify users' sentiment for the top-k trending sub-topics related to COVID-19. It also detects the top active users based on their involvement score on those trending topics. The authors of [5] have used two main approaches, namely localized outbreak predictions and in-depth behavioural analysis, to present the methodology by which Twitter data surrounding the first wave of the COVID-19 pandemic in the UK is evaluated. A Named Entity Recognition (NER) algorithm is applied to recognise tweets containing location keywords within the unlabelled corpus. This approach allowed extraction of temporal trends relating to PHE case numbers, popular locations in relation to the use of face-coverings, attitudes towards face-coverings, vaccines, etc.
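Of the detection approaches discussed in Sect. 2.4, a lexicon-based scorer is the simplest to sketch. The tiny positive/negative word lists below are made-up illustrations, not a real lexicon resource:

```python
# Minimal lexicon-based polarity detection: count positive and negative
# words from a hand-made lexicon and compare the totals.
POSITIVE = {"cured", "recovered", "safe", "help", "support"}
NEGATIVE = {"death", "crisis", "suffering", "lockdown", "outbreak"}

def polarity(tweet: str) -> str:
    words = tweet.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(polarity("volunteers help and support the cured patients"))  # positive
print(polarity("lockdown extended amid crisis"))                   # negative
```

Corpus- and dictionary-based methods refine this idea by learning or expanding the word lists instead of fixing them by hand.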
3 Proposed Methodology
In this paper we have applied the modified text rank extractive summarization method on Covid 19 Twitter data. Figure 2 represents the process flow diagram of the proposed methodology.

3.1 Data Collection and Preprocessing
The COVID-19 coronavirus tweets collected during the period of April 2020 to September 2020 are used to demonstrate the application of the modified text rank summarization method. The dataset used is the one maintained by Panacea Lab (https://github.com/thepanacealab/covid19_twitter), which holds data for all Covid-related tweets for the required time period. We have
Fig. 2. Process flow diagram
followed the steps to download the data files as mentioned in the tutorial of the repository. After downloading and filtering for the English language, what we get are the tweet ids. These ids are then concatenated into a text file, which is fed to a hydrator tool, the DocNow hydrator (https://github.com/DocNow/hydrator). After hydrating, what we obtain is a JSONL file with an option to convert it to CSV. After getting the relevant information, the tweets are filtered on the basis of whether or not they are geotagged. This process is repeated for six months of data. In the trend analysis we have used global data, which covers tweets from all countries. The datasets contain much irrelevant and redundant information, which is removed using various text preprocessing techniques [19].

One of the major challenges faced was hydrating the tweet ids. The dataset maintained by Panacea Lab consisted of only tweet ids, so to get relevant information from it, we have used a hydrator tool. Doing this for roughly a million tweets per day is a particularly challenging task, as the free version of the Twitter Developer account has a rather strict limit on the number of tweets that can be accessed in a 15 min interval, so just the hydrating process took us over a month. Also, the final output of the hydration was in the form of JSONL files. While the tool we used did give us an option to convert to CSV, there was a huge loss of information in doing that. So, we have manually converted the JSONL files to CSV files to extract the necessary information. Since the information on the co-ordinates of the tweets was important to us, we had to manually extract it from the JSONL files. Also, the tool would sometimes crash when overloaded with activity, so in that scenario we had to start again for that particular day.

The CSV file contains irrelevant and noisy information, such as hashtags, URLs, dots, emoticons, symbols & pictographs, transport & map symbols, flags, and so on. So we implemented regular expressions to extract the relevant information from the texts. We have removed the URLs present in the tweets by using the output of urllib.parse on the tweet. The tweets are also cleaned of hashtags ('#'), retweets ("RT"), the '@' symbol used for tagging, etc. In addition to the above symbols, the emoticons and the emojis are removed to get cleaned tweets. This is achieved by performing string manipulations and using the Unicode ranges for emojis. After cleaning, the processed text is tokenized into sentences using the NLTK library of Python. The sentences are converted to lowercase, and punctuation is removed from the sentences. The stopwords are removed using nltk.corpus so that only the words providing meaningful information are kept in a sentence. Finally, each sentence is tokenized into a collection of words, which is used as the input for the modified text rank algorithm.
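The regex-based cleaning described above can be sketched as follows; the patterns are illustrative assumptions, since the authors' exact expressions are not given:

```python
import re

def clean_tweet(tweet: str) -> str:
    """Illustrative cleaning steps: strip URLs, retweet markers, @-tags,
    '#' symbols and emoji, then lowercase and normalise whitespace."""
    tweet = re.sub(r"http\S+|www\.\S+", "", tweet)      # URLs
    tweet = re.sub(r"\bRT\b", "", tweet)                # retweet marker
    tweet = re.sub(r"@\w+", "", tweet)                  # @-tags
    tweet = tweet.replace("#", "")                      # keep hashtag word, drop '#'
    tweet = re.sub(r"[\U0001F300-\U0001FAFF\u2600-\u27BF]", "", tweet)  # emoji ranges
    return " ".join(tweet.lower().split())              # normalise whitespace

print(clean_tweet("RT @who Stay safe! #covid19 https://example.org \U0001F637"))
# stay safe! covid19
```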
3.2 TextRank Based Summarization
The Twitter data contains a lot of repeated or redundant content because multiple users tweet the same or similar things; hence, summarising the Twitter data is necessary. In graph based text summarization using modified text rank [12], we consider the sentences in the document to be the equivalent of web pages in the PageRank system [13]. The probability of going from sentence A to sentence B is equal to the similarity between the two sentences. This modified TextRank uses the intuition behind the PageRank algorithm to rank sentences, based on which we can select the most important sentences from the input text document. In PageRank, important web pages are linked with other important web pages. Similarly, in the modified text rank algorithm the important sentences are linked (similar) to other important sentences of the input document. Here, the isf-modified-cosine similarity takes care of the different levels of importance of the corresponding words in the sentences and also accounts for the different lengths of the sentences in the document. Finally, for summarisation, the top n scored sentences are rearranged as per their input sentence index. This constructs the summary of the document. The experimental evaluation of this summarization method [12] scores very highly, and hence using this technique we have reduced the Twitter data volume without sacrificing the quality of the data.
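A simplified sketch of the sentence-graph ranking follows; plain cosine similarity over bag-of-words vectors stands in for the isf-modified-cosine of [12], and the example tweets are made up:

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words sentence vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def textrank_summary(sentences, top_n=2, damping=0.85, iters=50):
    """Rank sentences PageRank-style on a similarity graph, then return
    the top_n sentences rearranged in their original order."""
    vecs = [Counter(s.lower().split()) for s in sentences]
    n = len(sentences)
    sim = [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    out = [sum(row) for row in sim]      # row sums act as transition weights
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [(1 - damping) / n + damping *
                  sum(scores[j] * sim[j][i] / out[j] for j in range(n) if out[j])
                  for i in range(n)]
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_n]
    return [sentences[i] for i in sorted(top)]   # original order

tweets = [
    "covid cases rise in the city",
    "the city reports new covid cases",
    "weather today is sunny",
    "hospitals treat covid cases in the city",
]
print(textrank_summary(tweets, top_n=2))
```

The off-topic "weather" sentence shares no words with the others, so it receives only the baseline score and is never selected.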
3.3 Topic Wise Keyword Selection and Count
Once the cleaned files were obtained from the pre-processing steps, they were summarised using the modified text rank algorithm. The summary for each day was prepared separately; thus we got 183 summary files. In the next step, we selected six topics to understand the trends of these individual topics over time: administration, disease, healthcare, location, precaution and citizens. We chose these topics because they have received a lot of media and
public attention. The words that appear in the original tweet set more often than a predefined value, set experimentally by multiple trial runs, have been selected as keywords. In our experiment, the predefined value is set to ten. We have manually grouped these keywords into the six different topics such that the words that resemble a topic most are considered part of that topic. The topic wise keywords selected are shown in Table 1.

Table 1. Topic and Keywords

Administration: ['relief', 'lie', 'pulis', 'srkaar', 'sarkar', 'sarkaar', 'congress', 'police', 'pm', 'chief', 'minister', 'hm', 'members', 'member', 'distributed', 'govt', 'government', 'suppo', 'food', 'judgement', 'distributing', 'diyaa', 'modi', 'scandalous', 'cm', 'kovind', 'mjduur', 'food', 'presidents', 'nehru', 'dynasty', 'attacks', 'opponents', 'leadership', 'narendra', 'prime', 'bjp', 'aadesh', 'amendment', 'ordinance', 'commitment', 'raashn', 'express', 'scam', 'niti', 'president', 'modis', 'fund', 'ruupaay', 'rupee', 'producer', 'constable', 'maansiktaa', 'crore']

Disease: ['manifests', 'covid19', 'covid-19', 'corona', 'coronavirus', 'cases', 'koronaa', 'positive', 'patients', 'covid', 'pandemic', 'crisis', 'koroonnnaa', 'spread', 'dies', 'virus', 'cured', 'donate', 'diseases', 'maut', 'suffering', 'deaths', 'died', 'epidemic', 'death', 'patient', 'demise', 'case', 'dead', 'coronavirus', 'maaro', 'tested', 'fighting', 'symptoms']

Healthcare: ['help', 'nivaarnn', 'humanitarian', 'aid', 'treatment', 'ilaaj', 'ilaaz', 'metabolics', 'stitched', 'labs', 'blood', 'fight', 'rapid', 'stepped', 'respect', 'healthcare', 'plasma', 'icmr', 'hospitals', 'donated', 'wellbeing', 'commend', 'lose', 'save', 'lives', 'hospital', 'dr', 'doctor', 'dr.', 'antivenom', 'treatment', 'poisonous', 'medicines', 'donating', 'medical', 'testing', 'giving', 'gratitude', 'villagedoctors']

Location: ['india', 'delhi', 'desh', 'world', 'dillii', 'dilli', 'mumbai', 'indias', 'tamil', 'country', 'china', 'states', 'state', 'tmilnaaddu', 'tamil nadu', 'manipurdignity', 'area', 'maharashtra', 'chennai', 'nadu', 'yuupii', 'up', 'ghaziabad', 'gaajiyaabaad', 'american', 'amerikn', 'countries', 'nizamuddin', 'bihar,', 'central', 'western', 'karnataka', 'south']

Precaution: ['proactive', 'masks', 'test', 'kits', 'face', 'kit', 'kitt', 'walking', 'health', 'measures', 'safety', 'mask', 'purchase', 'ghr', 'ghar', 'lockdown', 'home', 'month', 'stay', 'lonkddaaun', 'lockdown']

Citizens: ['office', 'people', 'workers', 'indian', 'log', 'lady', 'bhuukhe', 'bhukhe', 'everyone', 'youth', 'girl', 'human', 'krodd', 'media', 'migrant', 'thalapathy', '12year', 'kmaane', 'kamane', 'shri', 'shrii', 'actor', 'worker', 'khaanaa', 'khana', 'lakhs', 'millions', 'appeals', 'news', 'everyones', 'everyone', 'brothers', 'brother', 'person', 'protect', 'journalists', 'journalist', 'sisters', 'sister', 'needy', 'body', 'private', 'sir', 'community', 'neighbors', 'shops', 'privaar', 'parivar', 'parivaar', 'relatives', 'family']
To get the trend of these topics, we searched for those keywords in the summarised files. If a keyword belonging to the Administration bucket was found, then the count of Administration went up by one for that particular day. We then plotted the graphs of the counts of all the buckets for each day and observed the trends. However, to account for the fact that some days may not have a sufficient volume of tweets, we divided the counts by the number of tweets for that particular day to normalise them, which produced clearer trends. For better readability, we also took seven-day moving averages of the trends.
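The counting step can be sketched directly. The day's summary text here is a made-up example, and the keyword sets are abbreviated from Table 1:

```python
# Count keyword hits per topic in one day's summary, then normalise by the
# day's tweet volume. Keyword sets are abbreviated from Table 1.
TOPICS = {
    "Administration": {"govt", "government", "minister", "police", "modi"},
    "Disease": {"covid", "corona", "cases", "pandemic", "virus"},
    "Precaution": {"mask", "masks", "lockdown", "stay", "home"},
}

def topic_counts(summary_text: str, num_tweets: int):
    words = summary_text.lower().split()
    counts = {}
    for topic, keywords in TOPICS.items():
        hits = sum(w in keywords for w in words)  # +1 per successful match
        counts[topic] = hits / num_tweets         # normalised count for the day
    return counts

day_summary = "government extends lockdown as covid cases rise wear masks stay home"
print(topic_counts(day_summary, num_tweets=100))
```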
4 Experimental Results
The time series graphs obtained for these six topics are shown in Figs. 3, 4, 5, 6, 7 and 8 and explained below.
Fig. 3. Topic: Administration
Fig. 4. Topic: Disease

4.1 Topic Administration
In this graph we can clearly observe a small but constant rise in the count for the topic Administration until about the first week of May, after which it meets a steep decline. This could be explained by the fact that most of Europe had seen the worst of the first wave and cases were in decline there. The United States was reporting a constant number of cases, and India too had not yet experienced the effect of the first wave and was still in lockdown. From about the 1st of June to mid August, we see a fairly constant line with a spike here and there. This could be because of how India was easing its lockdown restrictions, so a few days of news may have inspired a spike. Further, Europe and China were beginning to open their borders, and administration tweets could be associated with them. After mid August, we suddenly see a spike in the graph. This is because the US was reporting an all-time high in cases at that point of time. What could further have inspired this huge spike is the fact that the US was about to start its presidential debates, and a lot of administration related tweets would have concerned these debates. The first of these debates was held on the 30th of September, and the huge spike is a clear indication that the election inspired a lot of such tweets.

4.2 Topic Disease
The disease graph is fairly constant throughout, as was expected, because after normalising the data the relevance of Covid in itself has not seen any decline. A notable talking point is the spike observed around July. Apart from the cases in India being on a constant rise, this could also be associated with the fact that this was when Oxford University first announced they were actively working on a vaccine. Also, Brazil was right at the peak of its first wave. Otherwise, the disease graph was more or less a straight line.
Fig. 5. Topic: Healthcare
Fig. 6. Topic: Location

4.3 Topic Healthcare
As we can see, in the early part of the graph the counts are really high. This unusually high number of counts is associated with the fact that early April was when Boris Johnson was hospitalised because of Covid. Further, this was when we were seeing cases of violence against healthcare workers in India, which may have inspired these tweets in such large numbers. As Covid started getting normalised and restrictions started lifting, the healthcare related tweets started dropping. The next spike we observe is around mid June. The constant rise could be explained by the fact that world leaders such as Brazil's president Bolsonaro, and soon after Amit Shah, tested positive for Covid.

4.4 Topic Location
We observe the count to be high early on and then facing a dip. This could be because Europe and the US were past the peak of the first wave and were not tweeting about the location of Covid. The next spike, around early June, is when Europe finally started announcing the opening of its borders, so a lot of tweets might have been about travelling during Covid. This is also around the time that Chinese-Indian relations soured, with people going as far as calling Covid a Chinese lab-made virus; a lot of tweets could have been inspired by this. Meanwhile the then president Trump insisted on calling it the 'Chinese Virus', inspiring a lot of hate crimes against Asian-Americans, which is reflected in the spike starting in June. The next spike we see is around September, when Trump started actively campaigning and many of his supporters engaged in similar anti-Asian hate as seen before.

4.5 Topic Precaution
The popularity of precaution related tweets is seen to be constantly declining. This is because we started getting accustomed to the new normal. However, a steep spike is seen around early June. This is when Europe had seen through the first wave, so as people were stepping out more they were constantly warned to keep precautions in check. Even sports like Formula 1 and football were starting again, so precaution measure tweets were gaining popularity. But the most important factor here would be the rise of the 'Black Lives Matter' protests occurring all around the world after the killing of George Floyd. As more and more people wanted to participate in the protests and marches, they were reminded to keep precautions in check, and hence a lot of precaution related tweets would be because of this. Lastly, after this decline, we see another spike in mid September. This is when India was reporting its highest single-day case numbers and the Prime Minister was urging everyone to follow precaution measures. Even in the US, the presidential debates were about to begin, and the rallying leading up to these events would have inspired precaution tweets.

Fig. 7. Topic: Precaution
Fig. 8. Topic: Citizens

4.6 Topic Citizens
By just taking a quick look, we can conclude that this graph looks very similar to the precaution and location graphs, and for much the same reasons. The June spike is associated with people travelling and protests occurring in different parts of the globe. A slow and steady decline follows, and ultimately a huge spike around September because of the extremely high number of cases in India and the simultaneously occurring elections in the US.
5 Conclusion and Future Work
In this paper, we have studied different literature on Twitter data sentiment analysis. We have used a modified TextRank algorithm to summarize Covid-19 Twitter data for each day from April 2020 to September 2020. Using keyword-based topic modelling for six predefined topics of great importance with respect to Covid-19, we have performed a trend analysis of each topic. The behaviour of global Twitter users can be explained from the trends in the graphs. This also demonstrates the effectiveness of the summary in reducing the data set, as well as of the bucketing of keywords into the defined topics. These topic-wise summaries help the administration understand public sentiment and opinion on different topics and respond appropriately. This information may assist the government in understanding the prevalence of viral spread, taking preventative measures or precautions, and deciding on official health advisories, social distancing, health-care information, and public service announcements.
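The keyword-based bucketing into predefined topics described above can be sketched as follows. The topic names follow this paper's section titles, but the keyword lists are illustrative placeholders, not the paper's actual dictionaries:

```python
# Keyword-based topic bucketing for daily tweet-summary sentences.
# The keyword sets below are hypothetical examples only.
TOPIC_KEYWORDS = {
    "location": {"border", "travel", "country", "china", "wuhan"},
    "precaution": {"mask", "sanitizer", "distancing", "precaution"},
    "citizens": {"people", "citizens", "public", "community"},
}

def bucket_sentence(sentence):
    """Return every topic whose keyword set overlaps the sentence's words."""
    words = set(sentence.lower().split())
    return [t for t, kws in TOPIC_KEYWORDS.items() if words & kws]

def daily_topic_counts(summary_sentences):
    """Count, per topic, how many summary sentences mention it on a given day."""
    counts = {t: 0 for t in TOPIC_KEYWORDS}
    for s in summary_sentences:
        for t in bucket_sentence(s):
            counts[t] += 1
    return counts
```

Running `daily_topic_counts` over each day's summary yields the per-topic time series whose trends are discussed above.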
However, in the pre-processing stage, information is lost when the hydrating tool converts the data to CSV. In addition, the hydrating tool sometimes crashed due to the volume of data. So, we manually converted the JSONL files to CSV files. This creates a challenge for a real-time implementation of the time series analysis. As future work, we plan to perform time series cluster analysis on the global Covid-19 Twitter data to cluster the countries, which will reveal countries with similar trends in the global scenario.

Compliance with Ethical Norms

Conflict of Interest. The authors declare that this manuscript has no conflict of interest with other published resources and has not been published earlier, in part or in full. None of the data has been fabricated or modified to support the conclusions.
Low-Computation IoT System Framework for Face Recognition Using Deep Learning Algorithm Jayanta Paul(B) , Rajat Subhra Bhowmick, and Jaya Sil Indian Institute of Engineering Science and Technology, Shibpur, Howrah 711103, India [email protected], {rajatb.rs2017,js}@cs.iiests.ac.in
Abstract. With the growing concept of smart cities, IoT applications such as face-authenticated smart-home door-lock security systems have gained importance and become popular. The major challenge in implementing such algorithms on IoT devices is their limited computational power. State-of-the-art methods that run successfully on high-end computing units are typically not usable on a Raspberry Pi, considered here as an IoT device. This paper aims to design an authorized door-lock system on the Raspberry Pi framework. Instead of a Euclidean distance-based similarity measure between the test and the trained face images, the pretrained FACENET model is used for face identification, which reduces false cases. In the current approach, Haar-Cascade is used to detect the faces, and fully connected layers are then trained on the collected FACENET facial embeddings. The proposed Haar-Cascade FACENET Fully Connected Face Authentication (HFFCFA) model eliminates the faults that occur in the Euclidean-based approach, thus reducing false-positive and false-negative cases. The network model works well within the Raspberry Pi pipeline for face recognition.

Keywords: IoT · Door-lock system · HFFCFA · Raspberry pi · Face recognition · Deep learning

1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
A. K. Das et al. (Eds.): CIPR 2022, LNNS 480, pp. 24–35, 2022. https://doi.org/10.1007/978-981-19-3089-8_3

In recent times, researchers have been devising automated technology to develop smart cities that would be fully digitized. Creating and maintaining smart cities necessitates the use of upgraded smart home automation. Most computerized systems are implemented using microcontroller devices, while various remote home automation solutions are proposed over the Internet. However, internet processing has hardware restrictions depending on the network. Moreover, a microcontroller cannot execute many programs at a time, so
it cannot control home appliances and home monitoring simultaneously. Energy and cost are the main bottlenecks in building a fully automated home. A low-cost single-board device like the Raspberry Pi may act as an efficient service platform to address these problems [25]. The Raspberry Pi has been utilized as an independent computer as well as an IoT device with the help of networking infrastructure. It has GPIO and USB connectors and a camera interface that can be used for different applications. Home safety and supervision are the two most essential components [13] in designing a smart home. Human identification and authentication or recognition, including entrance security controls on appliances, are important issues for home security. There are several types of biometric person-specific data, such as the face, fingerprint [10], and iris [22]. In comparison to other kinds of biometric data, face detection and identification technology provides accurate [14] outcomes and needs minimal human participation. Most recognition models are developed using features extracted from face images. Face recognition tasks consist of background detection and face identification. Template matching methods (Haar cascade) [24] and neural networks (NN) [20] are used to detect faces. There are several other approaches, such as Eigenfaces [23], Gabor wavelets [3], hidden Markov models (HMM) [19], NN [7], and support vector machines (SVM) [8], for performing face recognition [11]. NN models exhibit higher accuracy in both face detection and face identification [9] applications. The limitation of the Raspberry Pi is its low computational power, the main hurdle in processing image sequences in real time with NN models. Advanced methods for detecting and recognizing faces involve network pipelines that require extra graphics processing units (GPUs) [6] to be operated on a Raspberry Pi. However, external GPUs limit the utilization of the Raspberry Pi as an IoT device.
Face detection neural network pipelines are more computationally expensive. Therefore, it is suggested to use the template matching method, or Haar cascade, on the Raspberry Pi for feature extraction. The face recognition neural network model is computationally less expensive and, once trained using an external GPU, can be executed on the Raspberry Pi. Various models are available for face recognition, such as DeepFace [21], SphereFace, NormFace, and the FACENET [17] model. All of these neural network models use many layers of convolutional neural networks (CNN) [5] to classify faces. FACENET, with fewer convolution layers, is executable in a low-computation environment using the dlib library. The FACENET model generates a unique 128-d vector corresponding to the face image of each person. This 128-d vector is called a face embedding. In earlier experiments, a person is recognised as the identity whose stored facial embedding has the least Euclidean distance from the real-time test embedding produced by the FACENET model. However, there are two significant drawbacks to the algorithm pipeline utilized in earlier experiments [16]. The Euclidean distances are calculated against all stored faces, so the model becomes computationally costly when the stored faces are large in number. In addition, embeddings may give incorrect results under changing angle and lighting, and are therefore not invariant in nature. This paper tries to address the problem by extending the algorithm pipeline with another neural
network model for face image classification, instead of a Euclidean distance-based measure. The proposed approach reduces false cases when recognizing faces in real time in comparison to other techniques. We use the softmax function for classification, and if the probabilities of all class labels are less than 0.9, the face is deemed unknown; otherwise it is recognized. Therefore, faces are recognized with minimum uncertainty, resulting in minimum false cases. The contributions of the paper are summarized below.

– The model is used within the Raspberry Pi pipeline (Haar-cascade and FACENET) without any additional computational resources.
– We trained the model on an image data set we created using a Raspberry Pi camera.
– Our experiments show that the methodology applied to detect and recognize a person reduces false-positive cases.
– The neural-network-based Raspberry Pi model is a cost-effective solution for enhancing the embedded security monitoring system.
– The study is supported by thorough experimentation, collecting faces with angle and light variations.

The paper is organized into six sections. In the second section, related works are presented. We explain the suggested system and hardware architecture in section three. In the fourth section, the proposed methodology and technique for performing the facial recognition task are described, along with an overview of the implemented system. In the fifth section, we provide the performance of our proposed model. Finally, in section six, we conclude the paper.
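The Euclidean-distance baseline that this pipeline replaces can be sketched as below. The acceptance threshold of 0.6 is an assumed illustrative value, not taken from the paper; the point is that the cost grows linearly with the number of stored faces, and a single pose- or lighting-shifted embedding can fall outside any fixed threshold:

```python
import numpy as np

def euclidean_match(test_emb, stored_embs, names, threshold=0.6):
    """Match a 128-d test embedding against every stored embedding.

    stored_embs: an (N, 128) array of enrolled FACENET embeddings.
    Returns the closest enrolled name if its distance is under the
    (illustrative) threshold, else None for an unknown face.
    """
    dists = np.linalg.norm(stored_embs - test_emb, axis=1)  # N distances
    best = int(np.argmin(dists))
    return names[best] if dists[best] < threshold else None
```

Because every query scans all N stored embeddings, recognition time degrades as enrolment grows, which motivates the classifier-based replacement proposed here.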
2 Related Work
The use of the Raspberry Pi as hardware for IoT applications is proposed by Paul et al. [11]. They have shown that the Raspberry Pi can function as an IoT device and has enough computational resources to act as a server. Bhoi et al. [2] proposed a novel approach using IoT sensors and ML techniques to build a cost-effective standardized environment monitoring system (IoT-EMS) in a volunteer computing environment. Shah et al. [18] implemented a low-cost IoT-based biometrics architecture using the Raspberry Pi through a remote wireless enrolment node. However, in their work, the encrypted biometric traits were sent from the Raspberry Pi client to the Azure cloud for decryption [12]. They reported high execution and authentication times due to the encryption and decryption procedures over the network. Bhoi et al. [1] proposed an ML-trained, IoT-enabled recommendation system that uses water efficiently without much intervention from the farmers. The gathered data are forwarded to and stored in a cloud-based server, which applies ML approaches to analyze the data and suggest irrigation to the farmer. Complete processing of face recognition on the Raspberry Pi was implemented by Raju et al. [15]. They used conventional face detection and recognition techniques such as the Haar cascade classifier (template matching method), trained for detection using Local Binary Patterns (LBP) as the feature extraction technique. They reported an overall accuracy of 94% and an average processing time of
120 ms. The processing of deep learning models on the Raspberry Pi was proposed by Curtin et al. [4]. They examined the viability of a Raspberry Pi-based camera system for identifying wildlife of interest. They focused on classifying local images with a CNN, which enables the system to send only the images of the animals that are requested. They developed the neural system on a Raspberry Pi 3B+ using TensorFlow and Keras, trained the model on 3600 pictures, and reached a maximum accuracy of 97%.
3 The Door-Lock System Architecture
The suggested door-lock framework is shown in Fig. 1. The system captures images of frontal faces under varying angles and lighting conditions to create an image database [12]. Input modules, computing modules, enrolment modules, verification modules, and implementation modules are the critical design elements of the system. The input device, i.e., the Pi camera on the Raspberry Pi or a webcam in the input modules, acquires the facial pictures of various individuals. The collected pictures and coding scripts are forwarded from the input modules to the computing module (here, a Raspberry Pi board). The features of an individual are extracted using Haar feature cascade classifiers and kept in the database of the enrolment module. The database is thus built by capturing the faces of the authorized users of the home. For verification, the test image features are compared with the saved image features of the authorized users. If the test image matches an image in the database, the relay switch is set to unlock the door in the implementation module. Otherwise, the input module captures another image (test image), and the procedure is repeated. A security warning is triggered after a certain number of failures.

Fig. 1. The proposed door lock framework

Figure 2 shows the proposed hardware system. In this design, a Raspberry Pi 3 Model B+ is used, which has a 1.4 GHz 64-bit quad-core processor, dual-band 2.4 GHz and 5 GHz wireless LAN, Bluetooth 4.2/BLE, faster Ethernet, and PoE capability through a separate PoE HAT. The TensorFlow, Keras, OpenCV, and Haar-cascade face detection libraries are employed in the design of the system.
Fig. 2. Low-computation IoT system design on Raspberry Pi modules
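The verification flow described above (capture, compare, unlock or retry, then alert) can be sketched as a small control loop. The retry limit of 3 and all four callables are illustrative stand-ins, since the paper leaves the failure count and the hardware interfaces open:

```python
MAX_FAILURES = 3  # assumed retry limit; the paper does not fix the count

def verification_loop(capture_image, match_user, unlock_door, raise_alert):
    """Door-lock control flow: capture, verify, unlock or retry.

    All four callables are injected, so the same logic can drive a
    Pi camera and relay switch in deployment, or simple stubs in tests.
    Returns the recognized name, or None if the alert was raised.
    """
    for _ in range(MAX_FAILURES):
        name = match_user(capture_image())   # None means no match
        if name is not None:
            unlock_door(name)                # set the relay switch
            return name
    raise_alert()                            # security warning after repeated failures
    return None
```

On real hardware, `unlock_door` would toggle a GPIO pin driving the relay, and `raise_alert` would trigger the security warning.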
4 Haar-Cascade FACENET with Fully Connected Face Authentication (HFFCFA)
The architectural pipeline consists of three main processes: face detection, facial feature extraction, and face recognition. The framework starts by capturing a photo with the Raspberry Pi camera (800 * 600 pixels). The Haar-cascade classifier is used to detect the face, and facial features are extracted to identify it. In this process, the dimension is significantly reduced from the original image. Let α be the acquired image of 800 * 600 pixels, and let ω represent the Haar-cascade model applied on α to produce a two-dimensional vector, say β. β represents the face image, resized to 160 * 160 pixels for neural network processing. The second stage uses a low-computation FACENET model, denoted by η, which takes β as input and generates a 128-d vector, say μ, as its output. The 128-d vector is the feature vector used to classify the face. These two procedures are the foundations for recognition of the face. The pre-trained model is used to produce the face detection components as well as the face feature embeddings. The third step of the suggested system is the face recognition portion. The network flow is shown in Fig. 3.

Fig. 3. Haar-cascade FACENET Fully Connected Face Authentication (HFFCFA) method

Assume there are x authorized individuals, with y face images captured for each, giving a total of Z = x * y faces in the dataset, each labeled with one of the x identities (every person's distinct identity) [11]. We consider a two-layer feed-forward (fully connected) neural network model in which x is the number of output nodes in the last layer. Each individual's unique ID is represented as a one-hot vector of size x. The network is called the Fully Connected Face Authentication Network (FCFAN). All Z images are passed sequentially through the ω and η models to generate individual feature maps. The database thus consists of Z feature maps, each paired with its one-hot label, for training and test data. This network has an input matrix of size Z * 128 and an output matrix of size Z * x. The
dataset has been divided into training, testing, and validation sets. By applying batch normalization in the back-propagation algorithm with a cross-entropy loss function, we obtain the parameters of the FCFAN, learned from the training dataset. After training the FCFAN with the n authorized users, the network is used to predict the test face. During testing, the ith output node of the neural network should be close to 1 and the remaining nodes close to 0 for the face to be recognized as the authenticated individual denoted by the ith node. This minimizes false-positive scenarios, where non-authenticated look-alike faces have a chance of being accepted as an authentic individual. The person is deemed authorized when the maximum output probability is equal to or higher than 0.9.
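As a concrete illustration of the FCFAN, the sketch below builds the one-hot label matrix and runs a trained forward pass with the 0.9 acceptance rule. The hidden size of 512 is taken from Sect. 5.1, but the ReLU activation and the zero-initialized weights are illustrative assumptions; the real parameters come from back-propagation training:

```python
import numpy as np

def one_hot_labels(person_ids, x):
    """Build the Z * x one-hot label matrix for Z faces of x individuals."""
    Y = np.zeros((len(person_ids), x))
    Y[np.arange(len(person_ids)), person_ids] = 1.0
    return Y

def fcfan_predict(embedding, W1, b1, W2, b2, threshold=0.9):
    """Forward pass 128 -> 512 -> x with softmax and the 0.9 rule.

    ReLU is an assumed hidden activation (the paper does not name one).
    Returns the predicted person index, or None for an unknown face.
    """
    h = np.maximum(0.0, embedding @ W1 + b1)   # hidden layer (512 nodes)
    logits = h @ W2 + b2                       # output layer (x nodes)
    exp = np.exp(logits - logits.max())        # numerically stable softmax
    probs = exp / exp.sum()
    best = int(np.argmax(probs))
    return best if probs[best] >= threshold else None
```

When no output probability reaches 0.9, the face is reported as unknown, which is exactly how the false-positive cases are suppressed.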
5 Experimental Analysis Environment

In this section, we discuss the performance evaluation as well as the network hyper-parameters.

5.1 Network Parameters
The FCFAN has 128 input nodes and 512 hidden nodes. The output layer has n nodes, where n is the number of authorised persons. A batch size of 32 is used to train the network. The Adam optimizer is used for the back-propagation algorithm with a learning rate of 0.001. A dropout of 0.25 is used between the layers for regularization.

5.2 Performance Evaluation: Preliminary Match (Basic Match)
We examine the effectiveness of the HFFCFA model for preliminary matching when the camera captures a frontal face image of an individual. The distance from the person to the camera does not exceed 2 ft. The background light is ambient and noise-free. Table 1 presents the experimental outcomes for the Euclidean-based FACENET (euFACENET) and HFFCFA models. The name of the individual is shown in the first column. The environmental condition for the experimental analysis is shown in the second column. The third column displays whether or not the person's data is in the database. The remaining columns give the outcomes of euFACENET and HFFCFA, respectively. A determined name X means a match was found between the individual and database entry X, while 'None' means no match occurred. The possible outcomes of the proposed system are summarized below.

1. True positive: the person's authentication data is contained in the database, and the person is recognised by the methodology.
2. False positive: the person's authentication data is not contained in the database, but the proposed methodology recognises the person.
3. True negative: the person's authentication data is not in the database, and the person is not recognised by the proposed methodology.
Table 1. Experimental analysis of euFACENET and HFFCFA

| Name    | Environment for experimental analysis | Person data in DB | euFACENET: Determined as | euFACENET: Final result | HFFCFA: Determined as | HFFCFA: Final result |
|---------|---------------------------------------|-------------------|--------------------------|-------------------------|-----------------------|----------------------|
| Amit    | Sufficient light                      | Yes               | Amit                     | Correct                 | Amit                  | Correct              |
| Amit    | Insufficient light                    | Yes               | None                     | No                      | Amit                  | Correct              |
| Dhiraj  | Without glasses                       | Yes               | Dhiraj                   | Correct                 | Dhiraj                | Correct              |
| Arjun   | Sufficient light                      | Yes               | Arjun                    | Correct                 | Arjun                 | Correct              |
| Disha   | Sufficient light                      | Yes               | Isha                     | Wrong                   | Disha                 | Correct              |
| Sujit   | Sufficient light                      | No                | None                     | No                      | None                  | No                   |
| Arpan   | Sufficient light                      | No                | None                     | No                      | None                  | No                   |
| Susmita | Insufficient light                    | No                | None                     | No                      | None                  | No                   |
| Arjun   | Insufficient light                    | No                | Hari                     | Wrong                   | None                  | No                   |
Fig. 4. Experimental analysis of euFACENET identification when the individual is (a) available in the database and (b) missing from the database

Fig. 5. Experimental analysis of HFFCFA identification when the individual is (a) available in the database and (b) missing from the database
4. False negative: the person's authentication data is contained in the database, but the proposed methodology does not recognise the person.
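From counts of these four outcomes, success/failure rates of the kind plotted in Figs. 4 and 5 can be computed. A minimal sketch, with the example counts being hypothetical:

```python
def outcome_rates(tp, fp, tn, fn):
    """Derive rates from true/false positive and negative counts.

    The counts passed in are hypothetical illustrations, not the
    paper's measured values.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        # share of non-enrolled faces wrongly accepted
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        # share of enrolled faces wrongly rejected
        "false_negative_rate": fn / (fn + tp) if (fn + tp) else 0.0,
    }
```

Lower false-positive and false-negative rates are precisely the improvement claimed for HFFCFA over euFACENET.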
The euFACENET experimental results are summarised in Fig. 4. Figure 4a displays the success/failure rate when an individual present in the database is tested for recognition. The percentage of success/failure for an individual not saved in the database is shown in Fig. 4b. Figure 5 demonstrates the outcomes for the HFFCFA algorithm. The HFFCFA model outperforms the euFACENET model and reduces both false-positive and false-negative cases.

5.3 Performance: With Variation in the Parameters (Camera and Face)
In order to assess the accuracy of face recognition by the HFFCFA model, the following parameters are varied.
– a) Camera angle: the authorised individual's face is positioned at an angle to the base camera axis.
– b) Mask on face: when an individual comes in front of the camera, the individual covers half of the face with a mask.
Figure 6 illustrates the success/failure rates of the euFACENET and HFFCFA models when individual face data is captured at a 45° camera angle and the individual is stored in the database. Figure 7 indicates the result when the individual's face data is not present in the database. Figures 8 and 9 show the corresponding outcomes for variation (b), a mask on the face.
Fig. 6. Experimental analysis at a 45° camera angle when the individual is available in the database: (a) euFACENET and (b) HFFCFA.
It has been observed that the proposed HFFCFA model for door locking, developed on the Raspberry Pi platform, performs better than the Euclidean-based euFACENET system.
Fig. 7. Experimental analysis at a 45° camera angle when the individual is missing from the database: (a) euFACENET and (b) HFFCFA.

Fig. 8. Experimental analysis with a face mask when the individual is available in the database: euFACENET and HFFCFA.

Fig. 9. Experimental analysis with a face mask when the individual is missing from the database: (a) euFACENET and (b) HFFCFA.
Fig. 10. Confusion matrix using HFFCFA model
6 Conclusion
In this paper, we have suggested a hardware- and software-based solution for a low-computation IoT-based door-lock system. The system develops a face detection/recognition strategy by integrating FACENET and fully connected neural networks, called "Haar-cascade FACENET with Fully Connected Face Authentication" (HFFCFA). HFFCFA increases face recognition accuracy and minimises false-positive and false-negative cases compared with the conventional Euclidean-based euFACENET model. The confusion matrix for 14 authenticated test cases using HFFCFA is shown in Fig. 10. Furthermore, HFFCFA improves recognition time under the varied parameters.
References

1. Bhoi, A., et al.: IoT-IIRS: Internet of things based intelligent-irrigation recommendation system using machine learning approach for efficient water usage. PeerJ Comput. Sci. 7, e578 (2021)
2. Bhoi, S.K., et al.: IoT-EMS: an internet of things based environment monitoring system in volunteer computing environment. Intell. Autom. Soft Comput. 32(3), 1493–1507 (2022)
3. Cho, H., Roberts, R., Jung, B., Choi, O., Moon, S.: An efficient hybrid face recognition algorithm using PCA and Gabor wavelets. Int. J. Adv. Rob. Syst. 11(4), 59 (2014)
4. Curtin, B.H., Matthews, S.J.: Deep learning for inexpensive image classification of wildlife on the Raspberry Pi. In: 2019 IEEE 10th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), pp. 0082–0087. IEEE (2019)
5. He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2961–2969 (2017)
6. He, M., Zhang, J., Shan, S., Kan, M., Chen, X.: Deformable face net for pose invariant face recognition. Pattern Recogn. 100, 107113 (2020)
7. Kanchi, K.K.: Facial expression recognition using image processing and neural network. Int. J. Comput. Sci. Eng. Technol. (IJCSET) (2013). ISSN 2229-3345
8. Kong, R., Zhang, B.: A new face recognition method based on fast least squares support vector machine. Phys. Procedia 22, 616–621 (2011)
9. Lal, M., Kumar, K., Arain, R.H., Maitlo, A., Ruk, S.A., Shaikh, H.: Study of face recognition techniques: a survey. Int. J. Adv. Comput. Sci. Appl. (IJACSA) 9(6), 42–49 (2018)
10. Nayak, S.K., Pati, P., Sahoo, S., Nayak, S., Debata, T., Bhuyan, L.: Artificial finger with dental alginate impression material can fool the sensor of various finger print systems. J. Indian Acad. Forensic Med. 41(1), 2–6 (2019)
11. Paul, J., Bhowmick, R.S., Das, B., Sikdar, B.K.: A smart home security system in low computing IoT environment. In: 2020 IEEE 17th India Council International Conference (INDICON), pp. 1–7. IEEE (2020)
12. Paul, J., et al.: Evaluation of face recognition schemes for low-computation IoT system design. In: 2020 24th International Symposium on VLSI Design and Test (VDAT), pp. 1–6. IEEE (2020)
13. Quadri, S.A.I., Sathish, P.: IoT based home automation and surveillance system. In: 2017 International Conference on Intelligent Computing and Control Systems (ICICCS), pp. 861–866. IEEE (2017)
14. Quintal, K., Kantarci, B., Erol-Kantarci, M., Malton, A., Walenstein, A.: Contextual, behavioral, and biometric signatures for continuous authentication. IEEE Internet Comput. 23(5), 18–28 (2019)
15. Raju, K., Rao, Y.S.: Real time implementation of face recognition system on Raspberry Pi. Int. J. Eng. Technol. 7(2.17), 85–89 (2018)
16. Sajjad, M., Nasir, M., Ullah, F.U.M., Muhammad, K., Sangaiah, A.K., Baik, S.W.: Raspberry Pi assisted facial expression recognition framework for smart security in law-enforcement services. Inf. Sci. 479, 416–431 (2019)
17. Schroff, F., Kalenichenko, D., Philbin, J.: FaceNet: a unified embedding for face recognition and clustering. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 815–823 (2015)
18. Shah, D., et al.: IoT based biometrics implementation on Raspberry Pi. Procedia Comput. Sci. 79, 328–336 (2016)
19. Sharif, M., Shah, J.H., Mohsin, S., Raza, M.: Subholistic hidden Markov model for face recognition. Res. J. Recent Sci. 2277, 2502 (2013)
20. Sun, X., Wu, P., Hoi, S.C.: Face detection using deep learning: an improved faster RCNN approach. Neurocomputing 299, 42–50 (2018)
21. Taigman, Y., Yang, M., Ranzato, M., Wolf, L.: DeepFace: closing the gap to human-level performance in face verification. In: IEEE Computer Vision and Pattern Recognition (CVPR), vol. 5, p. 6 (2014)
22. Tapia, J.E., Perez, C.A., Bowyer, K.W.: Gender classification from the same iris code used for recognition. IEEE Trans. Inf. Forensics Secur. 11(8), 1760–1770 (2016)
23. Turk, M.: Eigenfaces and beyond. In: Face Processing: Advanced Modeling and Methods, pp. 55–86 (2005)
24. Viola, P., Jones, M.J.: Robust real-time face detection. Int. J. Comput. Vis. 57(2), 137–154 (2004)
25. Zhao, C.W., Jegatheesan, J., Loon, S.C.: Exploring IoT application using Raspberry Pi. Int. J. Comput. Netw. Appl. 2(1), 27–34 (2015)
Vehicle Number Plate Recognition System Ibidun Christiana Obagbuwa(B) , Vincent Mohale Zibi, and Mishi Makade Department of Computer Science and Information Technology, Sol Plaatje University, Kimberley, South Africa {ibidun.obagbuwa,201902877,201902668}@spu.ac.za
Abstract. Number plate recognition has been an interesting and demanding research area for many years. Number plate recognition applies optical character recognition to images to recognize and read the number plates of cars into text. The number plate recognition system is very useful in many vehicle-security-related departments. Traffic police use the system to obtain details of vehicles violating traffic rules. It is also used in automatic toll collection systems and car parking systems. The system needs to capture the number plate before doing any recognition. The cameras used to capture the number plates make use of infrared lighting so that they can capture at any time of day. Flashes are also used to ensure accurate capturing of number plates. This work implements vehicle number plate recognition in MATLAB.

Keywords: Vehicle number plate recognition · MATLAB · Image processing · Image segmentation · Image recognition · Cameras · Traffic control · Parking lots
1 Introduction The Number Plate Recognition system is a technology for automatically reading vehicle number plates. Number plate recognition is a chunk of digital image processing that is usually utilized in the vehicle transportation system to recognize the vehicles [1, 2]. It is used by police agencies all around the world to verify if a vehicle is registered or licensed, among other things. It is also utilized to manage traffic on roadways, gas stations, retail malls, airports, motorways, toll booths, hotels, hospitals, parking lots, and defense military checkpoints, among other things. This work becomes more difficult since the number plate might be recorded against a variety of backgrounds with varying typefaces, angles, and sizes, and different nations have distinct license plate layouts [3–5]. The vehicle number plate system is used nearly everywhere, including traffic maintenance, locating stolen automobiles, tracking down criminals, and toll collection. Vehicle number plate has been tried and implemented for many years, yet it remains a challenging task. The system looks for local patches in an image to identify the license plate. The number plate can be anywhere in the input image with different sizes, so it is not wise to look at every pixel of an image to find it. The Number Plate Recognition (NPR) system obtains the picture using spectral analysis, extracts the region of interest, and character © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. K. Das et al. (Eds.): CIPR 2022, LNNS 480, pp. 36–42, 2022. https://doi.org/10.1007/978-981-19-3089-8_4
Vehicle Number Plate Recognition System
Fig. 1. NPR process [5]
segmentation using the support vector machine (SVM) feature extraction technique [3]. The NPR process is depicted in Fig. 1. This study aims to implement vehicle number plate recognition in MATLAB. The paper is structured as follows: Sect. 2 covers image pre-processing, Sect. 3 the extraction of the number plate location, Sect. 4 the segmentation and recognition of plate characters, Sect. 5 the MATLAB results, Sect. 6 applications of the NPR system, and Sect. 7 the conclusion.
2 Image Pre-processing The image pre-processing stage has seven sub-sections. 2.1 RGB to Gray The pictures are transformed to grayscale by forming a weighted sum of the red, green, and blue components, as shown in Eq. (1):
Gray = 0.2989 ∗ R + 0.5870 ∗ G + 0.1140 ∗ B
(1)
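The paper performs this step with MATLAB's rgb2gray; as a language-neutral illustration, the same weighted sum of Eq. (1) can be sketched in plain Python (the function names here are our own):

```python
def rgb_to_gray(r, g, b):
    """Grayscale value of one pixel via the weighted sum of Eq. (1)."""
    return 0.2989 * r + 0.5870 * g + 0.1140 * b

def image_to_gray(rgb_image):
    """Convert a nested-list H x W image of (R, G, B) tuples to grayscale."""
    return [[rgb_to_gray(*pixel) for pixel in row] for row in rgb_image]
```

For example, a pure-red pixel (255, 0, 0) maps to 0.2989 × 255 ≈ 76.2, reflecting the smaller weight given to red compared with green.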
I. C. Obagbuwa et al.
The RGB to Gray conversion is needed because the binary (black-and-white) image is obtained from the grayscale image [1]. 2.2 Grayscale to Binary Image Binary images are obtained from gray-level images by thresholding. The thresholding function is shown in Eq. (2): b(x, y) = 1 if g(x, y) < T; b(x, y) = 0 if g(x, y) ≥ T
(2)
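A minimal sketch of this thresholding rule in plain Python, following the polarity of Eq. (2) (in MATLAB this would typically be done with imbinarize or im2bw):

```python
def to_binary(gray, T):
    """Eq. (2): b(x, y) = 1 if g(x, y) < T, and 0 otherwise."""
    return [[1 if g < T else 0 for g in row] for row in gray]
```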
Binary images have a very small size, and they are simple to process and analyse. They consist only of 0s and 1s: 0 stands for black and 1 stands for white [3, 4]. 2.3 Noise Removal by Median Filter The ‘medfilt2()’ function is used to remove noise. It requires two input arguments: • The noisy image • The size of the filter 2.4 Histogram Equalization The ‘histeq’ function performs histogram equalization. It takes the source image as an argument and returns the equalized image. 2.5 Edge Detection Edges are sudden changes or discontinuities in an image. Edge detection helps with determining the size of the number plate. The operator used is the Prewitt operator, which detects edges horizontally and vertically [6]. It computes approximations of the derivatives for vertical and horizontal changes using two 3 × 3 kernels convolved with the picture [6]. 2.6 Morphological Operations • Dilation adds pixels to the borders of objects in a picture (horizontal and vertical). • Erosion removes pixels from the edges of an object. Each output pixel corresponds to the value of the pixels in its immediate neighbourhood. By specifying the shape and size of the neighbourhood, you can build a morphological operation that is sensitive to certain forms in the input picture. Morphological operations take an input picture and apply a structuring element, created with ‘strel’ in MATLAB, to produce an output image of the same size [7].
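The median filter of Sect. 2.3 and the Prewitt kernels of Sect. 2.5 can be sketched in plain Python (in MATLAB these are medfilt2 and edge(I, 'prewitt'); the helper names here are our own, and only interior pixels are handled):

```python
PREWITT_X = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]   # responds to vertical edges
PREWITT_Y = [[-1, -1, -1], [0, 0, 0], [1, 1, 1]]   # responds to horizontal edges

def median_filter3(img):
    """3 x 3 median filter over interior pixels, as medfilt2 does by default."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = sorted(img[y + dy][x + dx]
                            for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            out[y][x] = window[4]                  # median of the 9 values
    return out

def prewitt_response(img, kernel, y, x):
    """Cross-correlation of one 3 x 3 kernel with the image at (y, x)."""
    return sum(kernel[dy + 1][dx + 1] * img[y + dy][x + dx]
               for dy in (-1, 0, 1) for dx in (-1, 0, 1))
```

A lone bright pixel is removed by the median filter, while a vertical intensity step gives a large PREWITT_X response and a zero PREWITT_Y response.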
2.7 Filling Holes and Clearing Borders Holes are filled using the ‘imfill’ function on the black-and-white image; a hole is a set of background pixels that cannot be reached by filling the background from the image edge. The imclearborder(I) syntax suppresses structures in the image I that are lighter than their surroundings and connected to the image border [8]. The following section uses candidate region extraction to extract the number plate position.
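The hole-filling idea behind imfill can be sketched as a flood fill of the background from the border; any background pixel the fill cannot reach is a hole (the function name is ours, and no toolbox is assumed):

```python
def fill_holes(bw):
    """Flood-fill the background (0s) from the image border; any background
    pixel the fill cannot reach is a hole and is set to 1. This is the idea
    behind MATLAB's imfill(bw, 'holes'), sketched in plain Python."""
    h, w = len(bw), len(bw[0])
    reached = [[False] * w for _ in range(h)]
    # Seed the flood fill with every background pixel on the border.
    stack = [(y, x) for y in range(h) for x in range(w)
             if (y in (0, h - 1) or x in (0, w - 1)) and bw[y][x] == 0]
    for y, x in stack:
        reached[y][x] = True
    while stack:
        y, x = stack.pop()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and bw[ny][nx] == 0 and not reached[ny][nx]:
                reached[ny][nx] = True
                stack.append((ny, nx))
    # Foreground stays 1; unreached background pixels (holes) also become 1.
    return [[1 if bw[y][x] == 1 or not reached[y][x] else 0
             for x in range(w)] for y in range(h)]
```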
3 Extraction of Number Plate Location The license number region is segmented out of the binary picture. The picture is scanned row by row, from top to bottom, and the magnitude of each row of the black-and-white image is stored in variables. The row with the greatest magnitude is the one containing the number plate, which is then cropped. The cropped area's content is matched against the templates.
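The row scan described above amounts to finding the horizontal band with the greatest white-pixel magnitude; a minimal sketch in plain Python (the band width is an assumed parameter of this illustration):

```python
def plate_row_band(bw, band=3):
    """Scan rows top to bottom (Sect. 3) and return the starting index of the
    band of rows with the greatest total magnitude (count of white pixels)."""
    sums = [sum(row) for row in bw]
    return max(range(len(sums) - band + 1),
               key=lambda i: sum(sums[i:i + band]))
```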
4 Segmentation and Recognition of Plate Character This section consists of two sub-sections: segmentation of characters and template matching. 4.1 Segmentation of Characters The segmentation used is shrinking segmentation. Characters of the number plate are segmented with bounding boxes, using the bounding box property in MATLAB, and the characters are then extracted from the bounding boxes. This approach makes segmentation straightforward, since the characters in a number plate are isolated. 4.2 Template Matching Templates of the alphabet are created first. Figure 2 shows the templates used.
Fig. 2. Templates used
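Since the plate characters are isolated, the bounding boxes of Sect. 4.1 can be approximated in a sketch by splitting on empty columns (in MATLAB this is normally done with regionprops and its 'BoundingBox' property; the column-projection approach here is a simplification):

```python
def segment_characters(bw):
    """Split a binary plate strip into per-character column ranges, using the
    empty columns between isolated characters as separators."""
    col_sums = [sum(bw[y][x] for y in range(len(bw))) for x in range(len(bw[0]))]
    boxes, start = [], None
    for x, s in enumerate(col_sums + [0]):      # sentinel closes the last run
        if s > 0 and start is None:
            start = x
        elif s == 0 and start is not None:
            boxes.append((start, x - 1))
            start = None
    return boxes
```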
The correlation function shown in Eq. (3) is then used to match the segmented characters to the templates. The correlation values are stored in a variable, and the character whose template gives the highest correlation value is assigned [6]. The recognized
characters are kept in a text file. The correlation values are computed as shown in Eq. (3): r = (1/(n − 1)) Σ [((x_i − x̄)/s_x) ((y_i − ȳ)/s_y)] (3) where r is the correlation coefficient of the observations (x_1, y_1), (x_2, y_2), …, (x_n, y_n); n is the number of observations; x̄ is the mean of x_1, x_2, …, x_n; ȳ is the mean of y_1, y_2, …, y_n; s_x is the standard deviation of the x observations; and s_y is the standard deviation of the y observations.
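Equation (3) and the template-matching step can be sketched in plain Python; best_match is a hypothetical helper showing how the label of the highest-correlation template is chosen:

```python
from math import sqrt

def corrcoef(xs, ys):
    """Sample correlation coefficient of Eq. (3):
    r = (1/(n-1)) * sum(((x - xbar)/s_x) * ((y - ybar)/s_y))."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sx = sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    sy = sqrt(sum((y - ybar) ** 2 for y in ys) / (n - 1))
    return sum((x - xbar) * (y - ybar)
               for x, y in zip(xs, ys)) / ((n - 1) * sx * sy)

def best_match(segment, templates):
    """Return the template label with the highest correlation to the segment
    (both given as flattened pixel lists)."""
    return max(templates, key=lambda label: corrcoef(segment, templates[label]))
```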
5 MATLAB Results This study created an NPR system using MATLAB; results from the NPR system, along with all the pre-processing results, are shown in Fig. 3. The dataset for this system is an image of a vehicle number plate, and the system can be tested with vehicle images bearing plates from any country in the world. The original data (an image of a vehicle with its plate) is loaded into the NPR system and processed using the following steps, also depicted in Fig. 3:
• Read the color image into MATLAB.
• Convert the RGB image to a gray image.
• Remove noise with the median filter.
• Enhance the image using histogram equalization.
• Detect edges using Sobel.
• Apply the dilation operation (horizontal and vertical) to increase the number of pixels around the edges of objects in the picture.
• Fill holes: a group of background pixels that cannot be reached by filling the background from the image's edge is filled in.
• Clear the border: structures in the picture that are lighter than their surroundings and connected to the image border are suppressed.
• Apply the morphological erosion operation to remove pixels on the image boundaries.
• Extract each character and number from the picture using segmentation algorithms.
The number plate extracted from the car is shown at the bottom of Fig. 3.
6 Applications of the NPR System The NPR system can be used for the following: • Parking – automatically capturing prepaid members and calculating non-member parking fees. • Access control – keeping out unauthorized drivers and automatically opening the gate for authorized users.
Fig. 3. Results from NPR system
7 Conclusion This work shows how an NPR system can recognize and extract the number plate from a vehicle. The NPR system was implemented in MATLAB. Acknowledgement. The authors would like to thank Sol Plaatje University for providing infrastructure for this work.
References 1. Vijayalakshmi, N., Sindhu, S., Suriya, S.: Automatic vehicle number recognition system using character segmentation and morphological algorithm. In: 2020 IEEE International Conference on Advances and Developments in Electrical and Electronics Engineering (ICADEE), pp. 1–5 (2020). https://doi.org/10.1109/ICADEE51157.2020.9368901 2. Shinde, S.A., Hadimani, S.S., Jamdade, S.S., Bhise, P.R.: Automatic number plate recognition system. Int. J. Sci. Res. Eng. Trends 7(4) (2021). ISSN 2395-566X 3. Çavdaroğlu, G.C., Gökmen, M.: A character segmentation method to increase character recognition accuracy for Turkish number plates. Math. Comput. Sci. 6(6), 92–104 (2021). https://doi.org/10.20944/preprints202104.0440.v1
4. Chandra, B.M., Sonia, D., Roopa Devi, A., Yamini Saraswathi, C., Mighty Rathan, K., Bharghavi, K.: Recognition of vehicle number plate using Matlab. J. Univ. Shanghai Sci. Technol. 23(2), 363–370 (2021) 5. Chauhan, S., Srivastava, V.: Matlab based vehicle number plate recognition. Int. J. Comput. Intell. Res. 13(9), 2283–2288 (2017) 6. Tiwari, B., Sharma, A., Singh, M.G., Rathi, B.: Automatic vehicle number plate recognition system using Matlab. IOSR J. Electron. Commun. Eng. 11, 10–16 (2016) 7. MathWorks: Types of morphological operations – MATLAB & Simulink. https://ch.mathworks.com/help/images/morphological-dilation-and-erosion.html. Accessed 05 June 2021 8. MathWorks: Fill image regions and holes – MATLAB imfill. https://ch.mathworks.com/help/images/ref/imfill.html. Accessed 16 June 2021
An Approach to Medical Diagnosis Using Smart Chatbot Shreya Verma, Mansi Singh, Ishita Tiwari, and B. K. Tripathy(B) School of Information Technology, Vellore Institute of Technology, Vellore 632014, India {shreya.verma2019,mansidhananjay.singh2019, ishita.tiwari2019}@vitstudent.ac.in, [email protected]
Abstract. People often avoid initial hospital treatment for conditions that could turn into major diseases in the future. With rapid technological change, remote or home-stationed diagnosis systems are becoming exponentially popular, offering advantages such as cost-efficiency and quick, reliable decision support for medical diagnostics, treatment, and the prevention of physical or mental harm. Our proposed idea is an affordable, accessible, all-day chatbot, DiagZone, that lets people check on their health without the higher time investment of the conventional approach. Since it is free and accessible anywhere irrespective of the user's location, it encourages practical use and saves the time otherwise spent on specialist consultations. Using AWS Lex, AWS Lambda, and the Twilio channel, we have integrated our Flask-based chatbot with WhatsApp. Text entered by the user is processed with NLP at its core. After DiagZone has gathered adequate keywords from the initial messages, it leads the conversation by questioning the user and listing a few diseases the user may be suffering from. Once the chatbot has detected the probable disease, it suggests redressal measures and medications to the user or refers them to a doctor. Keywords: AWS Lex · Lambda · Twilio · NLP · Flask · Whatsapp
1 Introduction A medical chatbot system has a profound influence on public health culture. It has advanced exponentially and is less susceptible to manual error. People today are at greater risk of becoming addicted to the Internet while neglecting their mental and physical health. They avoid hospital treatment for seemingly trivial matters, which could cost them a major disease in the future. Our proposed bot methodology aims to solve this problem by creating an affordable and accessible all-day chatbot. The fact that the chatbot is free and remotely accessible encourages users to adopt it, saving the time and expense of consultations with specialist doctors. After the chatbot has collected enough keywords from the initial messages, it leads the conversation by questioning the user and attempts to list a few diseases the user may be suffering from. As soon as it receives enough © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. K. Das et al. (Eds.): CIPR 2022, LNNS 480, pp. 43–56, 2022. https://doi.org/10.1007/978-981-19-3089-8_5
data, it discerns a disease or injury the user might be suffering from. Once the chatbot has detected the probable disease, it suggests remedies, or refers the user to a doctor if the severity rate strikes a preset limit. Working along these lines, we have named the chatbot DiagZone, a person's complete zone for diagnostic purposes. There are several methods of medical diagnosis, in various directions and with different technology: rough set theory [14], formal concept analysis [15, 17], and hybrid models of fuzzy sets and soft sets [16]. This paper introduces the design of our chatbot, which can be used in the field of medical diagnosis by providing diagnoses and remedial measures based on the indicators provided in the application. Natural Language Processing (NLP) techniques such as the Python NLTK can be used to analyze text, and sensible answers can be obtained by designing an appropriate human-response engine. Health care conversations have a high potential for medical communication by bridging the clinic-patient and physician-patient gap. The system can help meet the need for health services through remote testing, monitoring and swift medication, or even telephonic consultation. The idea was devised in light of the prevailing pandemic, wherein remote consultation can help people avoid the risks they might endure while visiting a hospital.
2 Literature Review Using AI-based concepts, a diagnosis-focused chatbot was devised by Fernandes et al. [1] to help users identify illness based on their symptoms. It was fed a knowledge base of diseases, and NLP made it capable of transforming user-fed inputs into queries and ultimately returning the desired response. The response time was 10–20 ms. It retrieves knowledge from databases of approximately 150 diseases to drive the diagnostic chatbot. Results show that it can be used for retrieving information in devices like Alexa and Siri. A generic chatbot called ‘Diabot’ – a (DIA)gnostic chat(BOT) – was proposed by Sahoo et al. [2]. Using NLU, the chatbot converses directly with the patient to produce personalized predictions from a stored dataset, based on the symptoms taken as inputs from the user. It was then specialized for diabetes prediction using the Pima Indian diabetes dataset. Its front end was built with RASA NLU-based text pre-processing to understand patient intent and a React UI, and a quantitative performance comparison of a number of machine learning algorithms was undertaken. Across the weaker learners the model's accuracy was balanced, and diabetes prediction performed above par. To show the use of chatbots in the diagnosis of achluophobia, a chatbot was introduced by Belfin et al. [3]. It can diagnose the severity of the disease based on the patient's inputs. It performs NLP to extract meaning and uses a decision tree (DT) to narrow down classifications and characterize a user in terms of a likely disease. The NLP unit retrieves the meaning of keywords to define the severity of the disease's symptoms, and pattern matching of sentences determines similarities. The DFS technique is used to traverse the DT and to make crucial decisions about the intensity of the disease.
In diagnosing Autism and Achluophobia, Aquabot showcases high efficacy. Even when it comes to assisting
a human psychologist, the proposed model is beneficial for practising psychologists. These features made it time-efficient and resourceful, and it achieved an accuracy of 88 percent when measured against the results of a human psychologist. Uncertainty-based models are more efficient than crisp methods [14]. A telehealth system based on fuzzy inference and logic rules was developed for the Covenant University Doctor (CUDoctor) chatbot [4]. The system focuses on the symptoms of Nigeria's tropical diseases. The Telegram Bot API was used to interconnect the chatbot and the system, and Twilio was used for interconnectivity between the system and an SMS subscriber. Using medical ontologies, the system draws on its knowledge base to predict the illness and its severity from the symptoms received. A fuzzy Support Vector Machine (SVM) predicts the disease from the symptoms, which are identified by NLP and sent to CUDoctor for decision support. The user then receives a message indicating the completion of the process. The system behaves as a mini medical diagnostic center that gives personalized diagnoses based on self-given inputs. A usability assessment yielded a mean SUS score of 80.4, a positive evaluation. To identify and forecast the presence of heart disease effectively and accurately, a system was developed by Omoregbe et al. [5]. Based on surveys of algorithms such as K-NN, Artificial Neural Network (ANN), SVM, Bayes, and Decision Tree (DT), tests were conducted to check the veracity of each. The survey showed that the SVM algorithm gives the best certainty on a heart disease dataset compared with the others. The SVM training algorithm constructs a model that allocates new instances to one category or the other, a classifier defined by a separating hyperplane.
The aim is a system that can take the patient's reports, conduct an analysis, and thereby infer whether he or she is suffering from any kind of heart disease. The process is carried out through a one-on-one conversation pattern using the Dialogflow platform. The chatbot developed by Mujeeb et al. [6] can give patients a realistic experience of conversing over text with a medical professional. The chatbot can identify and store message patterns as followed by humans using the AI Mark-up Language. This XML-based language has extended its scope to retrieving information and building AI applications. It extracts keywords, as per the knowledge base, from the initial messages to identify the probable medical issues the patient might have. A few medical chatbots currently exist along similar lines; they connect users with a medical question-answer platform and, under a suggestions tab, show similar previously answered questions from doctors that may match the symptoms the user entered. The application was compared against HealthTap and the Facebook Messenger chatbot, and it justified its goal of being a medical chatbot that could prove a better and more efficient substitute for other existing chatbots in this sector. A WhatsApp-based service called XraySetu was presented by Bali et al. [7] to provide an instantaneous diagnosis for patients who are susceptible to, or show symptoms of, COVID-19 using chest X-ray imaging. A generated report contains predictions for multiple lung abnormalities, including COVID-19 among 15 others, with clarified semantic markings on the chest X-ray. It was undertaken to help
the doctors understand the gravity of the ailments, and was trained using multi-task learning on chest X-ray datasets provided by sites such as RSNA and NIH. A small-scale pilot ran over an extensive period of 10 months and could return a machine-generated X-ray report when the concerned authorities sent in patients' chest X-rays. XraySetu proved an efficient mechanism for busy doctors, since they could plan an early intervention for their patients by simply taking a snapshot of the X-ray and sending it via WhatsApp. A chatbot focusing on cancer patients is proposed by Kudwai et al. [8], equipped with the ability to answer all their queries pertaining to treatment, survival, or symptoms. No limited dataset was used, since information was extracted from various platforms. Another intended feature was sentiment analysis, to provide a more comforting experience. Hence, this chatbot was an implementation of NLP, web scraping, and Neo4j, with data pre-processing done by NLTK. Although it experimentally proved a reliable platform, expanding the dataset to improve performance, along with including images in the dataset, remains future work, as do features like speech recognition and suggesting specialized hospitals.
The use of conversational AI to interact with the users in an efficient/operating system and architecture supported future integrations of features and ease of updating. A health assistant that can handle quick seeking treatments using the PRISMA method is proposed by Tjiptomongsoguno et al. [10]. They used NLP, ML, Braun and Clarke’s algorithm, compared keyword and data mining and inferred that NLP and ML make the best combination for a medical diagnostic chatbot. Based on the needs, several technologies bridge to serve accurate results. However, the question still remains of when and how a machine can predict fully correct output. In [11, 12], a chatbot named AgentG is proposed for the lovers of e-commerce who can chat through it in a friendly manner. Similarly, another system which is based upon the clustering and auto-tagging techniques is proposed and studied in the name of TagIT [13]. Tripathy et al. [14] proposed a measure using fuzzy logic, neurocomputing and probabilistic reasoning to explore methodologies to exploit the tolerance of imprecision and uncertainty to achieve robustness and low solution cost.
3 Experimental Setup The experimental setup for DiagZone is as follows:
3.1 Software Using a knowledge base of diseases and symptoms acquired from medical ontologies, together with pattern-detection methodologies, we set out to devise an integrated application: an identifying tool that provides a diagnosis from the inputs inserted by users. Using the AWS Lex chatbot service and the Twilio channel, the bot is integrated with WhatsApp through a business account, with its dataset provided through an AWS Lambda function and constantly refined. We chose Twilio for its encryption-based support system, which facilitates secure communication. 3.2 Hardware Only a smartphone and good internet connectivity are required to use this chatbot efficiently. When launched by the user, the chatbot processes the entered query. On successful processing, it retrieves solutions apt for the condition; these are presented to the user and the workflow ends. If the query is not processed successfully, error feedback is provided to the user. 3.3 Security Checks The base necessity for any chatbot is a good security check, to ensure no vulnerabilities can penetrate it. • Confidential data such as customer information is accessible to Twilio-owned devices only. • A data-loss-prevention system scans for sensitive data that might be exposed publicly or stored improperly, with alerting and quarantine capabilities for the primary collaboration systems. • Twilio classifies data in four priority tiers – Secret, Restricted, Confidential, and Public. Data segregation is implemented as logical separation, and access follows the principle of least privilege. • Credibility is ensured through AWS certifications.
Through this, Twilio can maintain the confidentiality, availability, and integrity of the data, which is extremely necessary when handling highly personal patient details, while maintaining compliance with legislative, contractual, and regulatory requirements. • In Amazon Lex, security is implemented through technical and physical controls, including encryption at rest and in transit, designed to prevent unauthorized access to, or disclosure of, content. 3.4 Modules Description Our chatbot consists of the chunks of functionality mentioned below:
1. Decoding the natural language messages
2. Detection of diseases from symptoms
3. Suggesting remedial measures for the same
4. Locating doctors/helplines for the user
5. Booking appointments for the user and providing confirmations
The blended dialogue at the heart of the framework consists of Natural Language Processing, a part of Artificial Intelligence and ML. An AI-enabled chatbot determines and processes language understandable to the target person. It can follow ordinary personal conversation and recognizes that instructions or questions from users do not require much specificity.
Fig. 1. System architecture
In our proposed system, shown in Fig. 1, we identify text-based keywords and extract meaning to create a symptom mapping, i.e., matching the symptoms against the knowledge base. The relevant queries presented by users are tallied against the database, and suggestions for possible major or minor diseases, along with referrals to a doctor, are presented. This is done as follows: a. Decoding Intent: NLP breaks down users' inputs and comprehends the positioning, conjugation, etc. of a human conversation. b. Recognizing Utterance: the chatbot recognizes, from the words given, the instances the user is referring to. c. Dealing with Entities: the chatbot traces the database to retrieve the most similar possible result. d. Contextual Understanding: NLP also provides conversation, tone, and sentiment analysis for more human-like language processing.
An Approach to Medical Diagnosis Using Smart Chatbot
49
To train our dataset for pattern-based matching and identifying diseases, we use decision tree classifiers and algorithms that give us a predictive approach. The supervised learning algorithm works on N training examples, such that a feature vector with a label contains an input and an output space for learning and prediction respectively. Further text-processing operations include noise removal (erasure of irrelevant characters in the string), tokenization (fragmentation of the string into lexical elements using NLTK), document tagging (a useful knowledge source), parsing (in case of speech-to-text), term matching (extracting matching keywords and tallying them against the knowledge base), and feature selection and extraction (where the feature vector comes into play to produce the output). 3.5 Algorithm The expected inputs and outputs are shown with the help of a devised pseudocode:
Step 1: Start
Step 2: Parse the query in the text box, e.g., "I am having stomachache and headache, best relieving medicine quickly"
Step 3: Pre-process; words like stomachache, headache, medicine, quickly are highlighted
Step 4: The information is stored in logs
Step 5: Matching keywords are fetched from the knowledge base, AWS Lex performs its functions, and a response is generated
Step 6: The output is presented to the user, e.g., "In case of mild pain, take a Crocin or paracetamol"
Step 7: Exit
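Steps 3–6 of the pseudocode can be sketched in plain Python as keyword extraction against a small knowledge base; the entries and function name here are hypothetical stand-ins (the real bot uses AWS Lex and DynamoDB):

```python
KNOWLEDGE_BASE = {        # hypothetical entries for illustration only
    "stomachache": "In case of mild pain, an antacid may help.",
    "headache": "In case of mild pain, take paracetamol.",
}

def respond(query):
    """Lower-case and tokenize the query, keep the words found in the
    knowledge base, and return the matched responses (or a fallback)."""
    tokens = query.lower().replace(",", " ").replace(".", " ").split()
    hits = [KNOWLEDGE_BASE[t] for t in tokens if t in KNOWLEDGE_BASE]
    return hits or ["Sorry, I could not match your symptoms. Please rephrase."]
```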
Fig. 2. Whatsapp integrated with AWS Lex via Twilio
The process of setting up the chatbot, as shown in Fig. 2, can be summarized as:
• The user interacts with the chatbot via WhatsApp.
• Amazon Lex (the AI-powered chatbot) is integrated with WhatsApp via the Twilio channel.
• A Lambda function is created to initiate the outbound API call to Amazon Connect.
• The Lambda function runs the business logic, deriving parameters from DynamoDB to perform knowledge-base matching for the response.
• The user, on parsing the query, receives a response on WhatsApp itself. Following this, the relay takes place.
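A minimal stand-in for the Lambda step above; the event shape and knowledge base here are simplified placeholders, while the real function follows Amazon Lex's fulfillment event format and reads from DynamoDB:

```python
KB = {"hi": "Hello! I am DiagZone. What symptoms are you experiencing?"}

def lambda_handler(event, context):
    """Read the user's text from a simplified event, look it up in a toy
    knowledge base, and return a reply; stands in for the business logic."""
    text = event.get("inputTranscript", "").lower().strip()
    reply = KB.get(text, "Please describe your symptoms.")
    return {"statusCode": 200, "body": reply}
```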
4 Dataset Description We provide the necessary solutions based on inputs received from the user's side, saving their time as well as consultation fees. This is done with the help of a dedicated team that constantly adds any new symptoms/diseases that need to be taken into account, giving users a platform that is neither outdated nor time-consuming or inefficient. The dataset, Disease Symptom Prediction, with a 9.7 usability score, is obtained from Kaggle. • Dataset for symptoms: Columns: 18; Rows: 4981; Column labels: Disease, Symptom_1, Symptom_2, Symptom_3, Symptom_4, Symptom_5, Symptom_6, Symptom_7, Symptom_8, Symptom_9, Symptom_10, Symptom_11, Symptom_12, Symptom_13, Symptom_14, Symptom_15, Symptom_16, Symptom_17; File size: 632.2 kB. Used for predicting symptoms related to a particular disease, further classified on the basis of severity. • Dataset for severity: Columns: 2; Rows: 134; Column labels: Symptom, weight; File size: 2.33 kB. For overlapping symptoms, severity is shown from highest to least in the user output.
• Dataset for description: Columns: 2; Rows: 42; Column labels: Disease, Description; File size: 11.03 kB. Provides users with a detailed definition of the predicted disease. • Dataset for precaution: Columns: 5; Rows: 42; Column labels: Disease, Precaution_1, Precaution_2, Precaution_3, Precaution_4; File size: 3.49 kB. Once all the sample datasets are available, the CSV files are loaded into DynamoDB via object creation through a bucket and Lambda function programming to import the CSV files. The data in DynamoDB is then integrated with the Lex bot using the Lambda function.
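The severity weights can be used to rank candidate diseases; a hedged sketch following the two dataset schemas described above (the rows here are invented stand-ins for the Kaggle files):

```python
SYMPTOM_WEIGHT = {"itching": 1, "high_fever": 7, "headache": 3}   # severity file
DISEASES = {                                                       # symptom file
    "Fungal infection": ["itching"],
    "Malaria": ["high_fever", "headache"],
}

def rank_diseases(symptoms):
    """Score each disease by the total weight of the reported symptoms it
    explains; return diseases sorted from most to least severe match."""
    scores = {d: sum(SYMPTOM_WEIGHT.get(s, 0) for s in syms if s in symptoms)
              for d, syms in DISEASES.items()}
    return sorted((d for d in scores if scores[d] > 0),
                  key=lambda d: -scores[d])
```

With the toy rows above, a user reporting high fever, headache, and itching is matched to Malaria (weight 10) before Fungal infection (weight 1).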
Fig. 3. Sample chat between User and Bot Salli
5 Result Analysis Most people nowadays have adapted to the virtual world and are comfortable using WhatsApp as an immediate mode of communication. Since our chatbot provides the desired comfortable environment with the help of end-to-end encryption, users can maintain their privacy while discussing their symptoms with the virtual bot. We successfully implemented the bot, which predicts the disease (or disease-like entities) for the given set of symptoms the user inputs and also provides precautionary measures. We also trained it on a set of the nearest available doctors according to geolocation; the bot identifies and books the nearest possible doctor available at the earliest (a sample conversation is presented in Fig. 3). However, the appointment module is for testing purposes only, since integration with clinics and hospitals remains under review for accuracy and approval.
6 Comparative Analysis The following Table 1 illustrates a comparative analysis of some of the related Chatbots that we came across. These were then tabulated and performed accordingly. Table 1. Comparative analysis of a range of chatbots Chatbot
Functionality
Advantages
Disadvantages
Diagzone
An NLP-based chatbot that gives end-to-end diagnoses by accepting symptoms directly from the user via Whatsapp with Twilio as the channel. It parses the message to extract symptoms from the input. These symptoms are matched as keywords against the knowledgebase using business logic run by Lambda functions. AWS Lex then returns the generated response
The overhead computation time is low – providing a fast remote accessible service, owing to the cloud services being availed. Twilio and Whatsapp inherently add security, preventing third parties from accessing personal information
Due to the lack of real-world data from regional hospitals, not all the variables pertaining to geographic changes and their corresponding effects on individual’s health have been considered
(continued)
An Approach to Medical Diagnosis Using Smart Chatbot
Diabot
Functionality: An NLU-based chatbot that combines an ensemble of five classifiers (Decision Tree, Bernoulli Naïve Bayes, Multinomial Naïve Bayes, RF, and SVM) for general disease prediction using the General health dataset. The bot then narrows down to diabetes prediction, where an ensemble of six classifiers (NB, K-Nearest Neighbors (KNN), DT, RF, LR, and GB) is used on the Pima Indian diabetes dataset.
Advantages: The generic framework implemented is used for various disease predictions and can be extended to develop more complex, disease-specific chatbots.
Disadvantages: Due to real-time implementation, accuracy becomes a major issue. More supervised training is required for specialized disease prediction beyond the generic prediction.
Aquabot
Functionality: Proposed to strike up a conversation with the user and find out the psychological problem; it deals with autism and achluophobia. The system processes the entered text; the decoded message is split into sentences and then into phrases. Synonyms of phrases are taken from the knowledge base to generate new sentences. If a match is found in the brain, the response is served on screen; otherwise the user has to ask Aquabot directly.
Advantages: Aquabot uses decision trees (DFS traversal) for performance improvement; their use improves response time and regularity of responses. It exhibited an accuracy of almost 88% when compared to a human psychologist's diagnosis of autism.
Disadvantages: Cannot currently save the case history of a physical disease. No facial recognition, hindering identification of the user's facial expressions to identify issues.
S. Verma et al.
Florence Chatbot
Functionality: The chatbot finds the most likely disease by analyzing the symptoms given by the user and forecasts the disease using the extracted signs. The RASA framework has been used to implement this chatbot, which can diagnose patients through analysis of basic symptoms and a conversational approach.
Advantages: From the symptoms, it suggests the most likely disease, along with a nutritional breakdown of the food the user has consumed. Gives near-perfect results, agreeing almost 92% of the time with human diagnoses.
Disadvantages: Lack of image recognition facilities, so any image a user sends cannot be used for diagnostic purposes or patient care.
Graphican-Cancer Care Chatbot
Functionality: The user's messages are forwarded to the bot and sent to a parser. Sentences are resolved into components and keywords are identified from them. The parser runs its natural language engine to perform searches; intents and entities fetch data from the cancer database, which is converted to a graph model using the Neo4j graph database to process highly connected data and identify relationships between data items. The engine shortlists cancers from the input and gives remedies.
Advantages: Can imitate human conversation through either text or audio. Uses supervised learning concepts to learn from previous experience.
Disadvantages: No sentiment analysis is performed. Lack of image recognition, so any image the user sends cannot be used for diagnostics. Limited data accessibility; cannot access data directly from the web.
7 Conclusion This project elucidates natural language processing developments in the healthcare industry using chatbots and how they are built. Current chatbots are high in performance and reliability when providing responses to users, compared to traditional systems. Given the revolutionary demand for smartphones and their ease of use as handy devices, users in time of need can contact the assistance tool for medical diagnosis. A telehealth platform requires response efficiency and cost optimization for detection and prediction. Reliability plays a major role here, since effective communication between patients (users) and a diagnosis system (a doctor by proxy) rests on a bridge of trust. This system provides an end-to-end diagnosis via self-reported input from users; we conclude that it is a remote tool that lets SOS patients in low-risk zones obtain a diagnostic assessment and, if in doubt, connect to a doctor. The proposed system combines NLP and cloud services with embedded machine learning algorithms, using AWS Lex and Twilio as the connecting medium, to create DiagZone.
8 Future Scope The chatbot is expected to assist doctors in arriving at the right diagnosis and to provide these diagnoses as a single source of information by increasing efficiency, accuracy, and understanding of human language. The blueprint for the remaining module of our chatbot involves the ability to give out doctor details, ask the user (patient) about convenient timings and location, and cross-check in the database whether an appointment can be scheduled for that particular slot; it will ultimately confirm whether or not the appointment is scheduled. Audio interaction through text-to-speech, speech-to-text, or speech-to-speech methods will add further ease of use. Since this relies only on cloud services, it reduces costs and the workload on medical doctors in rural as well as urban areas. Chatbots can also be used to draw conclusions beyond a single line of thought by removing human biases.
Performance Analysis of Hybrid Filter Using PI and PI-Fuzzy Based UVTG Technique B. Pavankumar1(B) and Saroj Pradhan2 1 Department of Electrical and Electronics Engineering, Sreenidhi Institute of Science and Technology, Hyderabad 501301, India [email protected] 2 Department of Electrical Engineering, PMEC, Berhampur 761003, India [email protected]
Abstract. Power quality in distribution systems is still one of the challenging tasks to achieve. In this paper, a hybrid conditioner is designed with an efficient controller to improve power quality. The proposed conditioner consists of a low-pass filter and a shunt active power filter. The low-pass filter is used to mitigate higher-order harmonics and the shunt active filter to mitigate lower-order harmonics. The switching action of the shunt part of the filter is carried out with two different techniques. First, a Proportional Integral (PI) approach along with unit vector template generation is used to generate the reference signal; then, the PI controller is combined with a fuzzy logic controller along with the unit vector template generator (UVTG) to achieve accurate reference signals for proper operation of the voltage source inverter. The effectiveness of the single and combined controllers is verified through simulation. Keywords: PQ · Proportional Integral (PI) · Fuzzy logic control (FLC) · UVTG · Hysteresis current control (HCC)
1 Introduction In the past, the distribution network was simple and power quality was largely ignored, but in recent times the use of power semiconductor components, from the domestic to the industrial sector, has raised power quality concerns again [1–4]. Power semiconductor components are treated as nonlinear loads: they draw nonlinear current from the supply, resulting in a distorted voltage waveform at the point of common coupling if proper filtering techniques are not provided [5]. The distorted voltage waveform is, in turn, an input to other loads and degrades their performance with respect to operation, control, and efficiency, so it is crucial to filter out this unwanted component from the signal when nonlinear loads are connected. The unwanted signals are the harmonic components, which must be filtered out to achieve power quality in the system. It has been observed that, along with harmonics, interharmonics and subharmonics sometimes also appear in the system and create many problems [6–8]. These parameters in the power system should be properly investigated © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. K. Das et al. (Eds.): CIPR 2022, LNNS 480, pp. 57–66, 2022. https://doi.org/10.1007/978-981-19-3089-8_6
and analyzed so that proper precautionary measures can be taken before any disturbance starts in the system [9]. As per the IEEE 519 standard, the acceptable limit of the harmonic components in a signal, measured in terms of total harmonic distortion (THD), should be below 5% of the fundamental [10, 11]. This means that a designed filter works efficiently if it filters out the harmonics in such a way that the THD lies below this percentage [13]. The effective operation of the designed filter depends on various factors [14]. Initially, the low-pass filter was the only option for removing harmonics from a nonlinear signal, but its large size and resonance problems led to other alternatives over time. Active power filters in their different configurations are the latest development in this area, but research on further alternatives is still ongoing [11, 12]. For removing voltage harmonics and current harmonics, series and shunt filter configurations have been proposed [11, 12]. Although the shunt-connected filter is designed to suppress current harmonics, it is not suitable for higher system ratings due to greater switching losses and higher cost [11]. The hybrid filter has been reported as an alternative that can overcome this problem [11, 12]; to enhance the performance of hybrid filters, research is ongoing and different control techniques are emerging [11]. A large number of classical control techniques, such as the synchronous reference frame, the indirect current control technique, and unit vector template generation, and various soft-computing control techniques, such as fuzzy logic, neural networks, and neuro-fuzzy combinations, have been discussed [1–3]; these are responsible for generating the reference signal for the voltage source inverter.
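The IEEE 519 check above can be made concrete. The sketch below computes THD from the RMS magnitudes of the harmonic components; the numeric values are invented for illustration, not taken from the paper's simulation.

```python
import math

def thd_percent(harmonic_rms, fundamental_rms):
    """Total harmonic distortion as a percentage of the fundamental:
    THD = sqrt(sum of squared harmonic magnitudes) / fundamental * 100."""
    return 100.0 * math.sqrt(sum(h * h for h in harmonic_rms)) / fundamental_rms

# Example: 10 A fundamental with 5th and 7th harmonics of 0.3 A and 0.2 A.
thd = thd_percent([0.3, 0.2], 10.0)
print(round(thd, 2), thd < 5.0)  # 3.61 True (within the IEEE 519 5% limit)
```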
Once the reference signal is generated with these methods, it is compared with the real signal; the resulting error is fed to either a current-control or a voltage-control-based technique for pulse generation. For further improvement of the voltage source converter's performance, proper voltage regulation is also important: once the DC link voltage remains constant, the accuracy of the reference signal increases to some extent, resulting in better switching of the filter. Different soft computing and optimization techniques, such as proportional integral (PI) control, proportional integral derivative (PID) control, Neural Networks (NN), fuzzy logic control, the Genetic Algorithm (GA), the Modified Genetic Algorithm (MGA), Jaya, and Modified Jaya, are used for this purpose [11, 15, 16]. In this work, a hybrid filter is designed to remove both higher- and lower-order harmonics in a signal arising from a highly distorted nonlinear load connected to the system. The designed filter is a combination of a low-pass filter and a shunt active power filter. The low-pass filter removes higher-order current harmonics and the shunt active filter eliminates the lower-order harmonic currents. The performance of the hybrid filter depends on the control techniques implemented in the shunt active power filter. The controller suggested for efficient operation of the filter is a unit vector template generation technique along with PI and FLC. The switching action of the inverter is carried out with the help of a conventional hysteresis current controller. There are five sections in the paper, including the introduction in Sect. 1. Section 2 describes the details of the proposed hybrid filter, Sect. 3 the details of the controller, and Sects. 4 and 5 the simulation results and conclusions, respectively.
2 Design of the Proposed System Figure 1 shows the designed filter. The distribution network is connected to a highly distorting load consisting of a three-phase diode bridge rectifier with an R-L load; the hybrid filter is the combination of a low-pass filter and a shunt active power filter. The higher-order harmonic components of the signal are filtered out by the LC filter, and the lower-order harmonic components are removed by the shunt filter. The shunt filter is a three-phase converter consisting of six IGBTs and a DC link capacitor. The performance of the filter depends on the controller used to generate the switching pulses: using the control technique, the shunt active filter delivers a compensating current equal in magnitude to the harmonic components and opposite in phase.
Fig. 1. Proposed system for this work
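The compensation principle of the shunt part, injecting a current equal in magnitude and opposite in phase to the harmonic content, can be illustrated with a toy waveform. The amplitudes and the 5th-harmonic choice are hypothetical, not simulation data from the paper.

```python
import math

# One 50 Hz cycle sampled at 400 points.
t = [k * 0.02 / 400 for k in range(400)]
fundamental = [10.0 * math.sin(2 * math.pi * 50 * tk) for tk in t]
harmonic5 = [2.0 * math.sin(2 * math.pi * 250 * tk) for tk in t]  # 5th harmonic
load_current = [f + h for f, h in zip(fundamental, harmonic5)]

# The shunt filter injects the harmonic content with opposite phase, so the
# source current (load + injected) is restored to the fundamental alone.
injected = [-(i - f) for i, f in zip(load_current, fundamental)]
source_current = [i + c for i, c in zip(load_current, injected)]

print(max(abs(s - f) for s, f in zip(source_current, fundamental)) < 1e-9)  # True
```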
3 Control Approach In this work, the reference current is first generated using the UVTG technique from the load current signal; a PI controller is implemented to keep the voltage across the VSI constant, and for further performance improvement an FLC is used together with the PI. Finally, a hysteresis current controller is used to generate the switching pulses. The step-by-step procedure to implement the control techniques is described below. 3.1 UVTG Control Algorithm Implementation In this work, the unit vector template generation along with a PI controller is first implemented to obtain the reference current of the proposed inverter; after that, for further performance improvement, a fuzzy controller is used with the PI;
the proposed control technique is presented in Fig. 2. Initially, the supply is fed through a gain "K" to a phase-locked loop, which produces three unit vectors, in phase and with 120-degree phase differences. Next, the voltage error is produced; this error is the input to the PI controller, and the maximum current is obtained from its output using a limiter [1]. The PI controller output is the reference amplitude of the input current. The in-phase reference currents are obtained by multiplying this amplitude with the in-phase unit current vectors [1, 6]. Finally, the current error is produced by comparing the reference current with the actual current, and it is fed to the hysteresis controller for pulse generation. The filter is then able to inject the compensating current. After that, the performance of the filter is further improved using a fuzzy logic controller along with the PI, as discussed in Sect. 3.2.
Fig. 2. UVTG implementation
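The UVTG loop described above can be sketched minimally: a PLL-style template generator plus a limited PI controller whose output scales the unit vectors into phase reference currents. The gains, limiter value, and DC-link voltages below are invented for illustration, not the paper's tuning.

```python
import math

def unit_templates(omega_t):
    """In-phase unit vectors, 120 degrees apart, as produced by the PLL stage."""
    return (
        math.sin(omega_t),
        math.sin(omega_t - 2 * math.pi / 3),
        math.sin(omega_t + 2 * math.pi / 3),
    )

class PIController:
    def __init__(self, kp, ki, i_max):
        self.kp, self.ki, self.i_max = kp, ki, i_max
        self.integral = 0.0

    def step(self, error, dt):
        self.integral += error * dt
        out = self.kp * error + self.ki * self.integral
        return max(-self.i_max, min(self.i_max, out))  # output limiter

# One control step (hypothetical gains): the DC-link voltage error sets the
# reference amplitude, which scales the unit templates into phase references.
pi = PIController(kp=0.5, ki=20.0, i_max=50.0)
amplitude = pi.step(error=700.0 - 695.0, dt=1e-4)  # Vdc_ref - Vdc measured
i_ref = [amplitude * u for u in unit_templates(2 * math.pi * 50 * 0.001)]
print([round(x, 3) for x in i_ref])
```

The reference currents would then be compared with the measured currents and the error passed to the hysteresis stage of Sect. 3.3.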
3.2 Proposed FLC Algorithm In 1965, Zadeh developed the concept of fuzzy logic using set theory [4]. To run the FLC algorithm in the proposed APF in a closed loop, a sensor first senses the DC link voltage, which is then compared with the required reference voltage. The generated error and the change in error are taken as inputs to the FLC [7]; fuzzification and defuzzification are then carried out to obtain the corresponding crisp value [12]. The FLC rules are formulated based on the membership functions and the variables used [9]. The effectiveness of an FLC is decided by the number and type of membership functions; the selection of membership functions depends on the designer's expertise in fuzzy logic, and once the membership functions are selected, they determine how effectively the selected rules converge [7]. The performance of the developed fuzzy logic controller with the proposed filter is evaluated with the help of MATLAB/Simulink, using fuzzy set theory [4]. A computational fuzzy inference
system is designed using the Mamdani approach [4]. In this work, the control rules are developed using IF and THEN terms, which stand for condition and result and represent, linguistically, the input and output variables, respectively. The rule table is presented in Table 1, with the membership functions shown in Fig. 3. Numerical values are mapped to linguistic terms through a 7 × 7 rule matrix with triangular membership functions. From Table 1 it can easily be assessed when more compensation is required, when less compensation is required, and when no compensation is required: for example, when the error (1st column) is nl and the change in error (1st row) is nl, the result is nl, so the compensation is negative; in the same way, when the error and change in error are both pl, the compensation is positive; and if the error and change in error are both z, no compensation is required. The fuzzy membership diagram is shown in Fig. 3. In this way the capacitor voltage is regulated, after which the PI controller works as described in Sect. 3.1 to produce the required signal and pulses.

Table 1. Implementation of the fuzzy logic approach (rows: error E; columns: change in error ΔE)

E\ΔE  nl  nm  ns  z   ps  pm  pl
nl    nl  nl  nl  nl  nm  ns  zo
nm    nl  nl  nl  nm  ns  zo  ps
ns    nl  nl  nm  ns  zo  ps  pm
z     nl  nm  ns  zo  ps  pm  pl
ps    nm  ns  zo  ps  pm  pl  pl
pm    ns  zo  ps  pm  pl  pl  pl
pl    zo  ps  pm  pl  pl  pl  pl
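The 7 × 7 rule matrix of Table 1 follows a standard diagonal pattern, so it can be generated from the term indices rather than stored explicitly. A sketch (using "zo" uniformly for the zero term, where the table's headers write "z"):

```python
# Linguistic terms in order: negative large ... positive large.
TERMS = ["nl", "nm", "ns", "zo", "ps", "pm", "pl"]

def rule_output(e_term, de_term):
    """Diagonal rule base: the output index is the sum of the input offsets
    from the zero term, saturating at the table's edges."""
    i = TERMS.index(e_term) + TERMS.index(de_term) - 3  # 3 = index of "zo"
    return TERMS[max(0, min(6, i))]

print(rule_output("nl", "nl"))  # nl: strong negative compensation
print(rule_output("zo", "zo"))  # zo: no compensation needed
print(rule_output("pl", "ps"))  # pl: strong positive compensation
```

Every entry this produces matches Table 1, which is why such rule bases are often described simply by their diagonal structure.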
3.3 Hysteresis Current Controller The characteristics of fast transient response, better accuracy, and good stability [7] make the operation of the hysteresis controller simple. The hysteresis controller works according to Eqs. (1) and (2), where HB is the hysteresis band; the comparator diagram is shown in Fig. 4. The upper switch is turned ON and the lower switch OFF when

i_la, i_lb, i_lc < i*_sa, i*_sb, i*_sc - HB/2    (1)

and the upper switch is turned OFF and the lower switch ON when

i_la, i_lb, i_lc > i*_sa, i*_sb, i*_sc + HB/2    (2)
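The band logic of Eqs. (1) and (2), including the state being held while the current stays inside the band, can be sketched per phase as follows. The reference, band width, and sample values are illustrative only.

```python
def hysteresis_switch(i_ref, i_actual, band, upper_on):
    """Per-phase hysteresis comparator: flips the inverter-leg state only when
    the current error leaves the band; inside the band the state is held."""
    error = i_ref - i_actual
    if error > band / 2:
        return True   # upper switch ON, lower switch OFF (Eq. 1)
    if error < -band / 2:
        return False  # upper switch OFF, lower switch ON (Eq. 2)
    return upper_on   # within the band: keep the previous state

state = False
for i_act in [0.0, 4.9, 5.2, 5.05, 5.6]:  # tracking a 5 A reference
    state = hysteresis_switch(5.0, i_act, band=0.5, upper_on=state)
    print(state)  # True, True, True, True, False
```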
Fig. 3. Fuzzy rule
Fig. 4. Hysteresis controller comparator diagram
4 Result and Analysis First, the model is designed and the system is fed with a highly distorting nonlinear load; the hybrid filter is connected as shown in Fig. 1. The low-pass filter eliminates the high-order harmonic components and the shunt active filter handles the lower-order harmonics, as discussed in the previous steps. The controller's efficiency is verified through simulation, and the simulation results obtained are discussed below. Two cases are taken into consideration: first PI with UVTG, then PI-FLC with UVTG. 4.1 PI with UVTG Controller Implementation The results obtained using this technique are shown below. The DC link voltage is shown in Fig. 5. The load current before and after compensation is shown in Fig. 6
and Fig. 7. The FFT analysis after compensation is shown in Fig. 8. The THD obtained with this method is 4.33% of the fundamental, which is acceptable under IEEE 519.
Fig. 5. Capacitor voltage with PI controller
Fig. 6. Load current before compensation
Fig. 7. Grid current after compensation
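The FFT step used to obtain the THD figures can be approximated with a plain DFT on a synthetic waveform. This is a sketch, not the MATLAB/Simulink analysis; the 4.33% fifth-harmonic content is chosen only to mirror the reported THD value.

```python
import cmath
import math

def thd_from_samples(samples, fundamental_cycles=1, max_harmonic=10):
    """Estimate THD (%) from full cycles of a sampled waveform using a plain
    DFT (an FFT would be used in practice for larger sample counts)."""
    n = len(samples)

    def magnitude(k):
        # Magnitude of DFT bin k.
        return abs(sum(samples[j] * cmath.exp(-2j * math.pi * k * j / n)
                       for j in range(n)))

    fund = magnitude(fundamental_cycles)
    harms = [magnitude(h * fundamental_cycles)
             for h in range(2, max_harmonic + 1)]
    return 100.0 * math.sqrt(sum(m * m for m in harms)) / fund

# Waveform with a 5th harmonic at 4.33% of the fundamental.
n = 200
wave = [math.sin(2 * math.pi * j / n) + 0.0433 * math.sin(2 * math.pi * 5 * j / n)
        for j in range(n)]
print(round(thd_from_samples(wave), 2))  # 4.33
```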
4.2 FLC Along with PI and UVTG Implementation The results obtained with this technique are shown below. The voltage across the DC link capacitor is shown in Fig. 9. The DC voltage is better because the fuzzy logic controller used to tune the error leads to less fluctuation. The load current before compensation and after compensation is shown in Figs. 10, 11 and 12. The total harmonic distortion is 1.17% of the fundamental, which is very encouraging. The comparative analysis of the PI and FLC with UVTG techniques is shown in Table 2.
Fig. 8. PI and UVTG-FFT analysis before compensation techniques applied
Fig. 9. Capacitor voltage with PI controller
Fig. 10. Load current
Fig. 11. Grid current
Fig. 12. PI and UVTG-FFT analysis after the compensation technique is applied

Table 2. Comparison analysis of PI and FLC with UVTG

Control technique used    THD before compensation (%)    THD after compensation (%)
PI with UVTG              61.71                          4.55
PI and FLC with UVTG      61.71                          1.71
5 Conclusions In this work, the effectiveness of the PI controller alone and of the combined PI and fuzzy logic approach is verified through simulation. Analyzing the performance of the proposed filter with unit vector template generation and a PI controller, it is observed that the capacitor voltage is not very steady, but the THD after compensation is 4.55% of the fundamental, which is acceptable under IEEE 519. When the fuzzy logic controller is used along with the PI, the fuzzy controller produces less fluctuation in the DC voltage, leading to a more accurate reference current magnitude and hence proper switching pulses for the voltage source inverter, and the THD is 1.17% of the fundamental. It is therefore concluded that the combined PI and fuzzy controller proves its effectiveness in terms of harmonic compensation and reactive power improvement compared with a simple PI controller.
References 1. Akagi, H.: New trends in active filters for power conditioning. IEEE Trans. Ind. Appl. 32(6), 1312–1322 (1996) 2. Singh, B., Haddad, K., Chandra, A.: A review of active power filters for power quality improvement. IEEE Trans. Ind. Electron. 45(5), 960–971 (1999) 3. Rahmania, S., Hadded, A., Kanaan, H.: A comparative study of shunt hybrid and shunt active power filters for single-phase applications: simulation and experimental validation. Math. Comput. Simul. 71, 345–359 (2006) 4. Chandra, A., Singh, B., Haddad, K.: An improved algorithm of shunt active filter for voltage regulation, harmonics elimination, power factor correction and balancing of nonlinear loads. IEEE Trans. Power Electron. 15(3), 495–507 (2000) 5. Sahu, L.D., Dubey, S.P.: ANN based hybrid active power filter for harmonics elimination with distorted mains. Int. J. Power Electron. Drive Syst. (IJPEDS) 2(3), 241–248 (2012)
6. Pal, Y.: A new topology of three-phase four-wire UPQC with a simplified control algorithm. Major J. Electr. Eng. 6(1), 24–32 (2012) 7. Patel, R., Panda, A.: Real time implementation of PI and fuzzy logic controller based 3-phase 4-wire interleaved buck active power filter for mitigation of harmonics with id–iq control strategy. Int. J. Electr. Power Energy Syst. 59, 66–79 (2014) 8. Ekhlas, M., Mazin, M., Abbas, E.T.: FLC based shunt active power filter for current harmonics comp. IEEE Int. Conf Comput. Sci. Software Eng. 94–99 (2020) 9. Puhan, P.S., Sandeep, S.D.: Real time neuro-hysteresis controller ımplementation in shunt active power filter. In: Satapathy, S.C., Srujan Raju, K., Shyamala, K., Rama Krishna, D., Favorskaya, M.N. (eds.) ICETE 2019. LAIS, vol. 4, pp. 355–363. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-24318-0_43 10. Mahajan, V., Agarwal, P., Gupta, H.O.: Simulation of shunt active power filter using instantaneous power theory. IEEE Fifth Power India Conf. 2012, 1–5 (2012). https://doi.org/10.1109/ PowerI.2012.6479562 11. Puhan, P.S., Panda, G., Ray, P.K.: A comparative analysis of artificial neural network and synchronous detection controller to improve power quality in single phase system. Int. J. Power Electron. 9(4), 385 (2018) 12. Malvezzi, V., Silva, S.: A comparative analysis between the PI and fuzzy controllers for current conditioning using a shunt active power filter. IEEE power Electronics Conference, pp. 981–986. Brazil (2013) 13. Prasana, V.L., Puhan, P.S., Sahoo, S.: Active filter with 2-fuzzy intelligent controller: a solution to power quality problem. In: Udgata, S.K., Sethi, S., Srirama, S.N. (eds.) Intelligent Systems. LNNS, vol. 185, pp. 59–71. Springer, Singapore (2021). https://doi.org/10.1007/978-981-336081-5_6 14. 
Puhan, P.S., Ray, P.K., Panda, G.: A comparative analysis of shunt active power filter and hybrid active power filter with different control techniques applied for harmonic elimination in a single phase system. Int. J. Model., Identif. Control 24(1), 19 (2015) 15. Puhan, P.S., Ray, P.K., Pottapinjara, S.: Performance analysis of shunt active filter for harmonic compensation under various non-linear loads. Int. J. Emerg. Electr. Power Syst. 22(1), 21–29 (2021) 16. Mishra, S., Dash, S.K., Ray, P.K., Puhan, P.S.: Analysis and experimental evaluation of novel hybrid fuzzy-based sliding mode control strategy for performance enhancement of PV fed DSTATCOM. Int. Trans. Electr. Energy Syst. 31, e12815 (2021)
Transmission of Aggregated Data in LOADng-Based IoT Networks Sayeda Suaiba Anwar and Asaduzzaman(B) Department of Computer Science and Engineering, Chittagong University of Engineering and Technology, Chattogram 4349, Bangladesh [email protected]
Abstract. The Internet of Things (IoT) is a recent idea that suggests connecting devices in order to exchange data and achieve a common purpose. When a significant number of packets must be sent over a low-power and lossy network (LLN) with multiple nodes, energy is quickly depleted. Each sensor's energy usage has a direct impact on the network's operational lifespan. In traditional sensor networks, a cluster-head (CH) based data aggregation approach is commonly used to improve the network lifetime. In IoT-based sensor networks, almost no research has been done on the data aggregation and energy consumption patterns of the Light-weight On-demand Ad-hoc Distance-vector Next Generation (LOADng) routing protocol. The main purpose of this paper is to propose an algorithm to aggregate data at node level to reduce data redundancy and transmit it using LOADng. The data aggregation technique focuses on each sensor node, discarding duplicate data and updating the route lifetime. The performance of the proposed technique is evaluated and compared with the traditional approach based on various parameters such as packet loss, end-to-end delay, energy consumption, jitter and round-trip time. Results show that the proposed methodology outperforms the existing one. Keywords: IoT · LLN · Data aggregation · LOADng · Energy consumption

1 Introduction
The exciting next-generation Internet, known as the IoT, is enabled by wireless sensor networks (WSN), enhanced networking protocols, wireless radio frequency systems, and a host of other technology and communication solutions. The nodes in WSN-assisted IoT are resource-constrained in a variety of ways, including storage, computation, control, energy, and many more, and they form an LLN. Several data aggregation methodologies have been implemented in WSN to preserve energy [1]. By minimizing the volume of data to be transmitted and making the best use of the network infrastructure through effective cooperation, Multiple-Input Multiple-Output (MIMO) and data aggregation strategies are combined to minimize energy usage per bit in WSNs [2,3]. In low-power IoT networks, two kinds © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. K. Das et al. (Eds.): CIPR 2022, LNNS 480, pp. 67–76, 2022. https://doi.org/10.1007/978-981-19-3089-8_7
of routing protocols, reactive and proactive, are used, both of which are based on path-formation concepts. The most prominent and widely used reactive protocol for LLN is LOADng [4]. Sensor nodes generate similar data sets in several successive intervals of time, which results in data redundancy [1]. Data aggregation eliminates redundant data, lowering the data traffic as well as the energy consumption and enhancing node life. When data from individual sensors is sent to the Internet, the heavy traffic load can induce packet failures. A data-aggregation-based LOADng routing protocol for IoT networks is proposed in this paper. The proposed technique can be used to reduce network traffic congestion. To ensure low control-message overhead, we propose using the LOADng [4] routing protocol to relay packets between sensor nodes and the Internet. To remove redundant data, data aggregation is proposed within the routing protocol. The basic difference between the existing LOADng and the proposed technique is that data is aggregated at node level before being transmitted to other nodes, which ensures lower packet loss and less energy consumption. This paper uses the Smart Route Request (SmartRREQ) feature in the routing phase. The following are the paper's main contributions: – Implementation of the LOADng protocol, incorporating the proposed data aggregation technique, in NS3. – SmartRREQ decreases the number of control messages sent between nodes, resulting in a more efficient network with lower overhead. – A data aggregation algorithm has been implemented within the routing protocol to reduce data redundancy. Data aggregation lowers the data traffic, which enables lower packet loss and delay as well as lower energy consumption of sensor nodes. Hardly any work exists regarding the energy consumption patterns of the LOADng routing protocol [5].
This paper reduces the amount of energy consumed to route data messages by eliminating data redundancy through data aggregation, resulting in a more energy-efficient network.
2 Related Works
The Internet Engineering Task Force (IETF) introduced a series of protocols and open standards, the Constrained Application Protocol (CoAP) and IPv6 over Low-Power Wireless Personal Area Networks (6LoWPAN), to make applications and services more accessible to wireless and resource-constrained devices. Message Queuing Telemetry Transport (MQTT), Advanced Message Queuing Protocol (AMQP) and Data Distribution Service (DDS) are examples of application layer protocols and specifications for IoT [6]. Low Energy Adaptive Clustering Hierarchy (LEACH) and other hierarchical protocols aim to increase the network's scalability and reliability [7]. They use data aggregation to prevent fraudulent signals and redundant data from reaching the base station, but are vulnerable to HELLO flooding and selective forwarding attacks [8]. A number of
Transmission of Aggregated Data in LOADng-Based IoT Networks
novel security problems remain to be resolved when these protocols are deployed in the IoT environment. The great majority of current hierarchical data aggregation systems, whether tree-based or cluster-based, use event-driven data models. Data aggregation strategies that rely mainly on the collection of CHs and data transfer to the sink can be used in cluster-based networks. Other methods advocate signal processing and physical layer techniques [1]. In contrast, the development of an energy-efficient data aggregation tree without decreasing duplication is one of the key challenges of tree-based networks. The IETF has produced a number of RFCs (Requests for Comments) in recent years with the goal of overcoming interoperability concerns that have developed in growing IoT scenarios [9]. A proactive routing protocol, named Routing Protocol for LLN (RPL), communicates between nodes via a directed acyclic graph. Recent research, on the other hand, has demonstrated that RPL performs poorly under certain types of network traffic [10]. LOADng has been proposed as a replacement for RPL. This study used the LOADng framework, which enables it to search Internet-connected nodes in a varied and on-demand manner, while taking into account the hindrances of previous research.
3 LOADng Protocol
LOADng, proposed in [4], is a simpler version of the Ad-hoc On-demand Distance Vector (AODV) protocol because an RREQ can only be responded to by the destination. As a precursor list is not maintained by a LOADng router, an RERR can only be transmitted to the source. LOADng is also a more advanced variant of AODV, with optimized RREQ flooding and support for various address lengths [5]. Being a reactive routing protocol, LOADng discovers routes by transmitting path request and reply messages. In all p2mp and p2p traffic cases, LOADng outperformed RPL, but for mp2p it conducts a route discovery for each path to the root, producing the highest overhead [11]. The control messages used in LOADng are Route Request (RREQ), Route Reply (RREP), RREP Acknowledgement and Route Error (RERR).
3.1 SmartRREQ
The SmartRREQ feature was proposed for LOADng in order to decrease the number of control messages exchanged during route finding [12]. After finishing all of the initial processing and ensuring that the message is correct for forwarding, a node can perform the SmartRREQ-specific processing. The node then looks at its routing entries to see if it has a route to the message's destination. If a route is found, the SmartRREQ message is delivered in unicast to the destination [13].
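The SmartRREQ rule can be sketched in a few lines. The sketch below is purely illustrative: the routing table is a plain dict and the function and node names are hypothetical, not taken from the SmartRREQ proposal [12] or the authors' NS-3 code.

```python
# Illustrative SmartRREQ decision: unicast an RREQ when a route to the
# destination is already known, otherwise fall back to classic flooding.

def forward_rreq(routing_table, destination):
    """Return ("unicast", next_hop) if a route is known, else ("broadcast", None)."""
    next_hop = routing_table.get(destination)
    if next_hop is not None:
        return ("unicast", next_hop)   # SmartRREQ: skip the flood
    return ("broadcast", None)         # no route entry: keep flooding

# Example: the node knows a route to "sink" via "n7", but not to "n42".
table = {"sink": "n7"}
print(forward_rreq(table, "sink"))   # ('unicast', 'n7')
print(forward_rreq(table, "n42"))    # ('broadcast', None)
```

Each intermediate node that receives the unicast RREQ applies the same rule again, so the flood is suppressed along every hop where a route entry already exists, which is exactly where the control-overhead saving comes from.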
4 Proposed Methodology
The key goal of the proposed scheme is to eliminate network traffic congestion by using a data aggregation technique to reduce redundant data and transmitting packets using the LOADng routing protocol. The entire procedure is split into two phases, namely, 1. Data Aggregation Phase and 2. Routing Phase.
4.1 Data Aggregation Phase
We already mentioned in the previous section that other data aggregation approaches that use clustering for aggregation are only concerned with the CH node. However, in order to minimize traffic, we discard redundant data in each sensor node. Algorithm 1 demonstrates the data aggregation process. When a sensor node generates a packet p_i, it can proceed to its destination by passing through other nodes. Each node keeps a cache file in which it stores the packet-ids of previous packets that have passed through it. When a node receives a new packet, it compares the packet-id of the current packet to the packet-ids of all previous packets that have gone through the current node. If it matches with p_j, p_i is discarded and the frequency of p_j, i.e. the number of times an identical packet has arrived at this node, is incremented. Otherwise, p_i is forwarded to the next hop.
Algorithm 1. Data Aggregation
1: for p_j ∈ P do
2:   if Similar(p_i, p_j) = 1 then
3:     Freq(p_j) ← Freq(p_j) + 1
4:     discard p_i and update route life-time
5:   else
6:     send p_i to next hop
7:     Freq(p_i) ← 1
8:   end if
9: end for
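A minimal Python sketch of Algorithm 1, assuming the per-node cache is a dict keyed by packet-id with the frequency as value. The names here are illustrative only; the authors' actual implementation is in C++ inside NS-3.

```python
# Node-level duplicate suppression as in Algorithm 1: a repeated packet-id
# bumps the stored frequency and is discarded; a new one is cached and sent on.

def handle_packet(cache, packet_id, forward):
    """Forward a first-seen packet; drop and count a duplicate. Returns True if forwarded."""
    if packet_id in cache:              # Similar(p_i, p_j) = 1
        cache[packet_id] += 1           # Freq(p_j) <- Freq(p_j) + 1
        return False                    # p_i discarded (route lifetime would be refreshed)
    cache[packet_id] = 1                # Freq(p_i) <- 1
    forward(packet_id)                  # send p_i to next hop
    return True

sent = []
cache = {}
for pid in ["a", "b", "a", "a", "c"]:
    handle_packet(cache, pid, sent.append)
print(sent)    # ['a', 'b', 'c']  -- duplicates suppressed
print(cache)   # {'a': 3, 'b': 1, 'c': 1}
```

Because the frequency of each suppressed packet is retained in the cache, no information about how often a reading occurred is lost, which is the property the paper relies on when it claims no important data is dropped.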
4.2 Routing Phase
This phase is proposed to transmit the aggregated data to the Internet-connected nodes. The first stage is determining whether or not a packet is a data packet. If the packet is a data packet with a known destination address, it should be forwarded to the destination in unicast. The RREQ for the data packet must be broadcast when the destination address is unknown but the current node is
the source. For any other case, an RERR message is sent to the node's source address. If a packet originates from a node on the blacklist, it must be discarded.
For an RREQ, the routing table of the current node is updated, and the packet's validity for processing is verified. An RREP to the source address is produced when the present node is the destination node. If it is not the destination node, it assesses whether the message is suitable for forwarding. Based on the SmartRREQ principle, the node performs the required processing. It looks through the routing table to see if the destination has a routing entry. If the condition is met, the node forwards the message in unicast to the next address found. The message can be processed again by the next address that receives it before it reaches its destination. The message should be broadcast if the node is unable to discover a path entry to the destination. As a result, the SmartRREQ enhancement principle minimizes the number of transmitted broadcasts, thus reducing the control message overhead needed to find a new path.
For an RREP, the routing table also needs to be updated. The packet is dispatched from the packet queue if the present node is the destination. The packet is unicast to its destination provided the destination address is in the routing table of the present node. An RERR signal is unicast to the packet's source if the packet is discarded. The routing table and PendingAck are both updated when the packet is an RREP ACK. For any other case, it must be determined whether the packet contains an error message. The packet is discarded provided its destination is the current node; otherwise, the RERR is unicast to the source.
The data aggregation algorithm is implemented within the routing protocol. For duplicate packet detection, two files named loadng-dpd.cc and loadng-dpd.h have been implemented. In these files, the task of detecting duplicate packets along with updating the route lifetime is carried out.
Another two files named loadng-cache.cc and loadng-cache.h are maintained to keep a record of the packets which pass through a node. When a packet passes through a node, it matches the packet-id with the previous packet-ids which are recorded in the cache. If the packet-ids match, then the packet is discarded, else, the packet is forwarded next.
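The routing-phase dispatch for data packets described above can be condensed into a single decision function. This is a hedged sketch only: the field and function names are hypothetical, and the real logic in loadng-routing lives in C++ inside NS-3.

```python
# Illustrative dispatch for a data packet in the routing phase:
# known destination -> unicast; unknown but we are the source -> broadcast
# an RREQ; otherwise -> RERR to the source. Blacklisted origins are dropped.

def route_data_packet(node, packet, routing_table):
    """Return the action the routing phase takes for one data packet."""
    if packet["src"] in node["blacklist"]:
        return "discard"                 # blacklisted origin
    if packet["dst"] in routing_table:
        return "unicast"                 # known destination address
    if packet["src"] == node["id"]:
        return "broadcast_rreq"          # we are the source: start route discovery
    return "rerr_to_source"              # cannot help: report the error

node = {"id": "n1", "blacklist": {"n9"}}
table = {"sink": "n7"}
print(route_data_packet(node, {"src": "n1", "dst": "sink"}, table))  # unicast
print(route_data_packet(node, {"src": "n1", "dst": "n42"}, table))   # broadcast_rreq
print(route_data_packet(node, {"src": "n5", "dst": "n42"}, table))   # rerr_to_source
```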
5 Implementation and Result
To evaluate the performance and compare our proposed solution with the existing approach, a virtual network is developed. The network simulation has been done in NS-3 (version 3.29) on a Linux operating system. C++ has been used as the programming language.
5.1 Parameters for Simulation
Default values for all the parameters in Table 1 are chosen for an optimal solution [5]. The parameters in Table 2 are the factors which affect the evaluation of the performance metrics.
The simulations were carried out with 10 to 90 nodes, which were placed randomly with a constant node density, so the network's coverage area extends as the number of nodes increases. Each packet is 64 bytes in size and has a data rate of 2048 bits per second. Routing parameters for LOADng are listed in Table 1. Network Diameter indicates the maximum number of hops for traversing the network. The time required for an RREQ to traverse the network is taken as the Network Traversal Time. The average time required for a packet to traverse one hop is the Node Traversal Time. RREP-Ack Required is set to TRUE, which indicates the necessity of an RREP-ACK in response to an RREP.

Table 1. Routing parameters

Parameter for LOADng          Value used
Network Diameter              35
Network Traversal Time        2.8 s
Node Traversal Time           40 ms
RREQ Retries                  2
Time-out Buffer               2
Blacklist Time-out            5.6 s
Next Hop Wait                 50 ms
RREP-Ack Required             TRUE
Use Bidirectional Link Only   TRUE
Maximum Hop Count             255
Maximum Queue Length          64
Parameters stated in Tables 1 and 2 indicate the size and type of the network. By adjusting the total number of nodes in the network, the influence of traffic load on network resilience and energy efficiency can be monitored.

Table 2. Parameters for simulation

Parameter                  Value
Simulation platform        NS-3
Number of sensor nodes     10 to 90 nodes
Position of sensor nodes   Random with constant node density
Application Model          On-off application
Propagation Delay Model    ConstantSpeedPropagationModel
Mobility Model             RandomWaypointMobilityModel
Propagation Loss Model     FriisPropagationLossModel
Simulation Time            300 s
Energy Source              BasicEnergySourceHelper
Device Energy Model        WifiRadioEnergyHelper
5.2 Experimental Results
In this paper, performance analysis has been done for a number of nodes varying from 10 to 90. Figures 1 and 2 show the packet loss and end-to-end delay of the original LOADng protocol [4] and the proposed protocol. Both packet loss and end-to-end delay increase with the increasing number of nodes for both protocols. For the proposed approach, packet loss is lower than in the existing approach because data aggregation discards redundant data, so there is less chance of packet loss, and end-to-end delay is also decreased. Moreover, the proposed SmartRREQ approach contributes to reducing the delay and jitter. Consequently, the proposed protocol outperforms the existing one, in terms of packet loss and end-to-end delay, over a wide range of network sizes. The battery consumption of a sensor node is significantly correlated with its longevity because of the limited energy source available, and recharging is difficult and expensive in most circumstances. In this paper, BasicEnergySourceHelper and WifiRadioEnergyHelper are used in NS-3 as the Energy Source Model and Device Energy Model. These models have been used to analyze the energy usage of a node, which is shown in Fig. 3.
Fig. 1. Packet loss vs. No. of nodes
The proposed scheme uses fewer resources than the existing one, ensuring a longer network lifespan. Importantly, this gain increases as the number of nodes increases, which suggests that the proposed scheme is very efficient for larger networks. Jitter and round-trip time of the original and the proposed scheme are observed in Figs. 4 and 5 respectively. As end-to-end delay is lower for transmitting aggregated data in the proposed method, the round-trip time shows a similar improvement. A huge number of nodes remain involved in an IoT network. From previous research works, it is seen that the performance of original LOADng [4] degrades with the increase of network size [14]. Observation shows that the proposed scheme
Fig. 2. End-to-end delay vs. No. of nodes
Fig. 3. Energy consumption vs. No. of nodes
Fig. 4. Jitter vs. No. of nodes
Fig. 5. Round trip time vs. No. of nodes
is showing better performance beyond 50 nodes, which ensures that this method effectively exploits the benefits of data aggregation when data transmission takes place among a large number of nodes.
6 Conclusion and Future Work
A data aggregation algorithm for LOADng is discussed in this paper. Advantages of the proposed approach include the elimination of data redundancy, resulting in lower traffic volume and energy usage, as well as a longer network lifespan. Different clustering-based data aggregation methods concentrate on the cluster head, while the suggested method focuses on each sensor node, which is more suitable for IoT applications. No important data is lost in the proposed scheme, as the frequency of appending similar data is recorded. The proposed solution achieves better outcomes, in terms of packet loss, delay and energy consumption, than the existing one. In this paper, we consider hop count as the routing metric. Performance analysis with other routing metrics may be a good topic for future research.
Acknowledgement. This work was supported by project No. CUET/DRE201920/CSE015 of the Directorate of Research and Extension (DRE), Chittagong University of Engineering and Technology.
References
1. Harb, H., Makhoul, A., Tawil, R., Jaber, A.: Energy-efficient data aggregation and transfer in periodic sensor networks. IET Wirel. Sens. Syst. 4(4), 149–158 (2014)
2. Du, W., Zhou, W.: An overview of energy-saving schemes with cooperative MIMO in wireless sensor networks. In: 2019 2nd World Conference on Mechanical Engineering and Intelligent Manufacturing (WCMEIM), pp. 382–386. IEEE (2019)
3. Asaduzzaman, Kong, H.Y.: Coded diversity for cooperative MISO based wireless sensor networks. IEEE Commun. Lett. 13(7), 516–518 (2009)
4. Clausen, T., Yi, J., Herberg, U.: Lightweight on-demand ad hoc distance-vector routing - next generation (LOADng). Comput. Netw. 126(C), 125–140 (2017)
5. Sasidharan, D., Jacob, L.: A framework for the IPv6 based implementation of a reactive routing protocol in NS-3: case study using LOADng. Simul. Model. Pract. Theory 82, 32–54 (2018)
6. Gupta, B.B., Quamara, M.: An overview of internet of things (IoT): architectural aspects, challenges, and protocols. Concurr. Comput. Pract. Exp. 32(21), e4946 (2020)
7. Asaduzzaman, Kong, H.Y.: Energy efficient cooperative LEACH protocol for wireless sensor networks. J. Commun. Netw. 12(4), 358–365 (2010)
8. Pushpalatha, S., Shivaprakasha, K.S.: Energy-efficient communication using data aggregation and data compression techniques in wireless sensor networks: a survey. In: Kalya, S., Kulkarni, M., Shivaprakasha, K.S. (eds.) Advances in Communication, Signal Processing, VLSI, and Embedded Systems. LNEE, vol. 614, pp. 161–179. Springer, Singapore (2020). https://doi.org/10.1007/978-981-15-0626-0_14
9. Palattella, M.R., et al.: Standardized protocol stack for the internet of (important) things. IEEE Commun. Surv. Tutor. 15(3), 1389–1406 (2012)
10. Lim, C.: A survey on congestion control for RPL-based wireless sensor networks. Sensors 19(11), 2567 (2019)
11. Yi, J., Clausen, T., Igarashi, Y.: Evaluation of routing protocol for low power and lossy networks: LOADng and RPL. In: 2013 IEEE Conference on Wireless Sensor (ICWISE), pp. 19–24. IEEE (2013)
12. Yi, J., Clausen, T., Bas, A.: Smart route request for on-demand route discovery in constrained environments. In: 2012 IEEE International Conference on Wireless Information Technology and Systems (ICWITS), pp. 1–4. IEEE (2012)
13. Sobral, J.V., Rodrigues, J.J., Rabêlo, R.A., Saleem, K., Furtado, V.: LOADng-IoT: an enhanced routing protocol for internet of things applications over low power networks. Sensors 19(1), 150 (2019)
14. Sobral, J.V.V., Rodrigues, J.J.P.C., Saleem, K., Al-Muhtadi, J.: Performance evaluation of LOADng routing protocol in IoT P2P and MP2P applications. In: 2016 International Multidisciplinary Conference on Computer and Energy Science (SpliTech), pp. 1–6 (2016). https://doi.org/10.1109/SpliTech.2016.7555943
Deep Learning Based Facial Mask Detection Using Mobilenetv2
Arijit Goswami1, Biswarup Bhattacharjee2, Rahul Debnath2, Ankita Sikder2, and Sudipta Basu Pal2(B)
1 Computer Science Engineering Department, University of Engineering and Management, Kolkata, Kolkata, India
2 Computer Science Information Technology Engineering Department, University of Engineering and Management, Kolkata, Kolkata, India
[email protected]
Abstract. The Covid-19 pandemic has had a profound effect on our daily lives. One of the most effective ways to protect ourselves from this virus is to wear face masks. This research paper introduces face mask detection that authorities can use to reduce and prevent COVID-19. The face mask recognition process in this research paper is done with a deep learning algorithm, and image processing is done using MobileNetV2. The steps to build the model are data collection, pre-processing, data classification, model training and model testing. The authors came up with this approach due to the recent Covid-19 situation, the need to follow specific guidelines, and the uprising trend of Artificial Intelligence and Machine Learning and its real-world practices. This system has been made to detect whether more than one person is wearing a mask or not. This system also gives Covid-related worldwide updates for a chosen country and type of case, such as total cases, total deaths, etc. Such systems are already available, but the efficiency of the available mask detection systems was not achieved thoroughly. This newly developed system proposes to take a step further: it recognizes more than one person at a time and increases the accuracy level to a much greater extent.
Keywords: Mask detection · Deep learning · Computer vision · Convolutional Neural Network · Image processing · Web scraping · Model visualization · Accuracy prediction
1 Introduction
Because of the Covid-19 pandemic, which needs people to wear masks, maintain social distance, and wash their hands with hand sanitizer, the detection of a face mask is the need of the hour [1]. Although some social distancing as well as sanitization issues have been discussed in the past, face mask detection with a high accuracy level is yet to be achieved [2]. Wearing a mask during the pandemic is an important precautionary measure, and maintaining a considerable social distance is quite difficult. This research is particularly based upon the present Covid situation, the importance of masks, and also the uprising trend of using Artificial Intelligence and Machine Learning in many real-world practices [13]. The mask is necessary for everyone who is at risk of serious illness due to COVID-19, especially those at high risk [3]. As a result, reducing the chances of transmission of this deadly virus from an infected person to a healthy person will eventually reduce the spread of this virus and also the spread of infection [4]. This produces much better output and eventually a high accuracy rate as well [14]. Training models through Convolutional Neural Networks using convolution operators produces a generic output [15]. Through this research, a layer of efficiency is added. MobileNetV2 has a faster processing rate compared to a plain CNN [16]. It uses few parameters, which makes the whole process easier [17]. Alongside the novel MobileNetV2, an image data generator is used to augment the dataset [18].
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. A. K. Das et al. (Eds.): CIPR 2022, LNNS 480, pp. 77–89, 2022. https://doi.org/10.1007/978-981-19-3089-8_8
2 Background of the Work
Jinsu Lee et al. [5] proposed a model called "Integrated Method for CNN Acquisition Models". Their research work focuses on finding items of a certain class in digital photographs and videos. They used deep learning, in particular CNNs [6]. In the first stage, they built a two-stage detector in which region proposals are produced. In the second stage, there is a single-stage detector that helps to detect and classify an object without generating region proposals [7]. A variety of CNN modelling features were incorporated, along with advanced methods for combining objects and their novel approaches to modelling and voting boxes [8, 19]. Sebastian Handrich et al. [9] proposed a method called "Face Attribute Detection using MobileNetV2 and NasNetMobile". In their work they came up with two simple and effective ways of measuring facial features [10] in unconstrained or photographic images [20], using precise and fast facial alignment techniques. MobileNetV2 and NasNetMobile were used to measure facial features. Two lightweight CNN structures are featured, and both perform similarly in terms of accuracy and speed [11]. They also compared the model in terms of processing time and accuracy, and showed that this method is slightly faster than the modern model [12]. This model was easy to use and can also be used on mobile devices [21]. However, the accuracy of all these systems was not obtained to a great extent. The goal of our research is mainly focused on increasing the accuracy of the system so that it can be widely used during the pandemic situation. Increasing the accuracy of such a system will help several organizations to control the current situation.
3 Implementation
The main goal of the research is to implement a 'MASK DETECTION SYSTEM' and 'COVID UPDATES' using deep learning concepts. To make it user-friendly, the target is to build a GUI with buttons integrated into it for using the system. For live detection, faces have to be detected and the accuracy of wearing or not wearing masks has to be shown [28]. Based on that, a decision has to be produced as the output. Image data has to be worked on for the training model. For the Covid Updates section, a GUI should be made for accessing the facilities. A pop-up notification should be generated after the user chooses the Covid case type, such as total cases, total deaths, new cases, new
deaths, total recovered, all cases. An option has to be there to download databases of Covid cases. The implementation is made in such a way that the data fetched from the Worldometer website is correct and updated as per the Covid situation.
3.1 Dataset Analysis
An external dataset of people's images with and without masks has been downloaded from Kaggle [29]. The dataset consists of 1915 images of people wearing masks and 1918 images of people not wearing masks [29]. In this dataset, 80% of the images are used as training data and 20% as testing data.
3.2 Model Training
MobileNetV2 is a convolutional neural network architecture that seeks to work well on mobile devices. It is based on an inverted residual structure and is an effective feature extractor used mainly for object detection and segmentation. The Convolutional Neural Network layer has been replaced with MobileNetV2 and, at last, the Adam optimizer is used to fully optimize the model. The authors have created a training.py file for model training. Here, learning rate = 1e−4, epochs = 20 and batch size = 32 are used for the implementation of the model. The model uses two categories of data: with masks and without masks. The size of the images has been set to (224, 224) [28]. A label binarizer has been used for labeling, and then the data has been taken as a NumPy array. Data has been split into trainX, trainY, testX and testY. A training image generator has been constructed for data augmentation. The MobileNetV2 network has been used for the base model, with the head model placed on top of it. AveragePooling2D, Flatten and Dense layers have been used [22]. The activation function ReLU has been used for avoiding the vanishing gradient problem. Dropout 0.5 is used for avoiding overfitting. After all these implementations, the model is compiled and made to predict based on data. The Adam optimizer is used for model compilation [23].
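The dataset bookkeeping described above (1915 mask images, 1918 no-mask images, one-hot labels, 80/20 split) can be sketched without any deep learning library. The paper itself uses a label binarizer and NumPy arrays; this stdlib-only illustration only shows the split arithmetic, with placeholder image data.

```python
import random

# Stdlib-only sketch of the 80/20 train/test split and binary one-hot labels.
# "img" stands in for real image arrays; the counts match the paper's dataset.

def one_hot(label):
    return [1, 0] if label == "mask" else [0, 1]

samples = [("img", "mask")] * 1915 + [("img", "no_mask")] * 1918
labels = [one_hot(lbl) for _, lbl in samples]

random.seed(42)                       # reproducible shuffle
idx = list(range(len(samples)))
random.shuffle(idx)

split = int(0.8 * len(samples))       # 80% of 3833 samples -> 3066
train_idx, test_idx = idx[:split], idx[split:]
print(len(train_idx), len(test_idx))  # 3066 767
```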
After that, a graph of epoch (X-axis) vs loss-accuracy (Y-axis) has been plotted using the Python matplotlib library. Adam is a replacement optimization algorithm for stochastic gradient descent, and it is used to train deep learning models [24]. It can handle sparse gradients on complex problems. In this paper, the authors have used the Adam optimizer for model compilation, mainly because it combines two gradient descent methodologies, i.e., Momentum and Root Mean Square Propagation (RMSP). The Adaptive Moment Estimation algorithm is used for efficient gradient descent. This algorithm is best suited for large problems with huge data and several parameters; it works very efficiently and also requires less memory.

Momentum makes the algorithm converge faster by using an exponentially weighted average of the gradients. Based on this, Eq. 1 [30] is shown below:

$\omega_{t+1} = \omega_t - \alpha\, m_t$  (1)

where $m_t = \beta m_{t-1} + (1 - \beta)\,\frac{\partial L}{\partial w_t}$.

Root mean square prop, or RMSprop, is an adaptive learning algorithm that seeks to improve upon AdaGrad. Instead of accumulating all squared gradients as AdaGrad does, it takes an exponential moving average. Based on this, Eq. 2 [30] is shown below:

$w_{t+1} = w_t - \frac{\alpha_t}{(v_t + \varepsilon)^{1/2}}\,\frac{\partial L}{\partial w_t}$  (2)

where $v_t = \beta v_{t-1} + (1 - \beta)\left(\frac{\partial L}{\partial w_t}\right)^2$.

Mathematical Aspect of Adam Optimizer
The mathematical aspect of the Adam optimizer is shown in Eq. 3:

$m_t = \beta_1 m_{t-1} + (1 - \beta_1)\,\frac{\partial L}{\partial w_t}, \qquad v_t = \beta_2 v_{t-1} + (1 - \beta_2)\left(\frac{\partial L}{\partial w_t}\right)^2$  (3)
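The three update rules can be combined into a single scalar-parameter sketch in plain Python. Note that full Adam also applies bias-correction terms (m̂, v̂), which are omitted here to stay close to Eqs. 1-3 as written; the function name and constants are illustrative.

```python
import math

# One Adam-style step combining Eqs. (1)-(3): momentum first moment m_t and
# RMSprop second moment v_t, then a scaled weight update. Bias correction,
# which full Adam adds, is intentionally omitted to mirror the text.

def adam_step(w, grad, m, v, alpha=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad          # Eq. (3), first moment
    v = beta2 * v + (1 - beta2) * grad ** 2     # Eq. (3), second moment
    w = w - alpha * m / (math.sqrt(v) + eps)    # Eqs. (1)+(2) combined
    return w, m, v

# Minimise f(w) = w^2 (gradient 2w) starting from w = 1.0.
w, m, v = 1.0, 0.0, 0.0
for _ in range(1000):
    w, m, v = adam_step(w, 2 * w, m, v, alpha=0.01)
print(round(w, 3))   # close to the minimum at 0
```

Because the step size is the ratio of the two moving averages scaled by α, the effective step stays bounded even for steep gradients, which is the memory- and tuning-friendly behaviour the text attributes to Adam.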
Figure 1 describes the mean accuracy of the face detection model analysis according to specific epochs.
Fig. 1. Mean accuracy of the model
3.3 Detection
The authors have created a detection.py file for detection after prediction. For live detection, OpenCV (cv2) in Python has been used. Here, a detection frame has been set during live camera streaming. An infinite loop has been used for keeping the camera on. The authors have made a bounding rectangle where users have to place their faces. Here, confidence is measured: if it comes out greater than 50%, then the user is wearing a mask. The accuracy percentage is also calculated [25]. Generally, the images from live detection are in BGR format [29]. The authors converted them to RGB format for further implementation. Then, after recognition, they are converted to greyscale images and also resized to (224, 224). The authors have kept the sizes of the dataset's images and the recognized images the same so that they can be compared exactly. For recognizing faces, a model is also created [27]. The number of people is also calculated; the model only makes a prediction if the number of people is greater than 0, i.e. at least one person is recognized. For live detection, a video stream has been used. Using the putText function, the number of people on the screen has been printed as output [28]. After all this, the authors have added a check: if 'q' is pressed, the output screen is closed using the destroyAllWindows function and the live streaming is stopped. After the detection is completed, the person is also labeled as wearing a mask or not, along with the predicted percentage.
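The labelling step at the end of the detection loop can be sketched as follows. The model emits two scores per face (mask, no mask); the label and displayed percentage follow the 50% rule described above. The function name is hypothetical, not from the authors' detection.py.

```python
# Illustrative per-face labelling: pick the higher of the two class scores
# and report it as a percentage, mirroring the >50% confidence rule.

def label_face(mask_score, no_mask_score):
    """Return (label, confidence%) for one detected face."""
    if mask_score > no_mask_score:            # confidence above 50% for "mask"
        return ("Mask", round(mask_score * 100, 2))
    return ("No Mask", round(no_mask_score * 100, 2))

print(label_face(0.999, 0.001))   # ('Mask', 99.9)
print(label_face(0.20, 0.80))     # ('No Mask', 80.0)
```

In the multi-person case, the detection loop simply applies this function once per face found in the frame, which is how the system handles more than one person at a time.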
Authors have used the concept of web scraping here to scrape the data from the world meter website. It will fetch live data for the number of positive cases and recovered cases for the COVID-19 pandemic [28]. Authors have used a python script using the requests and Beautiful Soup modules to scrape the update of the COVID-19 case from the World meter website. 3.5 Covid Updates GUI For the covid updates GUI, CoronaGlobalUpdatesGUI.py has been created. Authors have used pandas Data Frame for downloading the data in JSON (JavaScript Object Notation) and CSV (Comma Separated Values) [28]. Here they have designed the GUI with fonts, font colors, and background images. Authors have used Graphics Interchange Format which seems to be a moving coronavirus and buttons, drop-down menu.
3.6 Main GUI
For the main canvas window GUI, the authors have created mask_detection_gui.py. For playing sound, a lambda function has been used. Here the authors have created three buttons and designed the GUI by changing the background image, font, font colors, etc. Using canvas.create_window, they have wired the buttons to open the live detection and Covid updates functions.
4 Experimental Setup
Figure 2 represents the layers of the neural network, i.e. the image processing architecture of the mask detection system. The head model is placed on top of the base model. In the base model, the input size is (224, 224, 3) and MobileNetV2 is applied. In the head model, ReLU and softmax activation functions are used.
Fig. 2. Architecture of the mask detection system
Figure 3 describes the working flow of the Mask Detection System.
Fig. 3. Working flow of the mask detection system
Figure 4 is the main Graphical User Interface of the system for accessing the functionalities.
Fig. 4. Main GUI of the mask detection system
5 Pseudocode
The pseudocode for the entire facial mask detection system is shown below.
5.1 Training
The following are the steps used for training the model.
• Initialize learning rate, epoch, and batch size
• Loading dataset (with mask and without mask) and processing images
• Splitting dataset into training and testing
• Preparing head model and base model, saving and compiling the model
• Plotting graph for Accuracy vs Epoch

5.2 Detection
The following are the steps used for detecting faces.
• Starting video stream and processing the face images
• Loading dataset (both datasets for face detection and mask detection)
• Calculating accuracy percentage and number of faces detected
• Showing results in the Detection window

5.3 Mask Detection GUI
The following are the steps used for detecting masks using the GUI.
• Creating and designing GUI using Tkinter
• Button integration in GUI
• Calling the lambda functions according to their specified buttons along with passing arguments

5.4 Corona Global Update GUI
The following are the steps used for collecting COVID-19 global data.
• Initializing the types of Covid cases
• Creating labels, drop-down menu, and buttons in GUI
• Embedding GIF in GUI
• Button integration
• Downloading CSV, JSON, and designing the notification section

5.5 Covid Updates
The following are the steps used for providing COVID-19 updates.
• Fetching data from the Worldometer website using requests and Beautiful Soup
• Notification generation using the plyer module according to the user-chosen country
• Creating data frames and downloading CSV and JSON files for Covid cases
6 Result Analysis
Figure 5 below represents the visualization of loss-accuracy vs epoch. The red line shows the training loss, the blue line shows the validation loss, the violet line shows the training accuracy, and the grey line shows the validation accuracy.
Fig. 5. Accuracy vs Epoch Graph corresponding to the dataset
The authors have replaced the plain Convolutional Neural Network layers with MobileNetV2 and used the Adam optimizer to fully optimize the model. As a result, the model accuracy score and the efficiency of the system have both increased considerably, while the training time has decreased substantially due to the MobileNetV2-based Convolutional Neural Network. Table 1 reports the per-epoch figures used to calculate the average accuracy of the deep learning model. It shows that accuracy rises from the second epoch onward while loss keeps decreasing. The accuracy curve gradually becomes stable, indicating that no further iterations are required to improve the model. The system is designed to detect multiple persons at a time. It is implemented in Python using a list that collects the faces detected in a frame; the list is iterated through a loop, and each face is passed through the trained model, which predicts whether each person is wearing a mask or not. The system gives an accuracy score of 99.90%. Since the model optimization is fully maximized, the remaining limiting factors are the resolutions of the camera and the display: the higher the resolution, the higher the accuracy.
• YoloV3 – The models were also trained using YoloV3, and the training time is less than that of MobileNetV2. Here, the batch size is 64. The accuracy score is 87.16%.
Table 1. Accuracy analysis table

| Epoch no | Loss   | Val_loss | Accuracy | Val_accuracy |
|----------|--------|----------|----------|--------------|
| 1/20     | 0.3500 | 0.1282   | 0.8839   | 0.9857       |
| 2/20     | 0.1360 | 0.0700   | 0.9641   | 0.9909       |
| 3/20     | 0.0983 | 0.0511   | 0.9736   | 0.9909       |
| 4/20     | 0.0791 | 0.0443   | 0.9763   | 0.9896       |
| 5/20     | 0.0638 | 0.0389   | 0.9806   | 0.9896       |
| 6/20     | 0.0620 | 0.0341   | 0.9802   | 0.9922       |
| 7/20     | 0.0495 | 0.0322   | 0.9852   | 0.9935       |
| 8/20     | 0.0430 | 0.0339   | 0.9858   | 0.9909       |
| 9/20     | 0.0434 | 0.0307   | 0.9875   | 0.9909       |
| 10/20    | 0.0413 | 0.0294   | 0.9888   | 0.9935       |
| 11/20    | 0.0349 | 0.0290   | 0.9898   | 0.9935       |
| 12/20    | 0.0342 | 0.0306   | 0.9904   | 0.9896       |
| 13/20    | 0.0360 | 0.0271   | 0.9904   | 0.9935       |
| 14/20    | 0.0301 | 0.0280   | 0.9904   | 0.9935       |
| 15/20    | 0.0284 | 0.0258   | 0.9927   | 0.9935       |
| 16/20    | 0.0315 | 0.0280   | 0.9901   | 0.9935       |
| 17/20    | 0.0240 | 0.0239   | 0.9924   | 0.9935       |
| 18/20    | 0.0318 | 0.0262   | 0.9895   | 0.9935       |
| 19/20    | 0.0310 | 0.0236   | 0.9901   | 0.9935       |
| 20/20    | 0.0279 | 0.0247   | 0.9918   | 0.9948       |
• YoloV4 – YoloV4 improves on the YoloV3 algorithm by raising mean average precision (mAP) by 10% and the number of frames per second by 12%.
• Faster R-CNN – After training the model, it is noticed that the accuracy of Faster R-CNN is marginally better than that of YoloV3. The accuracy score is 87.69%.
• Single Shot Detector – The accuracy score obtained from the Single Shot Detector is less than that of MobileNetV2. Here, the epoch count is 50 and the batch size is 32; the trained model achieves an accuracy of 92.86%.
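The trends claimed from Table 1 (loss shrinking, accuracy rising after the early epochs and then stabilizing) can be checked directly from the tabulated training-column values:

```python
# Per-epoch training loss and accuracy transcribed from Table 1
loss = [0.3500, 0.1360, 0.0983, 0.0791, 0.0638, 0.0620, 0.0495,
        0.0430, 0.0434, 0.0413, 0.0349, 0.0342, 0.0360, 0.0301,
        0.0284, 0.0315, 0.0240, 0.0318, 0.0310, 0.0279]
accuracy = [0.8839, 0.9641, 0.9736, 0.9763, 0.9806, 0.9802, 0.9852,
            0.9858, 0.9875, 0.9888, 0.9898, 0.9904, 0.9904, 0.9904,
            0.9927, 0.9901, 0.9924, 0.9895, 0.9901, 0.9918]

# Overall trend: loss shrinks and accuracy grows between first and last epoch
loss_drop = loss[0] - loss[-1]
accuracy_gain = accuracy[-1] - accuracy[0]

# Stability: accuracy varies only slightly over the final five epochs
late_spread = max(accuracy[-5:]) - min(accuracy[-5:])
```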
7 Future Scope

This 'Face Mask Detection' system can be deployed in any public place or densely populated area to check Covid-19 safety measures before entry. It will increase awareness of the global pandemic situation, and everyone will understand the importance of wearing masks. This system will encourage people to follow the Covid-19 guidelines, and it will also improve the situation by decreasing the number of affected people.
The global notification idea can be added to other systems and shown in public places through televisions or computers to keep people updated and alert about their health.
8 Conclusion

Artificial Intelligence and Machine Learning are innovative technologies that are helpful in the present Covid-19 situation [7]. Face mask detection is a computer vision-driven image analysis solution that uses visible streams from the camera to detect whether people are wearing face masks and to raise alerts [29]. This system can be integrated with embedded applications in schools, airports, train stations, offices, and public places to uphold public safety guidelines. This research work also contains a section that provides global notifications about Covid-19 affected people for a chosen country. The program can be deployed not only for Covid-19 cases but also for other routine use cases [29]. Finally, this system is user-friendly, safe, and valuable in real-life practice in every type of sector, including offices, health care centers, etc. Applications of Face Mask Detection include the following:
• In the 'Live Detection' part, people can place their face before the webcam, and the system will detect whether they are wearing a mask or not. The number of people is shown, and more than one person can be detected at a time.
• This system can be used in offices, schools, colleges, railway stations, airports, hospitals, shopping malls, and other places to check whether people are wearing masks.
Declaration of Competing Interests. The authors declare that they have no known competing monetary interests or personal relationships that could have appeared to influence the work reported in this paper.
References

1. W.H.O.: Coronavirus disease 2019 (COVID-19): situation report, 205 (2020) [Online]. Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8083870. Accessed 23 Dec 2021
2. Centers for Disease Control and Prevention: Coronavirus disease 2019 (COVID-19) – symptoms (2020) [Online]. Available from: https://www.cdc.gov/coronavirus/2019-ncov/index.html. Accessed 15 Dec 2021
3. Cdc.gov: Coronavirus — human coronavirus types — CDC (2020) [Online]. Available from: https://www.cdc.gov/coronavirus/types.html. Accessed 13 Jan 2022
4. W.H.O.: Advice on the use of masks in the context of COVID-19: interim guidance (2020) [Online]. Available from: https://apps.who.int/iris/handle/10665/331693. Accessed 18 Dec 2021
5. Keras Team: Keras documentation: about Keras. Keras.io (2020) [Online]. Available from: https://keras.io/about. Accessed 10 Jan 2022
6. Meena, D., Sharan, R.: An approach to face detection and recognition. In: International Conference on Recent Advances and Innovations in Engineering (ICRAIE), pp. 1–6, Jaipur (2016). https://doi.org/10.1109/ICRAIE.2016.7939462. Available from: https://ieeexplore.ieee.org/document/7939462
7. Ge, S., Li, J., Ye, Q., Luo, Z.: Detecting masked faces in the wild with LLE-CNNs. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 426–434, Honolulu, HI (2017). https://doi.org/10.1109/CVPR.2017.53. Available from: https://ieeexplore.ieee.org/document/8099536
8. Wang, Z., et al.: Masked face recognition dataset and application. arXiv preprint arXiv:2003.09093 (2020). Available from: https://arxiv.org/abs/2003.09093
9. Kumar, A., Kaur, A., Kumar, M.: Face detection techniques: a review. Artif. Intell. Rev. 52(2), 927–948 (2018). https://doi.org/10.1007/s10462-018-9650-2
10. Girshick, R.: Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448 (2015). Available from: https://ieeexplore.ieee.org/document/7410526
11. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014). Available from: https://ieeexplore.ieee.org/document/6909475
12. Woo, S., Park, J., Lee, J.-Y., Kweon, I.S.: CBAM: convolutional block attention module (2018). Available from: https://arxiv.org/abs/1807.06521
13. Felzenszwalb, P., McAllester, D., Ramanan, D.: A discriminatively trained, multiscale, deformable part model. In: 2008 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8. IEEE (2008). Available from: https://ieeexplore.ieee.org/document/4587597
14. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016). Available from: https://ieeexplore.ieee.org/document/7780460
15. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems, pp. 91–99 (2015). Available from: https://papers.nips.cc/paper/2015/hash/14bfa6bb14875e45bba028a21ed38046-Abstract.html
16. Lee, D.-H., Chen, K.-L., Liou, K.-H., Liu, C.-L., Liu, J.-L.: Deep learning and control algorithms of direct perception for autonomous driving. arXiv preprint arXiv:1910.12031 (2019)
17. Wang, C.Y., Liao, H.Y.M., Wu, Y.H., Chen, P.Y., Yeh, I.H.: CSPNet: a new backbone that can enhance learning capability of CNN. In: Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 14–19 June 2020. Available from: https://arxiv.org/abs/1911.11929
18. Neubeck, A., Gool, L.: Efficient non-maximum suppression. In: Proceedings of the 18th International Conference on Pattern Recognition (ICPR'06), vol. 3, pp. 850–855, Hong Kong, China, 20–24 Aug 2006. Available from: https://ieeexplore.ieee.org/document/1699659
19. Uijlings, J.R.R., van de Sande, K.E.A., Gevers, T., Smeulders, A.W.M.: Selective search for object recognition. Int. J. Comput. Vis. 104, 154–171 (2013). Available from: https://doi.org/10.1007/s11263-013-0620-5
20. Keys, R.: Cubic convolution interpolation for digital image processing. IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 29, pp. 1153–1160. IEEE, Piscataway, NJ, USA (1981). Available from: https://ieeexplore.ieee.org/document/1163711
21. Loey, M., Manogaran, G., Taha, M., Khalifa, N.E.: Fighting against COVID-19: a novel deep learning model based on YOLO-v2 with ResNet-50 for medical face mask detection. Sustain. Cities Soc. 65, 102600 (2020). Available from: https://pubmed.ncbi.nlm.nih.gov/33200063/
22. Loey, M., Manogaran, G., Taha, M.H.N., Khalifa, N.E.M.: A hybrid deep transfer learning model with machine learning methods for face mask detection in the era of the COVID-19 pandemic. Measurement 167, 108288 (2020). Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7386450/
23. Krizhevsky, A., Sutskever, I., Hinton, G.: ImageNet classification with deep convolutional neural networks. Neural Inf. Process. Syst. 25 (2012). Available from: https://doi.org/10.5555/2999134.2999257
24. Giger, M.L., Suzuki, K.: Computer-aided diagnosis. In: Biomedical Information Technology, pp. 359–374. Academic Press, Cambridge, MA, USA (2008). Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3810349/
25. Buciu, I.: Color quotient-based mask detection. In: Proceedings of the 2020 International Symposium on Electronics and Telecommunications (ISETC), pp. 1–4, Timisoara, Romania, 5–6 Nov 2020. Available from: https://www.mdpi.com/1424-8220/21/9/3263/htm
26. Zhang, H., Li, D., Ji, Y., Zhou, H., Wu, W., Liu, K.: Toward new retail: a benchmark dataset for smart unmanned vending machines. IEEE Trans. Ind. Inform. 16, 7722–7731 (2020). Available from: https://ieeexplore.ieee.org/document/8908822
27. Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944, Honolulu, HI, USA, 21–26 July 2017. Available from: https://arxiv.org/abs/1612.03144
28. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), vol. 1, pp. 886–893. IEEE (2005). Available from: https://ieeexplore.ieee.org/document/1467360
29. Face Mask Detection Dataset [Online]. Available from: https://fmd-dataset.vercel.app/. Accessed 12 Jan 2022
30. Intuition of Adam Optimizer [Online]. Available from: https://www.geeksforgeeks.org/intuition-of-adam-optimizer/. Accessed 10 Jan 2022
A Novel Approach to Detect Power Theft in a Distribution System Using Machine Learning and Artificial Intelligence Abhinandan De , Somesh Lahiri Chakravarty(B) , Sayan Kar , Abhijnan Maiti , and Sanchari Chatterjee Department of Electrical Engineering, Indian Institute of Engineering Science and Technology, Shibpur, Howrah, West Bengal, India [email protected]
Abstract. This paper presents a novel approach to identify power theft by low-voltage domestic consumers in a typical power distribution system. A data-driven approach has been used, whereby the electricity consumption of household consumers over the previous two years was taken into consideration and analyzed for the detection of power theft. Six different Machine Learning (ML) approaches, namely Decision Tree, Random Forest, K Nearest Neighbors (KNN), Support Vector Machine (SVM), Logistic Regression and Artificial Neural Network (ANN), were employed on the available dataset; the trained ML algorithms can then recognize any significant deviation in the consumption pattern of an individual consumer in the current year to interpret whether power theft has been committed. The performance of these ML algorithms was compared in terms of accuracy and misclassification rates. The Python platform was used to simulate and train the different ML algorithms on the available dataset. In the present case study, Logistic Regression delivered the best theft-detection accuracy of 100%, followed by Decision Tree, SVM and ANN, each offering 96% recognition accuracy on the test data.

Keywords: Decision Tree · Random Forest · KNN · SVM · Logistic Regression · ANN · Power-theft · Pattern recognition
1 Introduction Power theft is a serious issue which leads to significant loss of revenue for the power distribution company, and often results in network overloading and other technical problems. As per the World Bank Report on “Reforming the Power Sector”, 2010 [1], the world loses about US$89.3 billion annually to electricity theft. The highest losses are in India ($16.2 billion), followed by Brazil ($10.5 billion) and Russia ($5.1 billion) [1]. In India, total transmission and distribution losses are estimated to be around 23% [2]. This is a large value in itself; and power theft further adds to this burden. Power theft is usually committed by illegally hooking or tapping the distribution lines and feeders. The present paper attempts to correctly identify specific consumers and the specific nodes © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. K. Das et al. (Eds.): CIPR 2022, LNNS 480, pp. 90–99, 2022. https://doi.org/10.1007/978-981-19-3089-8_9
in a typical power distribution network where such illegal activities may have been going on, by analyzing and interpreting the past consumption patterns of the consumers and comparing them with the current year's consumption. A case study has been presented in the paper where six different ML algorithms were trained with the historical consumption data of 100 individual domestic consumers of Calcutta Electricity Supply Corporation (CESC) Limited for the previous two years (2018–2019) and subsequently validated with the consumption data for the year 2020 (for the sake of privacy, personal details of the consumers have been deliberately suppressed in the paper). Once the ML algorithms are adequately trained to recognize the historical consumption patterns of individual consumers at different times of a year, considering seasonal, short-term and long-term variations in consumption, including periods of festivity and other major social events, the trained ML algorithms strive to identify any major deviation in the consumption pattern in the current year. If such deviations exceed a certain threshold, decided on the basis of analysis of the consumption data, the trained ML algorithms mark them as probable cases of power theft. A review of existing research reveals that a Smart Power Theft Detection System has been proposed by N. K. Mucheli et al. [3]. However, such an arrangement involves an expensive hardware set-up (Arduino Uno, mobile GPS software) and may not be economically viable. In [4], ANN and SOM based pattern recognition approaches were proposed for the identification of power theft. However, the proposed method was not compared with other standard ML algorithms. Shuan Li et al. [5] have proposed Deep Learning and Random Forest based pattern recognition for theft detection in power grids. While the proposed method yielded good recognition accuracy, the development and training of such classifiers are time- and resource-consuming and difficult. The main contribution of the present paper is its relatively simple and realistic approach towards the detection of power theft committed by individual consumers, based on the analysis of historical consumption data, which is readily available to the power distribution company, and the deployment of highly efficient yet less resource-consuming pattern recognition algorithms. In the present paper, six popular pattern recognition algorithms, namely Decision Tree [6, 7], Random Forest [8], KNN [9], SVM [10, 11], Logistic Regression [12] and ANN [13, 14], have been used to achieve faster training and easier implementation of the proposed method on real-time consumer data. The case study presented in the paper employed the consumption data of 100 randomly chosen Low Tension (LT) consumers of CESC Limited, Kolkata to train the six ML algorithms, and the theft recognition accuracies of these algorithms were compared.
2 The Proposed Power Theft Detection Approach

As discussed, the power theft detection approach proposed in this paper is based on pattern recognition techniques, where different pattern recognition algorithms were trained with historical consumption data of LT domestic consumers of CESC Limited, Kolkata. For the sake of privacy, personal details of the individual consumers and the dataset used for this work have not been revealed. However, the following sub-sections describe the procedure adopted for field data collection and pre-processing of the data.
A. De et al.
2.1 Field Data Collection

To determine whether any theft is taking place, a sample of 100 domestic consumers was randomly chosen and their data were taken per quarter of a year. The consumption data for the years 2018–2019 (the two preceding years) were used for analyzing the historical trend of consumption of individual consumers, and the same was compared against the consumption data for the year 2020. Factors such as power-cut hours and the integration of any kind of renewable source of energy have been taken into account in order to train the pattern recognition algorithms with all possible eventualities and variations which may normally be encountered. Also, the power consumed per sq. ft. area (kWh/sq.ft) has been considered to determine the expected power consumption of a household depending upon the size of the apartment.

2.2 Description of the Dataset

The following attributes were considered to construct the database of 100 randomly chosen consumers:
• Sl. No. of the consumer in the database, for correctly identifying the consumer committing the theft
• Quarter-wise consumption per unit area (kWh/sq.ft.) for the years 2018, 2019 and 2020
• Quarter-wise average consumption per unit area (kWh/sq.ft.) for the years 2018 and 2019
• Quarter-wise change (%) in consumption per unit area (kWh/sq.ft.) for the year 2020
• Quarter-wise power-cut hours and their averages for the years 2018, 2019 and 2020
• Quarter-wise change in power-cut hours for the year 2020
• Quarter-wise renewable energy generation (kW) and their averages for the years 2018, 2019 and 2020
• Quarter-wise change in renewable energy generation (kW) for the year 2020
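The quarter-wise change attribute can be derived as below; the baseline here is the 2018–2019 quarterly average, and the consumption values are invented for illustration.

```python
def quarterly_change_pct(avg_prev_years, current_year):
    """Percent change of each 2020 quarter vs the 2018-19 quarterly average."""
    changes = []
    for avg, cur in zip(avg_prev_years, current_year):
        changes.append(round(100.0 * (cur - avg) / avg, 2))
    return changes

# kWh/sq.ft per quarter (hypothetical consumer)
avg_2018_19 = [2.0, 2.5, 3.0, 2.0]
year_2020 = [1.0, 2.4, 3.3, 2.0]

change = quarterly_change_pct(avg_2018_19, year_2020)
# A large negative change in a quarter (e.g. Q1 here) would flag possible theft
```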
3 Case Study and Results

The following sub-sections describe an illustrative case study performed on a sizeable group of randomly chosen domestic electricity consumers of a Power Distribution Utility in India for the assessment of applicability of the proposed pattern recognition based power theft detection methodology.
3.1 Description of the Test Case

As mentioned earlier, data of a total of 100 consumers has been collected. Out of these 100 consumers or data points, 75 are used to train each of the six above-mentioned Machine Learning classifiers. After training, the trained model is tested on the remaining 25 data points to determine how well it has been trained and how good its accuracy is in terms of the Confusion Matrix. It is noteworthy that in the training dataset, information about which particular consumers are committing power theft is known a priori. When adequately trained with this dataset, the pattern recognition algorithms can recognize and learn the underlying correlations between the observed consumption patterns of a consumer and plausible power theft. The power-theft recognition accuracies of the six pattern recognition algorithms employed in the case study are presented in the following sub-sections.

3.2 Decision Tree Classifier

The Decision Tree delivered an accuracy of 96%, as depicted in the Confusion Matrix of Fig. 1. It correctly classifies 24 out of the 25 sample points used for testing. The Decision Tree generated on the basis of the available dataset and the decision path for a "No Theft" case are shown in Fig. 2. The Decision Tree decided the root node of each branch on the basis of entropy: at each node, the entropy is calculated and the algorithm chooses as the splitting feature the attribute that results in the least entropy. The entropy of a dataset is given as:

$E(S) = -\sum_{i=1}^{N} p_i \log_2 p_i$    (1)

where $p_i$ is the proportion of samples belonging to category $i$ (whose log to base 2 represents the uncertainty or impurity) and $N$ is the number of possible categories.
Fig. 1. Confusion Matrix for the Decision Tree Algorithm
Fig. 2. Decision Tree depicting the determination of a non-theft case
3.3 Random Forest

A total of 100 trees have been used to construct the forest. This algorithm yielded an accuracy of 88%, as shown in Fig. 3; thus 22 out of the 25 test data points have been correctly classified.
Fig. 3. Confusion Matrix for the Random Forest Classifier
3.4 K Nearest Neighbor

The Minkowski distance has been considered for distance calculation among the data points, and classification has been done on the basis of the five nearest neighbors of each data point.
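The Minkowski distance used by the KNN classifier generalizes the Manhattan (p = 1) and Euclidean (p = 2) distances; a direct implementation:

```python
def minkowski_distance(a, b, p=2):
    """Minkowski distance of order p between two feature vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

# With p = 2 this reduces to the Euclidean distance
d_euclidean = minkowski_distance([0.0, 0.0], [3.0, 4.0], p=2)  # → 5.0
d_manhattan = minkowski_distance([0.0, 0.0], [3.0, 4.0], p=1)  # → 7.0
```

Each test point is then assigned the majority label among its five nearest training points under this metric.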
Fig. 4. Confusion Matrix for the KNN classifier
From Fig. 4 we observe that the KNN algorithm delivered an accuracy of 84%; in other words, it correctly classifies 21 out of 25 sample points.

3.5 Support Vector Machine

While applying the SVM classifier, the 'Linear' kernel has been used. The features have been properly scaled so that there is no underfitting or overfitting. The SVM algorithm has given a high accuracy of about 96%, as depicted in Fig. 5: it has correctly predicted 24 out of the 25 sample points used for testing.
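The feature scaling mentioned above is typically standardization (zero mean, unit variance); a minimal sketch with made-up feature values, not the paper's actual preprocessing code:

```python
def standardize(column):
    """Scale a feature column to zero mean and unit standard deviation."""
    n = len(column)
    mean = sum(column) / n
    var = sum((x - mean) ** 2 for x in column) / n
    std = var ** 0.5
    return [(x - mean) / std for x in column]

# Hypothetical quarterly consumption values for one feature
scaled = standardize([2.0, 4.0, 6.0, 8.0])
```

Scaling keeps features with large numeric ranges (e.g. power-cut hours vs kWh/sq.ft.) from dominating the margin or the gradient updates.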
Fig. 5. Confusion Matrix for the SVM classifier
3.6 Logistic Regression

As with the SVM classifier, the features have been scaled here to prevent any underfitting or overfitting.
Fig. 6. Confusion Matrix for the Logistic Regression classifier
As evident from Fig. 6, the Logistic Regression algorithm gave the highest accuracy of 100%, which implies that it has correctly predicted all the theft and non-theft sample points in the dataset.

3.7 Artificial Neural Network

The ANN is the final classifier used in the analysis. It yielded an accuracy of about 96%, correctly predicting 24 out of 25 sample points. For the construction of this ANN model, the Sequential model of the Keras API of TensorFlow has been used. The input layer has 17 neurons, since the dataset has 17 attributes. The hidden layer has 25 neurons with a 'ReLU' activation function. Since this is a binary classification problem, the output layer has only one neuron with a 'Sigmoid' activation function. The ANN model is trained for 50 epochs with a learning rate of 0.1. Figure 7 demonstrates the variation of accuracy with epochs for the training and testing datasets.
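The 17–25–1 architecture described above can be sketched as a plain forward pass; the zero-initialized weights here are placeholders that Keras would learn during training.

```python
import math

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(features, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: 17 inputs -> 25 ReLU units -> 1 sigmoid output."""
    hidden = [relu(sum(w * x for w, x in zip(ws, features)) + b)
              for ws, b in zip(w_hidden, b_hidden)]
    z = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return sigmoid(z)  # probability of the "theft" class

n_in, n_hidden = 17, 25
w_hidden = [[0.0] * n_in for _ in range(n_hidden)]
b_hidden = [0.0] * n_hidden
w_out = [0.0] * n_hidden

p = forward([1.0] * n_in, w_hidden, b_hidden, w_out, b_out=0.0)
# With all-zero weights the output is sigmoid(0) = 0.5
```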
Fig. 7. Accuracy versus Epoch curve for training and testing data
Fig. 8. Mean Square Error versus Epoch curve for training and testing data
The Mean Square Error (MSE) is tracked for each epoch. Figure 8 shows that the MSE settles at about 4%, so the ANN model is quite successful in predicting theft and non-theft cases.

3.8 Comparison Among the Six Machine Learning Classifiers

Table 1 presents a comparison between the six classifiers on the basis of accuracy, True Positive (TP), True Negative (TN), False Positive (FP) and False Negative (FN) classifications.
Table 1. Comparison among various Machine Learning classifiers

| Serial No | Classifier          | Accuracy (%) | No. of test cases | TP | TN | FP | FN |
|-----------|---------------------|--------------|-------------------|----|----|----|----|
| 1         | Decision Tree       | 96           | 25                | 18 | 6  | 0  | 1  |
| 2         | Random Forest       | 88           | 25                | 16 | 6  | 1  | 2  |
| 3         | KNN                 | 84           | 25                | 15 | 6  | 4  | 0  |
| 4         | SVM                 | 96           | 25                | 18 | 6  | 0  | 1  |
| 5         | Logistic Regression | 100          | 25                | 19 | 6  | 0  | 0  |
| 6         | ANN                 | 96           | 25                | 18 | 6  | 1  | 0  |
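The accuracy column of Table 1 follows directly from the confusion-matrix counts, accuracy = (TP + TN) / (TP + TN + FP + FN), which can be verified against the tabulated values:

```python
# (classifier, TP, TN, FP, FN, stated accuracy %) transcribed from Table 1
rows = [
    ("Decision Tree", 18, 6, 0, 1, 96),
    ("Random Forest", 16, 6, 1, 2, 88),
    ("KNN", 15, 6, 4, 0, 84),
    ("SVM", 18, 6, 0, 1, 96),
    ("Logistic Regression", 19, 6, 0, 0, 100),
    ("ANN", 18, 6, 1, 0, 96),
]

def accuracy_pct(tp, tn, fp, fn):
    """Fraction of correctly classified test cases, as a percentage."""
    return 100 * (tp + tn) / (tp + tn + fp + fn)

computed = {name: accuracy_pct(tp, tn, fp, fn)
            for name, tp, tn, fp, fn, _ in rows}
```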
4 Conclusions

A novel pattern recognition based approach to identify power theft by low-voltage domestic consumers in a typical power distribution network was proposed and implemented in this paper. The historical electricity consumption of household consumers for the preceding two years was taken into consideration and analyzed for the detection of power theft. A test case was presented based on the consumption data of 100 randomly chosen domestic consumers of CESC Limited, Kolkata. Six different Machine Learning (ML) algorithms, namely Decision Tree, Random Forest, KNN, SVM, Logistic Regression and ANN, were employed to recognize the consumption patterns of individual consumers. The trained ML algorithms efficiently recognized changes in the consumption patterns of the consumers in the current year to identify events of power theft. A comparative analysis of the performance of these ML algorithms has been presented in terms of accuracy and misclassification rates. In the present test case on 100 random consumers, Logistic Regression delivered the highest theft-detection accuracy of 100%, followed by Decision Tree, SVM and ANN, each offering 96% recognition accuracy on the test data. Random Forest and KNN offered recognition accuracies of 88% and 84% respectively.
References

1. Controlling Electricity Theft and Improving Revenue. World Bank Report on Reforming the Power Sector (2010). http://rru.worldbank.org/PublicPolicyJournal. Accessed 10 Jan 2022
2. Annual Report 2011–12 of Power and Energy Division of Planning Commission, Government of India, New Delhi. http://www.planningcommission.gov.in/. Accessed 10 Jan 2022
3. Mucheli, N.K., et al.: Smart power theft detection system. In: 2019 Devices for Integrated Circuit (DevIC), pp. 302–305. Institute of Electrical and Electronics Engineers (2019). https://doi.org/10.1109/DEVIC.2019.8783395
4. de Souza, M.A., Pereira, J.L.R., Alves, G.D.O., Oliveira, B.C.D., Melo, I.D., Garcia, P.A.N.: Detection and identification of energy theft in advanced metering infrastructures. Electric Power Syst. Res. 182, 106258 (2020)
5. Li, S., Han, Y., Yao, X., Song, Y., Wang, J., Zhao, Q.: Electricity theft detection in power grids with deep learning and random forests. J. Electr. Comput. Eng. 2019, 4136874 (2019)
6. Quinlan, R.: Learning efficient classification procedures. In: Michalski, R.S., Carbonell, J.G., Mitchell, T.M. (eds.) Machine Learning: An Artificial Intelligence Approach, pp. 463–482. Morgan Kaufmann (1983). https://doi.org/10.1007/978-3-662-12405-5_15
7. Utgoff, P.E.: Incremental induction of decision trees. Mach. Learn. 4(2), 161–186 (1989). https://doi.org/10.1023/A:1022699900025
8. Ho, T.K.: Random decision forests. In: Proceedings of the 3rd International Conference on Document Analysis and Recognition, pp. 278–282, Montreal, QC, 14–16 Aug 1995
9. Coomans, D., Massart, D.L.: Alternative k-nearest neighbour rules in supervised pattern recognition: part 1. k-Nearest neighbour classification by using alternative voting rules. Anal. Chim. Acta 136, 15–27 (1982). https://doi.org/10.1016/S0003-2670(01)95359-0
10. Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995). https://doi.org/10.1007/BF00994018
11. Suykens, J.A.K., Vandewalle, J.P.L.: Least squares support vector machine classifiers. Neural Process. Lett. 9(3), 293–300 (1999)
12. Menard, S.W.: Applied Logistic Regression, 2nd edn. SAGE (2002). ISBN 978-0-7619-2208-7
13. McCulloch, W., Pitts, W.: A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 5(4), 115–133 (1943). https://doi.org/10.1007/BF02478259
14. Dawson, C.W.: An artificial neural network approach to rainfall-runoff modelling. Hydrol. Sci. J. 43(1), 47–66 (1998). https://doi.org/10.1080/02626669809492102
Adversarial Surround Localization and Robust Obstacle Detection with Point Cloud Mapping Rapti Chaudhuri(B) and Suman Deb Department of Computer Science and Engineering, National Institute of Technology Agartala, Agartala 799046, India {rapti.ai,sumandeb.cse}@nita.ac.in
Abstract. Significant research issues and experimental possibilities on autonomous vehicles are in vogue around the world. Among these, collision-free navigation constitutes one of the most significant research fields. With the vigorous emergence of new sensors, challenges are arriving in new avenues. This paper proposes an effective way to detect on-route obstacles by training a model through an adversarial neural network, along with 3D reconstruction of the surroundings in a GPS-denied Indoor Environment (IE) using a point cloud map. The depth sensor used here has been systematically analysed to tackle the challenge. This paper also studies the possible challenges and hurdles faced by the customized mobile robot in creating a point cloud map with a depth sensor integrated with the Robot Operating System (ROS) platform to reconstruct the surroundings. The reconstructed visuals are used as a memory trail of the environment along with visual references. This array of references would prove valuable for path-planning algorithms and promises near-optimal collision-free Indoor Mobile Robot Navigation (IMRN).

Keywords: Adversarial neural network · 3D reconstruction · Point cloud map · Indoor Mobile Robot Navigation

1 Introduction
Contemporary innovations in Automated Guided Vehicle (AGV) navigation confirm that, with the advent of sensor technology, the real world has been drastically converted into virtual scenarios for addressing the barriers to smooth path navigation by mobile robots [1]. On-route dynamic obstacles are a big concern in reaching a desired goal point, along with the challenges of real-time data collection for 3D surrounding environment realisation. 3D reconstruction through point cloud mapping provides an effective representation of the surroundings, whereas a Machine Learning (ML) approach is included here for more accurate and efficient obstacle detection. This work distinguishes itself by combining an ML module as its prime contribution, resulting in better precision compared to existing
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
A. K. Das et al. (Eds.): CIPR 2022, LNNS 480, pp. 100–109, 2022. https://doi.org/10.1007/978-981-19-3089-8_10
works, followed by a module for localization of the surroundings with point cloud mapping. From the analysis of existing works and from experimentation, single-step object detection algorithms (e.g. YOLO) are found to be comparatively better in accuracy, predicting objects with higher probability. As a result, YOLO has superseded the other object identification algorithms considered during experimentation. Its working principle and its fusion with the reconstruction module are presented further on. In this work, the Intel RealSense D455 model (Fig. 1) is used to take input from the surroundings. It consists of an RGB module, a 3D depth module and a motion module, where data from the accelerometer and gyro are obtained from its inbuilt IMU (Inertial Measurement Unit). Continuous values of roll, pitch and yaw are generated on the RealSense viewer interface. For onboard processing, as a single-board-based portation, a 1.3 GHz Jetson Nano is incorporated with native ROS Melodic [2]. Numerical and visual analyses of the experiment are presented briefly in subsequent sections.
Fig. 1. (a) Intel RealSense D455 with its associated custom features (b) Customized robot platform (CUBOT) with a two-active-wheel differential-drive structure and the RealSense depth camera mounted on top.
2 Related Work
Sensor proficiency has always been a dominant aspect of surround localization. The review of existing systems revolves around the geometric and algorithmic features of the acquired values used for surround modelling and possible obstacle identification. Researchers have extensively used the geometric least-squares curve fitting method [3] for obtaining near-accurate localization of objects. Jin et al. [4] propose a keen learning-based approach for safe navigation by mobile robots. Filliat et al. [5] combine 3D point clouds and 2D features to obtain comparatively better object recognition. A graph of locations is formed by the covisibility-based method proposed by Alcantarilla et al. [6], which exploits the fact that 3D point clouds generated by SfM (Structure from Motion) and SLAM (Simultaneous Localization and Mapping) contain visibility information. In [7], an unsupervised framework for simultaneous appearance-based object discovery, detection, tracking and reconstruction is presented, using RGB-D cameras and a robot manipulator followed by dense 3D simultaneous localization and mapping. Most of these reviewed works use
R. Chaudhuri and S. Deb
sensor fusion techniques to achieve near-accurate obstacle prediction. The incorporation of a machine learning algorithm for obstacle detection, together with reconstruction of the indoor environment, is the basis of the improvement proposed in this work.
3 Methodology
The work is divided into three modules, as shown in Fig. 2. The first layer includes the depth camera, which takes input from the surroundings. The collected data are passed into the second layer for training and model building. YOLO v4 classifies the on-route obstacles and identifies each with good precision and probability. Lastly, the mobile robot explores an unknown, congested indoor environment, and a 3D reconstruction of the environment is obtained from real-time depth-camera data with simultaneous obstacle detection.
Fig. 2. Proposed working procedure divided into three modules: a customized robot model with the RealSense mounted on top collects data from the surroundings, the second layer performs object identification, and the third layer performs 3D reconstruction of the environment.
– Intel RealSense D455. The RealSense is used here to extract 2D images and the 3D depth of obstacles. It consists of a D4 processor with 16 Mbit serial flash memory for firmware storage, a depth module with two HD image sensors (left and right), a wide infrared projector, an RGB color sensor (1080p) and an Inertial Measurement Unit (IMU).
– YOLO v4 for localization of objects. YOLO is a Convolutional Neural Network (CNN) preferred mostly for real-time object detection. YOLO divides the whole image into a grid and performs identification by predicting multiple bounding boxes and class probabilities for each box. YOLOv4 (Algorithm 1) is trained on full images and yields optimized detection. The YOLOv4 architecture mainly comprises three distinct blocks, namely backbone, neck and head, which perform dense and sparse prediction.
– 3D reconstruction of the environment. The localization and the surroundings can be visualized in Rviz of ROS. Rviz allows point cloud mapping and modifying the voxel size. Depth images obtained from the RGB-D camera are used by Rviz to perform graph-based SLAM and dense point cloud generation.
Algorithm 1. YOLOv4 Algorithm
Input: source image, enhanced data, extracted map // Input RealSense data
Output: output // Custom object detection
1: for input = source image do
2:   pre-processing and enhancement of the input image
3: for input = enhanced data do
4:   feed the enhanced image into the CSP-Darknet53 backbone layer
5:   withdraw the feature maps
6: for input = extracted map do
7:   feed the extracted map into the PANet layer to perform instance segmentation
8:   save the collected spatial information
9:   final detection of the object through the head layer of the YOLOv3 structure
10: for head detector layer do
11:   situate the bounding boxes
12:   detect the bounding-box coordinates \((p_x, p_y, p_w, p_h)\)
13:   classify each object situated within the bounding boxes
14:   obtain the aggregated loss score for each category of bounded object:
\[
\lambda_{coord} \sum_{m=0}^{S^2} \sum_{n=0}^{B} \mathbb{1}_{mn}^{obj} \left[ (p_x - \hat{p}_x)^2 + (p_y - \hat{p}_y)^2 + (p_w - \hat{p}_w)^2 + (p_h - \hat{p}_h)^2 \right]
+ \sum_{m=0}^{S^2} \sum_{n=0}^{B} \mathbb{1}_{mn}^{obj} \left[ -\log(\sigma(p_o)) + \sum_{k=0}^{C} \mathrm{BCE}(y_k, \sigma(S_k)) \right]
+ \lambda_{noobj} \sum_{m=0}^{S^2} \sum_{n=0}^{B} \mathbb{1}_{mn}^{noobj} \left[ -\log(1 - \sigma(p_o)) \right]
\]
15: output = optimized detection // Optimization of detection by the CIoU loss freebie
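The CIoU freebie in step 15 augments plain IoU with a normalized centre-distance penalty and an aspect-ratio consistency term. A minimal stdlib sketch of the idea follows (boxes are hypothetical (cx, cy, w, h) tuples; this is an illustrative re-implementation, not the authors' code):

```python
import math

def iou(a, b):
    """Intersection-over-Union of two axis-aligned (cx, cy, w, h) boxes."""
    ax1, ay1, ax2, ay2 = a[0]-a[2]/2, a[1]-a[3]/2, a[0]+a[2]/2, a[1]+a[3]/2
    bx1, by1, bx2, by2 = b[0]-b[2]/2, b[1]-b[3]/2, b[0]+b[2]/2, b[1]+b[3]/2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2]*a[3] + b[2]*b[3] - inter
    return inter / union if union > 0 else 0.0

def ciou_loss(pred, target):
    """1 - IoU, plus a normalized centre-distance term and an
    aspect-ratio consistency term (the 'Complete IoU' penalty)."""
    i = iou(pred, target)
    rho2 = (pred[0]-target[0])**2 + (pred[1]-target[1])**2
    # diagonal length squared of the smallest box enclosing both boxes
    cw = max(pred[0]+pred[2]/2, target[0]+target[2]/2) - min(pred[0]-pred[2]/2, target[0]-target[2]/2)
    ch = max(pred[1]+pred[3]/2, target[1]+target[3]/2) - min(pred[1]-pred[3]/2, target[1]-target[3]/2)
    c2 = cw*cw + ch*ch
    v = (4 / math.pi**2) * (math.atan(target[2]/target[3]) - math.atan(pred[2]/pred[3]))**2
    alpha = v / (1.0 - i + v + 1e-9)
    return 1.0 - i + rho2 / c2 + alpha * v
```

For two identical boxes all three penalty terms vanish and the loss is zero, which is why CIoU trains bounding-box regression faster than plain IoU.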
Fig. 3. (a) Depth module of the RealSense Viewer giving a depth view of the surroundings (b) Accelerometer data obtained during movement of the mobile robot with the RealSense mounted on top (c) Gyro data presenting the angular velocity of the mobile robot.
4 Experiment and Result Analysis
– Input from depth sensor. Data from the RealSense are obtained in coordinate and depth format (Fig. 3). Table 1 lists the data fields obtained from the D455 depth camera with their respective features.
Fig. 4. Real-time on-route obstacle identification by YOLO v4 during movement of the customized robot with the RealSense mounted on top.

Table 1. Data obtained by operating the Intel RealSense depth sensor in the indoor environment, with remarks on the respective features.

Feature            | Remarks
-------------------|--------------------------------------------------------------
RGB image          | Original real-time view of the environment is obtained on the RGB module
Depth data         | Depth frames are streamed by the RealSense, with the depth stored per pixel in z16 format; depth data are decoded using Python commands, converting pixels into metric units
Accelerometer data | Real-time vibration/acceleration values are obtained from the RealSense Viewer as the mobile robot moves with the sensor mounted on it
Gyro data          | Run-time measurement of rotational motion, obtained as the angular velocity of the sensor in revolutions per second
3D point cloud     | Live 3D point cloud generation of the environment is obtained from the RealSense Viewer
– Object identification. Obstacle detection is an integral part of safe path planning and traversal by autonomous robots. However, long-distance tracking in motion-blurred environments and highly variable illumination can lead to unpredictable shifts and cumulative errors. YOLO v4 (Algorithm 1) is applied to on-route obstacles for precise detection (Fig. 4).
– 3D reconstruction using point cloud mapping. The reconstruction has been analysed in the following steps.
i) Point cloud map: Point clouds are used extensively in modern 3D vision tasks, including robot navigation [8], autonomous driving [9,10], object pose estimation [11,12], etc. Partial point clouds generally cause structural loss of the object shapes (Fig. 5), which often makes scene perception and decision making difficult, as presented in Fig. 8(b). Early descriptors such as the Point Feature Histograms (PFH) series [13] take the local geometry of point clouds into consideration through mutual distances and normal vectors, but their applicability is comparatively limited. One family of approaches embeds point sets into voxels [14] or depth images [15] followed by a CNN. A second, learning-based family extracts features through neural networks; this has proved more efficient for classification, registration [16], segmentation [17–19] and detection [9,18].
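The voxel embedding of point sets mentioned above can be sketched as a simple voxel-grid downsampler (an illustrative example; the 5 cm voxel size is an arbitrary assumption, not a value from the paper):

```python
import math
from collections import defaultdict

def voxel_downsample(points, voxel=0.05):
    """Embed 3D points into a regular voxel grid and keep one centroid per
    occupied voxel, shrinking a dense cloud into a sparser, regular one."""
    bins = defaultdict(list)
    for p in points:
        # integer voxel index along each axis
        key = tuple(math.floor(c / voxel) for c in p)
        bins[key].append(p)
    # replace every occupied voxel by the centroid of its points
    return [tuple(sum(c) / len(ps) for c in zip(*ps)) for ps in bins.values()]
```

Two points 1 cm apart collapse into one voxel, while a point 1 m away survives as its own sample, which is the behaviour Rviz's voxel-size control exposes interactively.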
Fig. 5. Upsampling of a sparse point cloud to a dense point cloud, with the original view taken as reference.
Fig. 6. (a) Calibration of intrinsic camera parameters showing the camera coordinate system, image coordinate system and camera axis for obtaining the correct angle with the target (b) Extrinsic camera calibration showing the coordinates, with the rotation matrix followed by the translation matrix for obtaining the final point.
ii) Camera calibration: A set of tools and suitable environment conditions are used to recalibrate the depth camera back to its pristine factory condition; the librealsense library is used for the necessary camera calibration (Fig. 6). The depth camera is pointed at a flat target, and the measured depth is evaluated over a small section of the Field of View (FOV), a Region of Interest (ROI) of 10%–20% near the center. The depth RMS noise is calculated from the plane fitted to the ROI depth map. The subpixel RMS value can be calculated from Eq. 1:

\[ sp = \frac{fl \cdot Bl \cdot D_{RMSerr}}{D^2} \tag{1} \]
where sp = subpixel RMS, fl = focal length in pixels, Bl = baseline in mm, \(D_{RMSerr}\) = depth RMS error in mm, and D = distance in mm. The focal length is found from Eq. 2:

\[ fl(\text{pixels}) = \frac{X_{res}(\text{pixels})/2}{\tan\left(\frac{HFOV}{2}\right)} \tag{2} \]
Steps of the offline and online calibration approaches are shown as schematic diagrams in Fig. 7.
(a) Offline calibration. Extrinsic parameter matrices between the initial frame \(F_0\) and the \(n\)-th frame \(F_n\) are obtained via camera calibration, as evident from Eq. 3:

\[ \begin{bmatrix} Q_n \\ 1 \end{bmatrix} = \begin{bmatrix} R_r^n & t_r^n \\ 0_{1\times 3} & 1 \end{bmatrix} \begin{bmatrix} Q_r \\ 1 \end{bmatrix} = T_r^n \begin{bmatrix} Q_r \\ 1 \end{bmatrix} \tag{3} \]

A coordinate system is formed by taking different views of the environment, where \(Q_r\) is the coordinate of a key point in the environment [20] and \(T_r^n\) represents the transformation matrix mapping \(Q_r\) to \(Q_n\). Considering \(F_0\) as the world coordinate frame \(w\), \(Q_n\) can be transformed to \(F_0\) as presented in Eq. 4 and Eq. 5:

\[ Q_0 = R_r^0 (R_r^n)^T Q_n - R_r^0 (R_r^n)^T t_r^n + t_r^0 \tag{4} \]

\[ \implies Q_0 = R_n^0 Q_n + t_n^0 \tag{5} \]
where \(R_n^0 = R_r^0 (R_r^n)^T\) is the rotation matrix transforming \(Q_n\) to the \(F_0\) coordinate system, and \(t_n^0 = -R_r^0 (R_r^n)^T t_r^n + t_r^0\) is the translation vector. For each view, the transformation matrix is obtained as Eq. 6.
Fig. 7. (a) Schematic diagram showing steps and processes involved in offline camera calibration (b) Steps involved in online camera calibration.
\[ T_n^0 = \begin{bmatrix} R_n^0 & t_n^0 \\ 0_{1\times 3} & 1 \end{bmatrix} \tag{6} \]
The point clouds formed on Rviz can be denoted \(Q_{n-1}^0\) and \(Q_n^0\), where \(Q_{n-1}^0\) is the initial point and \(Q_n^0\) is the goal point. The refinement transformation can be obtained as Eq. 7:

\[ Q_{n-1}^0 = R_0^{(n)} Q_n^0 + t_0^{(n)} \tag{7} \]
\(R_0^{(n)}\) and \(t_0^{(n)}\) are the \(n\)-th refinement rotation matrix and the \(n\)-th refinement translation vector, respectively. The final equation is obtained as Eq. 8:

\[ Q_{n-1}^0 = R_0^{(n)} R_n^0 Q_n + R_0^{(n)} t_n^0 + t_0^{(n)} \tag{8} \]

The aligned transformation, accounting for these errors, can be obtained as Eq. 9:

\[ \hat{T}_n^0 = \begin{bmatrix} R_0^{(n)} R_n^0 & R_0^{(n)} t_n^0 + t_0^{(n)} \\ 0_{1\times 3} & 1 \end{bmatrix} \tag{9} \]
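The transform algebra of Eqs. 3–9 amounts to composing and applying 4×4 homogeneous matrices. A minimal stdlib sketch (the example rotation and translation are illustrative, not calibration output):

```python
def matmul(a, b):
    """Multiply two matrices represented as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def make_transform(R, t):
    """Assemble the 4x4 homogeneous matrix [R t; 0 1] of Eq. 6."""
    return [R[i] + [t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def apply_transform(T, q):
    """Map a 3D point q into the frame described by T."""
    col = matmul(T, [[q[0]], [q[1]], [q[2]], [1.0]])
    return [col[0][0], col[1][0], col[2][0]]

# Illustrative: rotate 90 degrees about z, then translate 1 m along x.
R_z90 = [[0.0, -1.0, 0.0],
         [1.0,  0.0, 0.0],
         [0.0,  0.0, 1.0]]
T = make_transform(R_z90, [1.0, 0.0, 0.0])
```

Chaining views, as in Eq. 8, is then simply `matmul` of the corresponding 4×4 matrices before applying the result to a point.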
(b) Online calibration. The transformation matrix obtained from offline calibration is stored for online operation [20]. The point cloud \(Q_1\) of \(F_1\) is transformed to \(F_0\) using Eq. 10:

\[ Q_1^0 = T_1^0 Q_1 \tag{10} \]
Capturing all views, the initial reconstructed 3D scene model is obtained as Eq. 11:

\[ Q_w = \{Q_0, Q_1^0, Q_2^0, \ldots, Q_N^0\} \tag{11} \]

where N is the number of views captured by the camera.
iii) 3D recreation of environment: Under the ROS environment, the reconstruction of various indoor environments is presented in Fig. 8, formed by accumulating sets of data points on Rviz. Each point of a specific area consists of Cartesian coordinates (X, Y, Z). The realsense2_camera library of ROS is used, with its dependency librealsense2.
Fig. 8. (a) 3D reconstruction of scene 1 using dense point cloud mapping (b) Sparse 3D reconstruction of scene 2 (c) 3D reconstruction of an indoor environment accompanied by obstacle detection on the Rviz visualizer.
The camera node of the realsense2_camera library is launched on Rviz, and a point cloud filter is applied.
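The per-pixel back-projection behind this point cloud generation can be sketched with the pinhole model (fx, fy, cx, cy below are illustrative intrinsics; in practice RealSense exposes the true values through its API):

```python
def deproject(depth_m, fx, fy, cx, cy):
    """Back-project a depth image (in metres) into a 3D point cloud with the
    pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy, Z = depth."""
    points = []
    for v, row in enumerate(depth_m):
        for u, z in enumerate(row):
            if z:                      # skip invalid (zero) depth pixels
                points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points
```

Each depth frame yields one such set of (X, Y, Z) points; accumulating these sets over the calibrated transforms of Eq. 11 is what Rviz renders as the reconstructed scene.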
5 Conclusion and Future Scope
This work is envisioned to achieve detection of on-route static obstacles for indoor point-to-point robot navigation along an optimal path in a finite amount of time. To achieve this objective, YOLO v4 has been extensively explored for robust and rapid identification of obstacles. Integrating a machine learning module with 3D reconstruction of the environment on the ROS platform presents path exploration and localization in a much more realisable form. This work explores the indoor environment with multiple solutions that may serve as a ready reference for smooth navigation. 3D reconstruction enables image processing and exploration leading to kinematic control, and allows the map memory of one robot exploring an environment to be shared with other cooperative AGVs.
References
1. Cui, Y., et al.: Deep learning for image and point cloud fusion in autonomous driving: a review. IEEE Trans. Intell. Transp. Syst. 23(2), 722–739 (2022)
2. Valladares, S., Toscano, M., Tufiño, R., Morillo, P., Vallejo-Huanga, D.: Performance evaluation of the Nvidia Jetson Nano through a real-time machine learning application. In: Russo, D., Ahram, T., Karwowski, W., Di Bucchianico, G., Taiar, R. (eds.) IHSI 2021. AISC, vol. 1322, pp. 343–349. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-68017-6_51
3. Singh, M., Bhoi, S.K., Panda, S.K.: Geometric least square curve fitting method for localization of wireless sensor network. Ad Hoc Netw. 116, 102456 (2021)
4. Jin, S., Meng, Q., Dai, X., Hou, H.: Safe-NAV: learning to prevent PointGoal navigation failure in unknown environments. Complex Intell. Syst. 1–18 (2022)
5. Filliat, D., et al.: RGBD object recognition and visual texture classification for indoor semantic mapping. In: 2012 IEEE International Conference on Technologies for Practical Robot Applications (TePRA), pp. 127–132. IEEE (2012)
6. Lynen, S., et al.: Large-scale, real-time visual-inertial localization revisited. Int. J. Robot. Res. 39(9), 1061–1084 (2020)
7. Felzenszwalb, P.F., Huttenlocher, D.P.: Efficient graph-based image segmentation. Int. J. Comput. Vis. 59(2), 167–181 (2004)
8. Jenkins, M., Kantor, G.: Online detection of occluded plant stalks for manipulation. In: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 5162–5167. IEEE (2017)
9. Shi, S., Wang, X., Li, H.: PointRCNN: 3D object proposal generation and detection from point cloud. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 770–779 (2019)
10. Zhou, Y., Tuzel, O.: VoxelNet: end-to-end learning for point cloud based 3D object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4490–4499 (2018)
11.
Sock, J., Hamidreza Kasaei, S., Lopes, L.S., Kim, T.-K.: Multi-view 6D object pose estimation and camera motion planning using RGBD images. In: Proceedings of the IEEE International Conference on Computer Vision Workshops, pp. 2228–2235 (2017)
12. Wang, C., et al.: DenseFusion: 6D object pose estimation by iterative dense fusion. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3343–3352 (2019)
13. Rusu, R.B., Blodow, N., Beetz, M.: Fast point feature histograms (FPFH) for 3D registration. In: 2009 IEEE International Conference on Robotics and Automation, pp. 3212–3217. IEEE (2009)
14. Maturana, D., Scherer, S.: VoxNet: a 3D convolutional neural network for real-time object recognition. In: 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 922–928. IEEE (2015)
15. Su, H., Maji, S., Kalogerakis, E., Learned-Miller, E.: Multi-view convolutional neural networks for 3D shape recognition. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 945–953 (2015)
16. Gecer, B., Ploumpis, S., Kotsia, I., Zafeiriou, S.: GANFIT: generative adversarial network fitting for high fidelity 3D face reconstruction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1155–1164 (2019)
17. Jiang, M., Wu, Y., Zhao, T., Zhao, Z., Lu, C.: PointSIFT: a SIFT-like network module for 3D point cloud semantic segmentation. arXiv preprint arXiv:1807.00652 (2018)
18. Qi, C.R., Litany, O., He, K., Guibas, L.J.: Deep Hough voting for 3D object detection in point clouds. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9277–9286 (2019)
19. Yu, W., Zhang, Z., Zhong, R., Sun, L., Leng, S., Wang, Q.: Densely connected graph convolutional network for joint semantic and instance segmentation of indoor point clouds. ISPRS J. Photogramm. Remote. Sens. 182, 67–77 (2021)
20. Tsai, C.-Y., Huang, C.-H.: Indoor scene point cloud registration algorithm based on RGB-D camera calibration. Sensors 17(8), 1874 (2017)
Perceptive Analysis of Chronic Kidney Disease Data Through Conceptual Visualization P. Antony Seba(B) and J. V. Bibal Benifa Department of Computer Science and Engineering, Indian Institute of Information Technology, Kottayam, Kerala 686635, India {sebaantony.phd201002,benifa}@iiitkottayam.ac.in
Abstract. The primary objective is to investigate data pertaining to Chronic Kidney Disease (CKD) for severity prediction through detailed statistical analysis and conceptual visualization. This aim is met by investigating the types of missingness and outliers before further analytics. A well-defined CKD dataset is used for perceptive analysis and visualisation. The feature "estimated Glomerular Filtration Rate" is extracted to predict the severity of the disease. The data leakage problem is minimized using a stratified split. Normality of attributes is tested using the Shapiro-Wilk test and achieved by power transformation to enable building accurate predictive models. An optimal number of relevant features is selected using supervised feature selection algorithms. The results of the perceptive analysis, viz., observed data during imputation and outlier detection (skewness, threshold values, p-values) and other statistical reports, are presented. Classifier models have been built after the perceptive analysis, and the CKD stage prediction results are also reported. Keywords: CKD dataset · Feature extraction · Imputation · Outliers · Data visualisation · Feature selection · Classifiers
1 Introduction
The sudden growth over the last decade in the collection of medical data, in the form of clinical test reports and electronic health records, has triggered inconsistency and contradiction during intensive data analysis. Data in raw form may not be readily usable for analysis and model building. Data represent certain characteristics which, when set in context, yield information that helps draw insights and provide solutions to real-world problems through analytics. Validating the quality of the data through statistical analysis [1, 2] helps to choose appropriate models for predictive analytics, and data visualisation techniques induce good observations about the data. Statistical facts about the data are very important for drawing conclusions in analytics; poor-quality data leads to wrong insights as well as less accurate predictions. Exploratory data analysis [3, 4] is a philosophical data analysis approach providing numerous graphical techniques to optimize the perception of the dataset of any application domain, to detect far points and anomalies, and to extract important variables for effective model building.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
A. K. Das et al. (Eds.): CIPR 2022, LNNS 480, pp. 110–122, 2022. https://doi.org/10.1007/978-981-19-3089-8_11
In classical data analysis, once data pre-processing is done, the model is built and analysis is then carried out for making an appropriate decision or prediction, whereas in exploratory data analysis the pre-processed data are analysed completely before the model is built. A detailed perceptive analysis of CKD data is presented in this work. The data have been tuned carefully, making them ready for appropriate model building and further analysis. Machine learning, which learns using statistical techniques, expects pure and precise data during the learning process so as to produce accurate predictions as well as better solutions at test time. The research scope in investigating the CKD dataset obtained from the UCI repository is wide, due to the existence of non-normality of variables, missing values, redundant features, irrelevant features and class imbalance [5–10]. Descriptive statistics describe the properties of the CKD dataset through measures of central tendency, whereas inferential statistics draws conclusions from a sample and generalizes them to the population through hypothesis tests. If variables are approximately normally distributed and feature scales are similar, the models yield better accuracy.
2 Related Work
Several research works in data science aim to understand the structure of data, handle heterogeneous types of data from different sources, pre-process the data by proper structuring, and prepare the data for predictive analytics through a statistical approach, studying the data via visualization using plots and graphs. Dvorak et al. [11] proposed a visualization tool, the "clover plot" in R, to assess the efficiency of nonparametric classification techniques by combining several classifiers; the plot also detects outliers and skewness. The relationships among variables are important in feature selection for building efficient machine learning models. Yoo et al. [12] introduced linear and logistic regression for understanding the relationships among variables, and statistical Bayesian networks, in clinical data analysis. Cui et al. [13] proposed the Complementary Dimension Analysis algorithm, with an objective function that quantifies classification accuracy and an algorithm that retrieves directions sequentially from the nonlinear objective function. Janus et al. [14] recommended analytical approaches that may prevent bias caused by unavoidable missing data, considering the strengths and limitations of best-worst and worst-best sensitivity analyses. Ghasemi et al. [15] presented an overview of normality checking: normality is visualized through the stem-and-leaf plot, boxplot, P-P plot (probability-probability plot) and Q-Q plot (quantile-quantile plot), and can be tested through null and alternate hypotheses using the Kolmogorov-Smirnov, Shapiro-Wilk, Anderson-Darling, Cramer-von Mises, D'Agostino-Pearson Omnibus and Jarque-Bera tests. The authors recommend the Shapiro-Wilk test for testing normality since it is based on correlation within the data. Yongbo et al. [16] explored feature transformation in cardiovascular data to obtain better classification performance.
After transformation [17], relevant features are selected to build models using different algorithms. In this work, the CKD dataset is analysed and visualised to obtain insightful observations, the decisive feature is extracted to predict the severity of the disease, and the data are tuned with each attribute normalised, making them ready for efficient model building.
3 Proposed Methodology
In medical data analytics, the nature of the data has to be studied carefully and smoothed sensibly for better accuracy, and best practices such as addressing the kind of missingness and data leakage must be followed. The attributes of each instance and their importance in predicting the disease are investigated to build efficient machine learning models. The purpose of the perceptive analysis of the CKD dataset is to gather as many insightful observations as possible, which are useful for extracting new features, analysing trends and making effective predictions. The CKD dataset is studied thoroughly through data visualization before and after feature extraction. Further, outliers are listed and the types of missingness are discovered for proper imputation. Due to medical relevancy, instances with extreme values are also considered in model building. The raw CKD dataset is distorted by the presence of measurement errors and missing values; about 60% of the instances have missing values. The relationship of the other variables to the target variable has been observed, and the relevant features are selected in order of merit based on their statistical significance. The Shapiro-Wilk normality test on the raw CKD dataset fails, i.e., the null hypothesis is rejected, so data transformation is required to fit a Gaussian distribution. Graphical and visualization techniques have been adopted to analyse the statistical importance of the data, including what the data mean and what they do not, and steps are taken to bring the data sample closer to normal. The Yeo-Johnson transformation is adopted to bring normality to each variable, the data are prepared for training through a stratified split, and supervised feature selection algorithms are utilized to extract the most relevant features for model building.
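The Yeo-Johnson transform applied to each variable is defined piecewise; a single-value sketch follows (in practice the parameter λ is fitted per attribute by maximum likelihood, which is omitted here):

```python
import math

def yeo_johnson(y, lam):
    """Yeo-Johnson power transform of one value for parameter lambda.
    Unlike Box-Cox, it is defined for negative inputs as well."""
    if y >= 0:
        if abs(lam) < 1e-12:               # lambda = 0 branch
            return math.log1p(y)
        return ((y + 1.0) ** lam - 1.0) / lam
    if abs(lam - 2.0) < 1e-12:             # lambda = 2 branch for negatives
        return -math.log1p(-y)
    return -(((-y + 1.0) ** (2.0 - lam) - 1.0) / (2.0 - lam))
```

With λ = 1 the transform is the identity; values of λ below 1 compress the right tail, which is what pulls the skewed CKD attributes toward a Gaussian shape.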
Multinomial Logistic Regression (MLR), Naïve Bayes (NB), Random Forest (RF), AdaBoost (AB) and Decision Tree (DT) classifiers have been built after the detailed perceptive analysis of the CKD dataset, and the accuracy of severity prediction by each model is compared.
3.1 Data Collection
The benchmark CKD dataset obtained from the University of California Irvine (UCI) machine learning repository [18] has 400 instances with 11 numeric and 14 categorical variables to predict the presence of the disease, either "ckd" or "notckd". The independent variables are quantitative and qualitative, and the dependent target variable is binomial. In statistical data analysis, continuous data are represented by a probability density function, whereas discrete data are represented by a probability mass function. The independent variables are the interpreters used to predict the values of the dependent variable.
3.2 Feature Extraction
Every medical dataset has to be enhanced with appropriate new derived features that are more decisive for the final prediction. The raw CKD dataset offers only the presence of disease {"notckd", "ckd"}. To monitor the severity progression of chronic kidney disease and to predict the stage accurately, additional relevant information is required, either from the clinical test reports or from the patient's health record history, and based
on demographic conditions. Extracting such additional information enhances the dataset to identify hidden patterns or trends. The KDIGO guidelines [19] state that the prominent feature for deciding the severity of CKD is the "estimated Glomerular Filtration Rate (egfr)", estimated using the Modification of Diet in Renal Disease (MDRD) formula in Eq. (1). This new feature egfr is related to the observed data, specifically the attributes age, sc (serum creatinine), race and gender. The demographic variables gender and race are populated, and egfr is estimated using Eq. (1):

\[ eGFR = 175 \cdot (sc)^{-1.154} \cdot (age)^{-0.203} \cdot gender\_condition \cdot race\_condition \tag{1} \]

where gender_condition = 0.742 if female and 1 if male, and race_condition = 1.212 if black and 1 otherwise.

The target attribute "class" of each instance in the enhanced CKD dataset is relabelled with ordinal values from the set {stage1, stage2, stage3, stage4, stage5}, replacing {notckd, ckd}, as per the KDIGO guidelines, to support decisions on the severity level. The CKD dataset is enhanced with the newly populated features {gender, race} and the extracted feature {egfr}, including the multiclass {class} attribute. In the enhanced CKD dataset, the class distribution is: stage1 25.25%, stage2 19.5%, stage3 22.75%, stage4 14.75% and stage5 17.75%.
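Equation 1 and the KDIGO stage labelling can be sketched as follows (a minimal example; the eGFR stage cut-offs of 90/60/30/15 follow the KDIGO guidelines cited above):

```python
def egfr_mdrd(sc, age, female=False, black=False):
    """Eq. 1: MDRD estimate of the glomerular filtration rate from serum
    creatinine (sc, mg/dL), age (years), gender and race conditions."""
    gfr = 175.0 * sc ** -1.154 * age ** -0.203
    if female:
        gfr *= 0.742
    if black:
        gfr *= 1.212
    return gfr

def ckd_stage(egfr):
    """Label severity by eGFR range, per the KDIGO stage cut-offs."""
    if egfr >= 90: return "stage1"
    if egfr >= 60: return "stage2"
    if egfr >= 30: return "stage3"
    if egfr >= 15: return "stage4"
    return "stage5"
```

For example, a 50-year-old non-black male with sc = 1.0 mg/dL gets an eGFR of roughly 79, landing in stage2.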
3.3 Data Pre-processing
Data pre-processing is the preliminary step of data analysis, in which data cleaning is performed by handling missing data while considering the kind of missingness, the distribution and the types of the variables. If the enhanced CKD dataset were split into training and test sets randomly, the imbalanced distribution of classes (i.e., CKD stages) would hurt model performance on the minority classes. Therefore, the dataset is split in a stratified way (in a 70:30 ratio) to maintain an equal distribution of classes. The training dataset (70%) and the test dataset (30%) are pre-processed separately to avoid data leakage: if they were not, information from outside the training set would be used to build the model, reducing accuracy in the production environment. Addressing data leakage is a key component of the perceptive analysis of the CKD dataset for obtaining a robust predictive model [20].
3.3.1 Outlier Detection
The presence of outliers negatively influences any machine learning model, and they are identified through univariate analysis [22]. Global outliers are present in the dataset. Since the attributes are skewed, the Inter-Quartile Range (IQR) is preferred for identifying outliers; 34.5% of the CKD data instances contain outliers. The skewness shown in Table 1 may be due to missing values and the presence of outliers, and hence they are addressed statistically. The IQR is a measure of variability: the data are divided by three quartiles Q1, Q2 and Q3, and the IQR, the difference between the third quartile and the first quartile, is considered the region where the density of the data is
more. The lower fence and upper fence values are shown in Table 1. Data points lying below the lower fence or above the upper fence are considered outliers in many use cases and are removed. All variables in the dataset are skewed, and healthcare data need special handling of outliers [22].

Table 1. Skewness, lower fence and upper fence values

Sl. No | Variable | Skewness  | Lower Fence | Upper Fence
-------|----------|-----------|-------------|------------
1      | age      | −0.654868 | 8.25        | 98.25
2      | bp       | 2.054112  | 55          | 95
3      | bgr      | 1.813749  | 3           | 259
4      | bu       | 2.971817  | −31.5       | 124.5
5      | sc       | 6.135109  | −1.95       | 5.65
6      | sod      | −0.613384 | 124.5       | 152.5
7      | pot      | 9.896213  | 2.15        | 6.55
8      | hemo     | −0.391718 | 3.25        | 22.05
9      | pcv      | −0.536475 | 12.5        | 64.5
10     | wc       | 1.786081  | 1550        | 14750
11     | rc       | −0.186372 | 1.65        | 7.65
12     | gfr      | 1.410340  | −86         | 194
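The IQR fences of Table 1 can be reproduced with a short sketch (a hedged example: `statistics.quantiles` uses the 'exclusive' method by default, so results may differ slightly from the authors' implementation):

```python
import statistics

def iqr_fences(values):
    """Tukey fences: [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)   # three quartile cut points
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def iqr_outliers(values):
    """Points lying below the lower fence or above the upper fence."""
    lo, hi = iqr_fences(values)
    return [v for v in values if v < lo or v > hi]
```

A far point such as 500 in an otherwise tight sample is flagged, while the rest of the sample stays inside the fences.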
In medical data analysis, attribute values beyond the normal range usually help in diagnosing or predicting the presence of disease. From Table 1, it is evident that the lower and upper fence values of the 12 numerical attributes are within the normal range and hence would contribute little to severity prediction of chronic kidney disease. Consequently, all data instances are retained in model building even though outliers are identified in the CKD dataset.
3.3.2 Imputation
Missing values are due to human errors, and there are three kinds of missing data:
i. Missing At Random (MAR): the missingness is related to the observed data, so other variables in the dataset can be used to predict a missing value.
ii. Missing Completely At Random (MCAR): the missingness is independent of the observed data, and there is no relationship between the missing value and the other variables.
iii. Missing Not At Random (MNAR): the missingness is related to the unobserved data.
In all cases, missingness is visualised through a heatmap and a dendrogram; MAR is also visualised through scatter plots showing the relationship between two variables. Different techniques are followed to handle the missing values [9] based on the kind of missingness, as shown in Table 2. The variables with each type of missingness in the enhanced CKD dataset are listed in Table 3.
Perceptive Analysis of Chronic Kidney Disease Data
115
Table 2. Missingness and imputation methods

Type of missing values  Techniques
MCAR                    List-wise deletion, pair-wise deletion, mean-median-mode
MAR                     Mean-median-mode
MNAR                    Machine learning models, multiple imputations
Table 3. Variables with missingness

Type of missing values  Variables in CKD dataset
MNAR                    {pcc, ba}, {sod, pot}, {htn, dm, cad}, {appet, pe, ane}
MCAR                    {rbc, pc, bgr, age, bp}
MAR                     {wc, rc}, {hemo, pcv}, {bu, sc}, {al, su, sg}
The missing values of type MNAR are imputed using KNN with K = 5; MCAR and MAR are imputed using mean-median-mode imputation. If the correlation coefficient of the missingness of two variables is 1, the variables exhibit the same missing pattern and hence are grouped as MNAR. If the correlation coefficient is 0, there is no relationship among the variables and the kind of missingness is MCAR. If the correlation coefficient is less than 1 but greater than 0, there exists a relationship among the variables and the kind of missingness is MAR; such variables are grouped based on the same coefficient value.

3.4 Data Visualisation

Data analysis is the preliminary investigation carried out on the CKD data to identify the factors that cause the progression of CKD through patterns, anomalies, statistical information and graphical representations of the dataset. Summary statistics and univariate and bivariate analysis are done to know the importance of each independent variable in causing the severity of the disease through insights and observations. Before feature extraction, the CKD data analysis reveals that the likelihood of getting CKD increases with age, as shown in Fig. 1(a); Fig. 1(b) reveals the same after feature extraction. Bivariate analysis between age and haemoglobin (hemo) and between diabetes mellitus (dm) and hemo makes it evident that the factors hemo, coronary artery disease (cad) and dm are root causes of CKD irrespective of age [21], as observed from Figs. 2(a), (b) and 3(a), (b) respectively. The variables {sod, htn, bp, cad} are related variables in healthcare. The other related variables are {sg, dm, al, su, bgr}, {pc, pcv, ba} and {rbc, hemo, pcv, rc, anemia}. The
116
P. Antony Seba and J. V. Bibal Benifa
Fig. 1. (a) age vs class. (b) age vs stages
Fig. 2. (a) age vs hemo. (b) age vs hemo
Fig. 3. (a) gfr vs cad. (b) dm vs hemo
features gfr and sc are inversely proportional, as shown in Fig. 4(a). The linear relationship between pcv and hemo is shown in Fig. 4(b). Healthy cardiovascular functioning is important for the kidneys to function, and diabetes and high blood pressure are common risk factors for both heart and kidney diseases, as shown in Figs. 5(a) and 5(b). The data visualisation reveals the nature of each variable in the CKD dataset, which is helpful in preparing the data for model building.

3.5 Hypothesis Testing

A normality test is carried out to determine whether the data has been drawn from a normally distributed population. If the data are normal, then the conclusions drawn
Fig. 4. (a) sc vs gfr. (b) pcv vs hemo
Fig. 5. (a) dm vs age. (b) bp vs age
from a sample can be generalised to the population. The variables in the CKD dataset are skewed and the variance is high. If the variables are normally distributed, then 3 standard deviations away from the mean covers 99.7% of the values and they are dense; if the values of the variables are dense, the machine learning models perform well. The normality of a variable can be visualised through histograms, boxplots, QQ plots and the normal probability plot. Outliers may cause non-normality; if there are no outliers and the variables are still non-normal, it is preferred to apply a transformation to make the variables normal. Inferential statistics is applied to the CKD dataset to test the distribution of the numerical variables: the null hypothesis (H0) states that the data distribution is normal, and the alternate hypothesis (HA) states that the data is non-normal. The null hypothesis is accepted if the estimated p-score value is > 0.05; otherwise it is rejected and data transformation is preferred to bring the variables towards normality. The Shapiro-Wilk test is used to test the null hypothesis, with the test statistic W defined in Eq. (2):

\[ W = \frac{\left(\sum_{i=1}^{n} a_i x_{(i)}\right)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \tag{2} \]
where x_{(i)} is the ith order statistic and \bar{x} is the sample mean. The coefficients a_i are given by (a_1, \ldots, a_n) = m^T V^{-1} / C, where C = (m^T V^{-1} V^{-1} m)^{1/2} and m = (m_1, \ldots, m_n)^T. H0: the variable is normal; HA: the variable is not normal. Shapiro-Wilk hypothesis testing is carried out to test the normality of the variables; the estimated p-score values are less than 0.05 for all the variables, and hence the
null hypothesis is rejected, which is also evident from the QQ plots shown in Fig. 6. Table 4 shows the W value of each numeric variable of the CKD dataset.
Fig. 6. (a) QQ plot for age. (b) QQ plot for hemo. (c) QQ plot for sc
Table 4. Shapiro-Wilk test values (W) for all numeric variables

Sl. No  Variables  W
1       age        0.955
2       bp         0.855
3       bgr        0.743
4       bu         0.710
5       sc         0.622
6       sod        0.933
7       pot        0.968
8       hemo       0.981
9       pcv        0.962
10      wc         0.847
11      rc         0.933
12      gfr        0.884
W is the Shapiro-Wilk test statistic, which ranges between 0 and 1; the smallest W value indicates the worst fit of a variable to the normal distribution. Due to the rejection of the null hypothesis, the variables are transformed towards normality.

3.6 Transformation

Quality data is required for model building, and this quality is ensured through data preparation. Transformations are chosen based on the current distribution of the variables. Right-skewed data uses the square root, logarithmic or reciprocal transformation; for left-skewed data, the data has to be reflected and the appropriate transformation for right-skewed data is then applied. Power transformation is used to stabilise the variance. These transformations are used to make the data fit a normal distribution and thereby achieve better performance in predictive analytics. In this research work, the Yeo-Johnson transformation [17] is used
as shown in Eq. (3); it permits zero and negative values, reducing the skewness and thus improving the normality. The accuracy and efficiency of classification for high dimensional data have been improved by linear transformation.

\[ \psi(\lambda, y) = \begin{cases} \left((y + 1)^{\lambda} - 1\right)/\lambda & \text{if } \lambda \neq 0,\ y \geq 0 \\ \log(y + 1) & \text{if } \lambda = 0,\ y \geq 0 \\ -\left((-y + 1)^{2-\lambda} - 1\right)/(2 - \lambda) & \text{if } \lambda \neq 2,\ y < 0 \\ -\log(-y + 1) & \text{if } \lambda = 2,\ y < 0 \end{cases} \tag{3} \]

Due to the rejection of the null hypothesis, the variables are transformed towards normality using the Yeo-Johnson transformation. The Shapiro-Wilk test is performed again on the transformed data to confirm normality, and the transformed data is also visualised through QQ plots as shown in Fig. 7.
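Eq. (3) translates directly into code. A minimal sketch for a single value (in practice λ would be fitted per variable by maximum likelihood, e.g. as scikit-learn's PowerTransformer does; that fitting step is omitted here):

```python
import math

def yeo_johnson(y, lam):
    """Yeo-Johnson transform of Eq. (3); handles zero and negative y."""
    if y >= 0:
        if abs(lam) > 1e-12:                       # lambda != 0 branch
            return ((y + 1.0) ** lam - 1.0) / lam
        return math.log1p(y)                       # lambda == 0 branch
    if abs(lam - 2.0) > 1e-12:                     # lambda != 2 branch
        return -(((1.0 - y) ** (2.0 - lam) - 1.0) / (2.0 - lam))
    return -math.log1p(-y)                         # lambda == 2 branch
```

Note that λ = 1 leaves the data unchanged (identity on both sides of zero), which is a convenient sanity check.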
Fig. 7. (a) QQ plot for age. (b) QQ plot for hemo. (c) QQ plot for sc
3.7 Feature Selection

Feature selection is the next step; it reduces the number of independent variables and hence reduces the cost of model building and improves the performance of the model. Feature selection techniques are either supervised or unsupervised. The supervised methods are of three types, wrapper, filter and intrinsic, as shown in Table 5. Unsupervised feature selection techniques remove redundant variables without considering the target variable, whereas supervised techniques remove irrelevant variables by considering the target variable. Wrapper methods construct many models with different feature subsets and select the model with the highest cross-validation accuracy. Filter methods use statistical techniques to calculate a relationship score between each input variable and the target variable, and these scores are used to filter the input variables to be used in the model. Intrinsic methods use machine learning algorithms that have feature selection built into the model. The supervised feature selection methods Recursive Feature Elimination (RFE), Analysis of Variance (ANOVA) and Extra Tree Classifier (ETC) are used, and the top ten attributes ranked by each feature selection algorithm are shown in Table 6. In this work, the dataset is well studied through data visualisation before and after feature extraction, and it is observed that the risk factors causing CKD are diabetes and
Table 5. Feature selection algorithms

Feature selection method  Algorithms/Methods
Wrapper                   Forward Feature Selection, Backward Feature Elimination, Exhaustive Feature Selection, Recursive Feature Elimination
Filter                    Mutual information, Pointwise mutual information, Chi-squared, Pearson, Spearman, Kendall, ANOVA, Fisher score
Intrinsic                 Decision Trees, Multivariate adaptive regression spline (MARS) models, Regularization models
Table 6. Top 10 attributes

Feature selection algorithm  Top 10 attributes in the order of relevance
ETC                          gfr, sc, bu, pcv, age, htn, bp, sod, hemo and al
ANOVA                        gfr, sc, bu, hemo, htn, pcv, rc, dm, age and ane
RFE                          gfr, sc, age, bu, bp, gender, rc, race, al and rbc
haemoglobin, and the insights are poor diet and changes in lifestyle. Statistical analysis is done to prepare the data for better predictions.
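As an illustration of the filter methods in Table 5, ANOVA scores a feature by the ratio of between-class to within-class variability. The per-class values below are hypothetical, not taken from the CKD dataset:

```python
import statistics

def anova_f(groups):
    """One-way ANOVA F statistic: between-class vs within-class variability."""
    all_vals = [x for g in groups for x in g]
    grand = statistics.mean(all_vals)
    k, n = len(groups), len(all_vals)
    ss_between = sum(len(g) * (statistics.mean(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# hypothetical gfr values per severity class -- well separated, so F is large
gfr_by_class = [[90, 95, 100], [50, 55, 60], [10, 15, 20]]
# hypothetical age values -- heavily overlapping classes, so F is small
age_by_class = [[50, 60, 70], [52, 61, 69], [49, 59, 71]]
```

A feature with a large F (like the separated `gfr` sketch) would rank high, consistent with gfr leading all three rankings in Table 6.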
4 Results and Discussion

The state-of-the-art Multinomial Logistic Regression (MLR), Random Forest, Ada Boost, Naïve Bayes and Decision Tree classifiers are used to build models using the relevant feature sets ranked by ETC, ANOVA and RFE, and the performance of each model in terms of severity prediction accuracy is shown in Table 7.

Table 7. Performance of the models

                Accuracy (in percentage)
Classifiers     ETC     ANOVA   RFE
MLR             80.67   78.15   73.94
Random Forest   95.79   94.95   96.63
Ada Boost       89.07   61.34   61.34
Naïve Bayes     84.03   79.83   85.71
Decision Tree   97.47   95.79   95.79
5 Conclusion

Perceptive analysis is carried out on the CKD dataset to bring its attributes as close to normality as possible, and the data has thus been prepared for effective model building and better predictions. Missing values are handled based on the type of missingness; normality is tested through the Shapiro-Wilk test and the null hypothesis is rejected. About 60% of the instances are identified as having missing values and are properly handled. The extreme far points are identified but not removed, as they are highly relevant in predicting the presence of chronic kidney disease; since it is a medical dataset, the nephrologist strongly recommends not removing the data instances having far points, so that the models work better. The skewness present in the data is corrected using the Yeo-Johnson transformation to bring normality. Based on the relevant features selected using ETC, ANOVA and RFE, five classifier models have been built, and it is observed that the relevant feature set given by ETC yields better accuracy in predicting the severity of the disease.
References

1. Thomas, R., Vetter, F.: Descriptive statistics: reporting the answers to the 5 basic questions of who, what, why, when, where and a sixth, so what? Anesth Analg 125(5), 1797–1802 (2017). https://doi.org/10.1213/ANE.0000000000002471
2. Scot, H., Simpson, F.: Creating a data analysis plan: what to consider when choosing statistics for a study. The Canadian J. Hosp. Pharm. 68(4), 311–317 (2015). https://doi.org/10.4212/cjhp.v68i4.1471
3. Ho, C., Yu, F.: Exploratory data analysis in the context of data mining and resampling. Int. J. Psychol. Res. 3(1), 9–22 (2010). https://doi.org/10.21500/20112084.819
4. Hassan, N.J., Hawad Nasar, A., Mahdi Hadad, J.: Distributions of the ratio and product of two independent Weibull and Lindley random variables. J. Probab. Statis. 2020 (2020). https://doi.org/10.1155/2020/5693129
5. Jinan Fiaidhi, F.: Envisioning insight-driven learning based on thick data analytics with focus on healthcare. IEEE Access 8, 114998–115004 (2020). https://doi.org/10.1109/ACCESS.2020.2995763
6. Tsai, C.-W., Lai, C.-F., Chao, H.-C., Vasilakos, A.V.: Big data analytics: a survey. Journal of Big Data 2(1), 1–32 (2015). https://doi.org/10.1186/s40537-015-0030-3
7. Raghavan, S.R., Ladik, V., Meyer, K.B.: Developing decision support for dialysis treatment of chronic kidney failure. IEEE Trans. Info. Technol. Biomedi. 9(2), 229–238 (2005). https://doi.org/10.1109/titb.2005.847133
8. Tshering, S., Okazaki, T., Endo, S.: A method to identify missing data mechanism in incomplete dataset. Int. J. Comp. Sci. Netw. Sec. 13(3), 14–22 (2013)
9. Fielding, S., Fayers, P.M., McDonald, A., McPherson, G., Campbell, M.K.: Simple imputation methods were inadequate for missing not at random (MNAR) quality of life data. Health Qual Life Outcomes 6 (2008). https://doi.org/10.1186/1477-7525-6-57
10. Hira, Z.M., Gillies, D.F.: A review of feature selection and feature extraction methods applied on microarray data. Advances in Bioinformatics 2015 (2015). https://doi.org/10.1155/2015/198363
11. Dvorak, J., Hudecova, S., Nagy, S.: Clover plot: versatile visualization in nonparametric classification. Stat. Analy. Data Mining: The ASA Data Sci. 13(6), 525–572 (2020). https://doi.org/10.1002/sam.11481
12. Yoo, C., Ramirez, L., Liuzzi, J.: Big data analysis using modern statistical and machine learning methods in medicine. Int. Neurol. J. 18(2), 50–57 (2014). https://doi.org/10.5213/inj.2014.18.2.50
13. Cui, N., Hu, J., Liang, F.: Complementary dimension reduction. Stat. Analy. Data Mining: The ASA Data Sci. 14(1), 31–40 (2020). https://doi.org/10.1002/sam.11484
14. Jakobsen, J.C., Gluud, C., Wetterslev, J., Winkel, P.: When and how should multiple imputation be used for handling missing data in randomised clinical trials – a practical guide with flowcharts. BMC Med. Res. Methodol. 17 (2017). https://doi.org/10.1186/s12874-017-0442-1
15. Ghasemi, A., Zahediasl, S.: Normality tests for statistical analysis: a guide for non-statisticians. Int. J. Endocrinol Metab. 10(2), 486–489 (2012). https://doi.org/10.5812/ijem.3505
16. Liang, Y., Hussain, A., Abbott, D., Menon, C., Ward, R., Elgendi, M.: Impact of data transformation: an ECG heartbeat classification approach. Frontiers in Digital Health (2020). https://doi.org/10.3389/fdgth.2020.610956
17. Yeo, I.-K., Johnson, R.A.: A new family of power transformations to improve normality or symmetry. Biometrika 87(4), 954–959 (2000). https://doi.org/10.1093/biomet/87.4.954
18. https://archive.ics.uci.edu/ml/datasets/chronic_kidney_disease. Accessed 14 Jan 2022
19. https://kdigo.org/wp-content/uploads/2018/08/KDIGO-Txp-Candidate-GL-Public-ReviewDraft-Oct-22.pdf. Accessed 14 Jan 2022
20. Saravanan, N., Sathish, G., Balajee, J.M.: Data wrangling and data leakage in machine learning for healthcare. J. Emerg. Technol. Innov. Res. 5(8), 553–557 (2018)
21. Zeng, X.X., Liu, J., Ma, L., Fu, P.: Big data research in chronic kidney disease. Chin. Med. J. 131(22), 2647–2650 (2018). https://doi.org/10.4103/0366-6999.245275
22. Xu, X., Liu, H., Li, L., Yao, M.: A comparison of outlier detection techniques for high-dimensional data. Int. J. Comput. Intel. Sys. 11(1), 652–662 (2018). https://doi.org/10.2991/ijcis.11.1.50
Islanding Detection in Microgrid Using Decision Tree Pattern Classifier

Shyamal Das(B) and Abhinandan De

Indian Institute of Engineering Science and Technology, Shibpur 711103, West Bengal, India
[email protected]
Abstract. Unintentional islanding is one of the major problems in a microgrid, and detection of islanding events ensures its reliable and safe operation. In this technical article, a Decision Tree (DT) based Islanding Detection Method (IDM) is proposed and demonstrated on a microgrid based on the IEEE 13 node distribution system incorporating Distributed Renewable Energy Sources (RESs) and an Energy Storage System. The DT classifier is trained using voltage and frequency data of the microgrid under different transient events and loading conditions. A powerful feature extraction method, the Discrete Wavelet Transform (DWT), is used to extract useful features from the raw data. The DT classifier is compared with two other classifiers, the Support Vector Machine (SVM) and the Artificial Neural Network (ANN). The results show that the DT classifier based scheme gives an accuracy of 100% with training and testing times of 9.0088 ms and 2.7592 ms respectively, and hence can be used as a secure IDM. Keywords: Islanding detection · Microgrid · Renewable energy integration · Decision tree classifier · Wavelet transform
1 Introduction

In the modern power system, the continuous rise of energy demand and environmental issues introduces new directions and challenges. Due to the integration of distributed renewable energy sources (RESs) and the use of communication networks in the power system, the conventional methods of power system control, operation and protection may face several problems [1]. Besides the conventional grid, there is the newer concept of the microgrid, which is a decentralized group of electricity sources and loads [2]. A microgrid can be operated either connected to or disconnected from the main grid; this flexibility of operation increases the security of power supply within a microgrid. A major problem in microgrid operation is unintentional islanding [3]. In this condition, the local generators energize the microgrid loads in an unregulated way. As a result, the system can be affected severely, as voltage and frequency may cross their specified limits. It can also cause accidental electric shock to workers if they touch energized conductors assuming them to be disconnected from the main grid. Thus, the islanding condition should be detected and the required actions taken promptly.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. K. Das et al. (Eds.): CIPR 2022, LNNS 480, pp. 123–132, 2022. https://doi.org/10.1007/978-981-19-3089-8_12
124
S. Das and A. De
1.1 Existing IDMs

Islanding Detection Methods (IDMs) are broadly classified into two types: central (remote) and local (residential). Remote methods need a communication infrastructure and are hence expensive and complicated to implement in practice. Local methods are based on local information; though these methods are easy to implement, a large Non-Detection Zone (NDZ) is a major problem. Local methods are further classified into passive, active and hybrid methods. In a passive method, system parameters like voltage and frequency at the Point of Common Coupling (PCC) are monitored. In an active method, disturbances are intentionally injected into the system and islanding events are detected from the system responses. A hybrid method uses both passive and active methods together. Passive methods with artificial intelligence (AI) are a recent research trend, as they are fast and give more accurate results compared to conventional techniques. Matic-Cuka et al. [4] proposed an SVM based islanding detection technique in which autoregressive signal modelling is used to extract features from voltage and current signals at the PCC. The discrete fractional Fourier transform is used for feature extraction by Dutta et al. [5], whereas Buduma et al. [6] have used the wavelet transform. Islanding detection in the presence of renewable energy sources using current variations and the Stockwell transform is presented by Mahela et al. [7]. A combination of signal processing tools and AI algorithms can be used to develop a powerful islanding detection technique [8]. In a complicated system such as a microgrid with both conventional sources, RESs and an Energy Storage System, the performance of the islanding detection technique needs to be analysed.

1.2 Contribution to the Work

The research work in this article aims to detect the islanding event in a microgrid. The contributions of the paper are detailed below.

1. An islanding detection system is designed by combining passive IDM, AI and DWT-based signal processing.
2. The system is tested on the IEEE 13 node test feeder. To evaluate the robustness of the system, RESs and energy storage systems, besides conventional diesel generators, are connected to the grid.
3. The experiment is performed under several transient conditions, such as Line-Line-Line-Ground (LLLG) fault and large load switching under several loading conditions, to make the system capable of distinguishing islanding events from other non-islanding transient events. Unavailability of RESs during the islanding event is also considered.
4. The performance of DT, SVM and ANN based classifiers is compared and analysed to find the most suitable classifier.
2 Discrete Wavelet Transform

For studying stationary data, Fourier analysis is widely used; it analyses a signal in the frequency domain. But it is not a good choice for analysing non-stationary data with transient
events. The Wavelet Transform (WT) is preferred in such cases [9]. WT decomposes a signal into different frequency components, with a frequency resolution inversely proportional to the time resolution. For detecting discontinuities and sharp spikes in a signal, it gives better results than traditional Fourier methods [10]. WT can be used to extract information from different kinds of data. Wavelet transformation is of two broad types: the continuous wavelet transform and the discrete wavelet transform. The Continuous Wavelet Transform (CWT) is defined in Eq. (1) as

\[ C(\tau, s) = \frac{1}{\sqrt{|s|}} \int_{-\infty}^{+\infty} f(t)\, \psi^{*}\!\left(\frac{t - \tau}{s}\right) dt \tag{1} \]

The Discrete Wavelet Transform (DWT) is defined in Eq. (2) as

\[ D[a, b] = \frac{1}{\sqrt{b}} \sum_{m=0}^{p-1} f[t_m]\, \psi\!\left(\frac{t_m - a}{b}\right) \tag{2} \]
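For intuition, one level of the simplest DWT, the Haar wavelet, can be computed by hand: pairwise averages give the approximation coefficients and pairwise differences the detail coefficients. The db4 filters used later in Sect. 5.3 are longer but follow the same filter-and-downsample pattern:

```python
import math

def haar_step(signal):
    """One level of the Haar DWT: approximation and detail coefficients."""
    s2 = math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) / s2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s2 for i in range(0, len(signal), 2)]
    return approx, detail

# a flat stretch produces zero detail; a step change concentrates in one detail coefficient
a, d = haar_step([1.0, 1.0, 1.0, 5.0])
```

This localisation of abrupt changes in the detail coefficients is exactly why the WT is well suited to transient events.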
3 Decision Tree Classifier

The Decision Tree is a supervised learning technique used to solve classification and regression problems. The decision tree algorithm is inspired by the thinking process of a human being making a decision by considering a set of possibilities. It is a tree-like hierarchical model where a decision is made by a sequence of recursive splits. The process starts with the complete dataset, which is represented as the root node. The root node is split to generate two decision nodes. The splitting criterion is set using the best attribute/feature, which is obtained from an Attribute Selection Measure (ASM). The two most used techniques for ASM are Information Gain and Gini Index [11]. Information Gain (IG) is a measurement of the information available in a node about the classes. It is defined in Eq. (3) as

\[ \text{Information Gain} = 1 - \text{Entropy} \tag{3} \]

Entropy is the degree of randomness and is defined in Eq. (4) as

\[ \text{Entropy} = \sum_{i=1}^{n} -P_i \log_2 P_i \tag{4} \]

where P_i denotes the probability of the possible outcomes. Entropy is reduced from the root node to the leaf nodes, and IG is increased. The Gini Index, or Gini impurity, is the probability of a randomly chosen feature being wrongly classified. The value of the Gini index ranges from 0 to 1 and increases with the randomness of the distribution of the elements across various classes. The Gini Index is defined in Eq. (5) as

\[ \text{Gini} = 1 - \sum_{i=1}^{n} (P_i)^2 \tag{5} \]
P_i denotes the probability of an element being classified to a particular class. The decision nodes are further split recursively until all the data in a node belong to a single class. These final nodes are called leaf nodes, and each represents a particular class.
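Eqs. (4) and (5) translate directly into code; a minimal sketch over class-probability lists:

```python
import math

def entropy(probs):
    """Entropy of Eq. (4); the 0 * log(0) term is taken as 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def gini(probs):
    """Gini impurity of Eq. (5)."""
    return 1 - sum(p * p for p in probs)
```

Both measures are maximal for a 50/50 split and zero for a pure node, which is why each split is chosen to drive them down towards the leaves.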
4 Proposed DT Based Islanding Detection Scheme

In an unintentional islanding event, the microgrid is accidentally disconnected from the main grid. Due to the mismatch of active and reactive power, as shown in Eq. (6) and Eq. (7), the operating point of the system parameters changes to a new one. Thus, a transition in some system parameters, like voltage and frequency, can be observed.

\[ \Delta P = P_{DG} - P_L \tag{6} \]

\[ \Delta Q = Q_{DG} - Q_L \tag{7} \]
where P_DG is the active power generated, P_L the active power consumption of the local loads, Q_DG the reactive power generated and Q_L the reactive power consumption of the local loads.
Fig. 1. Flow chart of the proposed scheme: dynamic model of the microgrid → simulation under islanding and other non-islanding transient events → selection of parameters and generation of signal patterns → feature extraction and relevant attribute selection → separation into training and testing data → classifier training → testing of the trained model
The magnitudes of the voltage and the frequency at the PCC fluctuate due to reactive and active power mismatch respectively. Useful features can be derived from these voltage and frequency signals, which can then be used to train an intelligent classifier. Details of the scheme are presented as a flow chart in Fig. 1.
5 Illustrative Example of Implementation of the Proposed Scheme

The proposed scheme is implemented on a modified IEEE 13 node test system [12]. The steps include modelling of the microgrid in MATLAB/Simulink, simulation of the system, collection of data, feature extraction, and finally training and testing of the classifier model.

5.1 Modelling of Microgrid

The microgrid is modelled using the IEEE 13 node test system. The IEEE 13 node feeder is a small system that is used to test distribution systems; it operates at 4.16 kV and has 1 source, 1 regulator transformer, a number of short unbalanced transmission lines, and shunt capacitors. The system is modelled in MATLAB/Simulink. The single-line diagram of the test system is shown in Fig. 2.
Fig. 2. Single line diagram of modified IEEE 13-node test feeder
To make the system capable of operating independently after an islanding event, some RESs, a diesel generator and energy storage are connected. The specifications of these generating systems are given in Table 1.

5.2 Data Collection

To develop an effective islanding detection system, a large set of cases has been considered, including some non-islanding transient cases like 3-phase to ground fault and large load switching. A set of 60 islanding cases is generated for different combinations of the active and reactive power mismatches. At node 632, an adjustable RLC load
Table 1. Specifications of distributed energy sources

Source type            Rated capacity  Connected node no.
PV Generation          1 MW            680
Wind Generation        1.5 MW          675
Diesel Generator       3 MW            634
Energy Storage System  0.1 MW          671
is connected to simulate a wide range of cases of active and reactive power mismatch, and the islanding event is simulated by opening the circuit breaker connected before node 632. The generated dataset is summarized in Table 2.

Table 2. Dataset summary

Cases          Number of data  Description
Islanding      21              Islanding with PV and Wind Generator
Islanding      21              Islanding with only PV Generator
Islanding      18              Islanding with only Wind Generator
Non-Islanding  18              3-Phase Fault
Non-Islanding  22              Load Switching
5.3 Feature Extraction Using DWT

Before feeding the collected data into the classifier, it is important to extract useful features from them. Here, we have used the DWT with the Daubechies-4 (db4) wavelet at level 8 decomposition. The detail coefficients are plotted in Fig. 3 and Fig. 4 respectively. The DWT coefficients capture different features of the main signal at different decomposition levels. The energy, standard deviation, maximum value and minimum value of the detail coefficients at each level are calculated and shown in Fig. 5.

5.4 Classifier Training

The Decision Tree classifier is used to classify the events and to detect the islanding event. From the total dataset, 70% and 30% were selected randomly as training data and testing data respectively. The trained tree is shown in Fig. 6.

5.5 Result and Analysis

After the model is trained, it is tested using the testing dataset. The result is presented as a confusion matrix in Fig. 7, which shows the output of the classifier for several test cases against the actual classes.
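The per-level features fed to the classifier (energy, standard deviation, maximum and minimum of the detail coefficients, cf. Fig. 5) can be sketched as follows; the coefficient values are illustrative only:

```python
import statistics

def level_features(coeffs):
    """Summary features of one level of DWT detail coefficients."""
    return {
        "energy": sum(c * c for c in coeffs),  # sum of squared coefficients
        "std": statistics.pstdev(coeffs),
        "max": max(coeffs),
        "min": min(coeffs),
    }

feats = level_features([0.5, -1.0, 2.0, -0.5])
```

Computing these four features for each of the 8 levels of both the voltage and frequency signals yields a fixed-length feature vector per simulated case.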
Fig. 3. Detailed coefficients of 8 level decomposition of voltage signal
The classification accuracy is calculated in Eq. (8) as

\[ \text{Accuracy (in \%)} = \frac{\text{number of correct classifications}}{\text{total number of data classified}} \times 100\% \tag{8} \]
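Eq. (8) applied to a confusion matrix, where the diagonal entries are the correct classifications; the matrices below are hypothetical, not the Fig. 7 results:

```python
def accuracy_pct(confusion):
    """Eq. (8): diagonal (correct) entries over all classified samples, in percent."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total * 100.0

# rows = actual class, columns = predicted class (2 classes)
cm_perfect = [[20, 0],
              [0, 16]]
cm_mixed = [[18, 2],
            [1, 15]]
```

A fully diagonal matrix such as `cm_perfect` gives 100%, matching the testing outcome reported below.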
The confusion matrix shows that all 36 classifications during testing are done correctly, which implies that the accuracy of the system is 100%.
Fig. 4. Detailed coefficients of 8 level decomposition of frequency signal
5.6 Comparison with Other Methods

The same dataset is also used to train and test two other classifiers, the Support Vector Machine (SVM) [13] and the Artificial Neural Network (ANN) [14]. The training and testing times of the classifiers, as well as the accuracy of each classifier, are shown
Fig. 5. Extracted features of voltage and frequency

Fig. 6. Trained decision tree model

Fig. 7. Confusion matrix
in Table 3 and Fig. 8. It can be seen that the training and testing times for the DT classifier are the minimum compared to SVM and ANN, whereas the accuracy is the maximum for DT.
Fig. 8. Comparative performance of the classifiers

Table 3. Comparison table

Classifier name            Training time (Sec)  Testing time (Sec)  Accuracy (%)
Decision Tree              0.0090088            0.0027592           100.0
Support Vector Machine     0.0512160            0.0085014           91.67
Artificial Neural Network  0.1673702            0.0085658           80.56
6 Conclusion

A microgrid is designed using the IEEE 13 node test system. To make the system capable of islanded operation, distributed RESs are added. An intelligent islanding detection system is designed using the DT classifier. To check the performance of the classifier, several islanding and non-islanding transient events are simulated, and voltage and frequency signals are collected. Useful features are extracted using the DWT. The DT classifier gives a classification accuracy of 100%; the SVM and ANN classifiers are also trained and tested, giving accuracies of 91.67% and 80.56% respectively.
References 1. Saim, A., Kouba, N., Amrane, Y., Lamari, M., Sadoudi, S.: Impact of renewable energies integration in interconnected power system: transmission-distribution. In: Algerian Large Electrical Network Conference, pp. 1–5 (2019)
2. Zhou, Y., Ngai-Man Ho, C.: A review on Microgrid architectures and control methods. In: 8th IEEE International Power Electronics and Motion Control Conference, pp. 3149–3156 (2016) 3. Gaurav, S., Agnihotri, P.: Active islanding detection with parallel inverters in Microgrid. In: 9th IEEE International Conference on Power Systems, pp. 1–6 (2021) 4. Matic-Cuka, B., Kezunovic, M.: Islanding detection for inverter-based distributed generation using support vector machine method. IEEE Trans. Smart Grid 5(6), 2676–2686 (2014) 5. Dutta, S., Olla, S., Sadhu, P.K.: A secured, reliable and accurate unplanned island detection method in a renewable energy based microgrid. Eng. Sci. Technol. Int. J. 24(5), 1102–1115 (2021) 6. Buduma, P., Pinto, P.J., Panda, G.: Wavelet based islanding detection in a three-phase grid collaborative inverter system using FPGA platform. In: 8th IEEE India International Conference on Power Electronics, pp. 1–6 (2018) 7. Mahela, O.P., Sharma, Y., Ali, S., Khan, B., Padmanaban, S.: Estimation of islanding events in utility distribution grid with renewable energy using current variations and stockwell transform. IEEE Access 9, 69798–69813 (2021) 8. Mishra, M., Chandak, S., Rout, P.K.: Taxonomy of islanding detection techniques for distributed generation in Microgrid. Renewable Energy Focus 31, 9–30 (2019) 9. Jana, S., De, A.: Transmission line fault detection and classification using wavelet analysis. In: Annual IEEE India Conference, pp. 1–6 (2013) 10. Kulkarni, J.S.: Wavelet transform applications. In: 3rd International Conference on Electronics Computer Technology, pp. 11–17 (2011) 11. Jain, V., Phophalia, A., Bhatt, J.S.: Investigation of a joint splitting criteria for decision tree classifier use of information gain and gini index. In: Tencon IEEE Region 10 Conference, pp. 2187–2192 (2018) 12. IEEE PES Test Feeder: https://cmte.ieee.org/pes-testfeeders/resources/ 13. 
Hearst, M.A., Dumais, S.T., Osuna, E., Platt, J., Scholkopf, B.: Support vector machines. IEEE Intel. Sys. their Appl. 13(4), 18–28 (1998) 14. Mishra, M., Srivastava, M.: A view of artificial neural network. In: International Conference on Advances in Engineering & Technology Research, pp. 1–3 (2014)
Identification of Lung Cancer Nodules from CT Images Using 2D Convolutional Neural Networks

Sutrisna Anjoy1, Paramita De2(B), and Sekhar Mandal1

1 Computer Science and Technology, IIEST, Shibpur 711103, India
[email protected], [email protected]
2 Information Technology, G.L. Bajaj Institute of Technology and Management, Greater Noida 201306, India
[email protected]
Abstract. Detection of malignant nodules at early stages from computed tomography (CT) images is time-consuming and challenging for radiologists. An alternative approach is to introduce computer-aided-diagnosis (CAD) systems. Recently, deep learning approaches have outperformed other classification methods. In this paper, we use a 2D convolutional neural network to detect malignant nodules from CT scan images, employing a modified VGG16 for the identification of lung cancer. The LUNA 16 dataset is used to train and evaluate the proposed method, and experimental results show encouraging identification performance. We also compare the performance of the proposed method with existing 2D CNN methods. Keywords: Convolutional Neural Network (CNN) · Lung nodule · Computer-Aided-Diagnosis (CAD) · Computed Tomography (CT) image
1 Introduction
At present, lung cancer is considered the primary cause of cancer-related death worldwide. In 2018, 142,670 people died of it in the USA [6]. According to the article [7], about 228,820 new lung cancer cases were detected in 2020, and about 135,720 people died. Only early detection of the disease can reduce the number of deaths. Lung cancer is a disease of uncontrollable growth of abnormal lung cells. If malignant nodules are detected in stage 1, the survival rate is 83%–92%, which is relatively high, whereas the survival rate at stage 4 is tremendously low (1%–10%). The nodule is about 30 mm or less in size in stage 1; in stage 2, the nodule size is around 50 mm. CT scanning is the most effective method to detect lung cancer nodules at early stages due to its high-resolution (3D) chest images. Manual detection of malignant lung nodules at an early stage from CT scan images is a difficult task for radiologists. Hence, the alternative approach is CAD systems.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. A. K. Das et al. (Eds.): CIPR 2022, LNNS 480, pp. 133–140, 2022. https://doi.org/10.1007/978-981-19-3089-8_13
An enhanced multidimensional region-based fully convolutional network-based system is proposed in [2] for the detection and identification of lung nodules. The region of interest is selected using a median intensity projection to leverage 3D information from CT images and the deconvolutional layer. The sensitivity of the system is 98.1%. A linear discriminant analysis-based CAD system is proposed in [1], achieving a sensitivity of 70% on 187 nodules. A CAD system proposed in [8] uses deep features extracted from an autoencoder to classify lung nodules. Here, the LIDC dataset is used, and nodules are extracted from the 2D CT images using the annotations provided in the dataset. A 200-dimensional feature vector represents a nodule using a five-layered denoising autoencoder. This vector is fed into a binary decision tree for nodule classification, and the method's sensitivity is 83.25%. Shakeel et al. [3] propose a CAD system for lung cancer detection. After noise removal, the nodules are segmented from the CT scan images using a deep neural network, and features are extracted. The effective features are selected with the help of a hybrid spiral optimization intelligent-generalized rough set approach, and an ensemble classifier is used for classification. A lung cancer-detecting method is proposed in [4]. After image preprocessing, nodules are segmented using the watershed segmentation method. Different features (like perimeter, eccentricity, mean intensity, diameter, etc.) are extracted from each nodule. Then an SVM classifier is used to detect the malignant nodules. The accuracy of this system is 86.6%. Akter et al. [5] propose a fuzzy-based image segmentation scheme for extraction of nodules from CT images. They use a neuro-fuzzy classifier to classify the lung nodules into malignant and benign classes, and the accuracy of their method is 90%. Several deep learning-based CAD systems are presented in [9] for nodule detection.
These solutions achieved a sensitivity of over 95%. A Multi-crop CNN (MC-CNN) is proposed in [10] for lung nodule classification, and the accuracy of this system is 87.14%. A computer-aided decision support system for lung nodule detection based on a 3D deep CNN is proposed in [11]. The median intensity projection and a multi-Region Proposal Network select potential regions of interest. The training and validation are done using the LUNA16 dataset. The system has a sensitivity and accuracy of 98.4% and 98.51%, respectively. Jiang et al. [12] propose a lung nodule detection scheme based on multigroup patches cut out from the lung images. A four-channel CNN model is designed for detecting nodules, resulting in a sensitivity of 94%. In this paper, we propose a CAD system that helps radiologists identify malignant nodules in CT images. The lung nodules are segmented using the annotations provided in the dataset. We use a modified VGG16 2D CNN for the classification of nodules into benign and malignant classes.
2 Proposed Method
There are various publicly available datasets; some of them are described as follows:
– LUNA 16: This dataset was published in 2016 for a competition named "Lung Nodule Analysis 2016 grand challenge". There are 888 CT scans available in 10 subsets (subset0 to subset9). In addition to the raw image, each CT scan has its meta-data in mhd format. The dataset has a csv file that contains the centroid of each nodule and its class label. The meta-data of each CT scan are: (i) NDims → dimension of the data, (ii) Offset → coordinate of the origin of the CT scan, (iii) ElementSpacing → spacing between two corresponding coordinates and (iv) DimSize → e.g. 512 × 512 × 121. Every slice is of size 512 × 512, and the number of slices may vary from CT scan to CT scan. Only 0.24% of the candidate nodules in this dataset are malignant; the rest are benign.
– Kaggle Data Science Bowl 2017 (DSB3): This dataset contains 1397 CT scans, and a csv file contains each CT scan ID and its class label. CT scans are available in DICOM format.
– LIDC-IDRI: This dataset includes 1018 CT scans in DICOM format. Each CT scan includes an XML file containing its annotations.

2.1 Nodule Segmentation
The nodules are segmented from CT images using the annotations present in the dataset. For each nodule in a CT image, the centroid of the nodule and its class label (malignant/benign) are available in the csv file. To segment a nodule from a CT image, we use a window of size 50 × 50. The center of the window coincides with the centroid of the nodule, and the area within the window is cropped. The cropped image regions are used in the training phase of the deep neural network.
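In practice, the LUNA16 csv gives nodule centroids in world (mm) coordinates, which must be converted to voxel indices using the Offset (origin) and ElementSpacing fields of the mhd meta-data before cropping. The sketch below illustrates only these two steps on a synthetic slice; the origin/spacing values and function names are our own assumptions (the actual volumes would typically be read with a library such as SimpleITK).

```python
import numpy as np

def world_to_voxel(world_xyz, origin_xyz, spacing_xyz):
    """Convert a world-coordinate centroid (mm) to voxel indices using the
    Offset (origin) and ElementSpacing fields of the .mhd meta-data."""
    return np.rint((np.asarray(world_xyz) - np.asarray(origin_xyz))
                   / np.asarray(spacing_xyz)).astype(int)

def crop_nodule(ct_slice, center_xy, size=50):
    """Crop a size x size window centred on the nodule centroid from one slice."""
    half = size // 2
    x, y = center_xy
    # Pad so windows near the image border still come out size x size.
    padded = np.pad(ct_slice, half, mode="constant", constant_values=0)
    return padded[y:y + size, x:x + size]

# Synthetic example: a 512 x 512 slice and a made-up centroid/origin/spacing.
ct_slice = np.zeros((512, 512), dtype=np.float32)
vx, vy, vz = world_to_voxel((-60.0, 80.0, -120.0),
                            origin_xyz=(-198.1, -195.0, -335.2),
                            spacing_xyz=(0.68, 0.68, 2.5))
patch = crop_nodule(ct_slice, (vx, vy))
print(patch.shape)  # (50, 50)
```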
Fig. 1. An example CT image and its nodule regions.
The standard CT scan image height and width is about 320 mm (lung size may vary depending on the age, height, and weight of the person), and according to our study, in the first stage the nodule size can be within the range of 3 mm to 30 mm, while in the second stage it can be up to 50 mm. Hence, we have to crop a 30 mm area as precisely as possible while avoiding unnecessary information, and the size of the window is selected empirically. An example CT image and corresponding nodule regions are shown in Fig. 1. After nodule segmentation, we normalize the segmented regions as follows. A lung CT image contains different substances such as bone, soft tissue, water, fat, lung tissue and air. Different substances have different ranges of HU values. For example, bone has HU values in the range of +400 to +1000, air has HU −1000, and lung tissue has HU in the range of −400 to −600. We normalize these values within 0 to 1 using the formula in Eq. (1):

X = (X − HUmin) / (HUmax − HUmin)    (1)
If X ≥ 1 then X = 1, and if X ≤ 0 then X = 0. Finally, X = X × 255.
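The normalization and clipping above can be sketched in a few lines of numpy. The HU window of −1000 to +400 is our assumption, chosen to span the air-to-bone range mentioned in the text:

```python
import numpy as np

# HU window spanning air (-1000) through bone (+400); an assumed choice.
HU_MIN, HU_MAX = -1000.0, 400.0

def normalize_hu(region):
    """Map raw HU values to [0, 255] following Eq. (1) and the clipping rule."""
    x = (np.asarray(region, dtype=np.float32) - HU_MIN) / (HU_MAX - HU_MIN)
    x = np.clip(x, 0.0, 1.0)      # X >= 1 -> 1, X <= 0 -> 0
    return x * 255.0

print(normalize_hu([-1200, -1000, 400, 1000]))  # [0. 0. 255. 255.]
```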
2.2 Network Architecture and Training Phase
We use a deep neural network to predict the class label of the nodules. The proposed model consists of 10 convolutional layers, 5 fully connected layers, and 1 output layer, so the model has 16 weighted layers. The original VGG16 model also has 16 weighted layers; hence we call our model a modified VGG16.
Fig. 2. Network architectures
In our case, the input image size is 50 × 50 × 1, whereas in the case of VGG16 it is 224 × 224 × 3. We use batch normalization to speed up the training process and remove minor distortions in the image to avoid overfitting.
The LUNA 16 dataset has only 0.24% malignant candidate nodules, which may lead to overfitting of the network. We introduce dropout to prevent overfitting. The architecture of the network is summarized in Fig. 2. We use the LUNA16 dataset, in which there are 549,714 benign candidate nodules and 1,351 malignant candidate nodules. Hence, the dataset is highly imbalanced, and to reduce the imbalance we randomly select 5 × 1,351 benign candidate nodules from the dataset. Therefore, our new dataset contains 6,755 benign nodules and 1,351 malignant nodules. We use 70% of the total samples for training, 20% for validation, and 10% for testing. Our dataset is small for training a deep neural network, so we augment the training data. We use the Adam optimizer with a learning rate of 0.0005, the categorical cross-entropy loss function, 200 epochs, and a batch size of 64.
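The class-rebalancing step (randomly keeping 5 benign candidates per malignant one) can be sketched as below. The counts follow the paper; the function and variable names are our own:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def undersample_benign(benign_idx, malignant_idx, ratio=5):
    """Return indices of a rebalanced dataset: all malignant candidates plus
    ratio x (number of malignant) benign candidates drawn at random."""
    keep = rng.choice(benign_idx, size=ratio * len(malignant_idx), replace=False)
    return np.concatenate([keep, malignant_idx])

benign = np.arange(549_714)       # benign candidate indices
malignant = np.arange(1_351)      # malignant candidate indices (separate table)
subset = undersample_benign(benign, malignant)
print(len(subset))  # 8106 = 6755 + 1351
```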
3 Experimental Results
In our experiment, we use a dataset that contains 6,755 benign nodules and 1,351 malignant nodules. The above-mentioned dataset is divided into three parts; the first part contains 70% of the data, and the second and third parts contain 20% and 10%, respectively. The first part is used in the training phase of the network. The trained model is validated using the second part. Finally, the third part is used for testing the model.
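A minimal sketch of the 70/20/10 split follows. Stratifying per class is our assumption, made so that both classes appear in every part of such an imbalanced dataset:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def split_702010(labels):
    """Return index arrays (train, val, test) stratified by class label."""
    train, val, test = [], [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        n = len(idx)
        a, b = int(0.7 * n), int(0.9 * n)
        train.append(idx[:a]); val.append(idx[a:b]); test.append(idx[b:])
    return (np.concatenate(train), np.concatenate(val), np.concatenate(test))

labels = np.array([0] * 6755 + [1] * 1351)   # 0 = benign, 1 = malignant
tr, va, te = split_702010(labels)
print(len(tr), len(va), len(te))
```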
Fig. 3. The performance of the proposed model.
The performance of the proposed model in the training and validation phases is shown in Fig. 3. It is evident from the figure that the performance of the proposed method in the training and validation phases is almost the same.
A confusion matrix (see Fig. 4) is also computed to show the performance of the proposed model in the testing phase.
Fig. 4. The confusion matrix.

Table 1. Performance analysis table.

Class               Precision  Recall  F1-score
Benign nodules      0.98       0.99    0.98
Malignant nodules   0.92       0.91    0.91
We also analyze the proposed model in terms of precision, recall, and F1-score, and Table 1 shows the performance of the proposed model.

Table 2. Comparison with existing methods.

Method                 Dataset                           Accuracy  Sensitivity
Hamdalla method [13]   IQ-OTH/NCCD lung cancer dataset   93.45%    95.714%
Setio [14]             LIDC-IDRI                         93.5%     90.1%
Xie method [15]        LUNA16                            95.0%     86.42%
Proposed method        LUNA16                            97%       91.0%
We compare the performance of the proposed method with some of the existing methods that use 2D convolution, as shown in Table 2. It is evident from Table 2 that the proposed method is comparable to the existing methods and, in some cases, outperforms them.
4 Conclusion
A well-performing CAD system can help radiologists detect lung cancer. In this paper, we proposed a simple model for classifying lung nodules into benign and malignant classes. Our proposed method is trained and evaluated using the LUNA16 dataset. We chose this dataset as it is very popular and used by many researchers. The experimental results demonstrate that the proposed model achieves a sensitivity of 91% and a classification accuracy of 97%.
References

1. Armato, S.G., III, Giger, M.L., Moran, C.J., Blackburn, J.T., Doi, K., MacMahon, H.: Computerized detection of pulmonary nodules on CT scans. Radiographics 19(5), 1303–1311 (1999). https://doi.org/10.1148/radiographics.19.5.g99se181303
2. Masood, A., et al.: Automated decision support system for lung cancer detection and classification via enhanced RFCN with multilayer fusion RPN. IEEE Trans. Industr. Inf. 16(12), 7791–7801 (2020). https://doi.org/10.1109/TII.2020.2972918
3. Shakeel, P.M., Burhanuddin, M.A., Desa, M.I.: Automatic lung cancer detection from CT image using improved deep neural network and ensemble classifier. Neural Comput. Appl. (2020). https://doi.org/10.1007/s00521-020-04842-6
4. Makaju, S., Prasad, P.W.C., Alsadoon, A., Singh, A.K., Elchouemi, A.: Lung cancer detection using CT scan images. Procedia Comput. Sci. 125, 107–114 (2018). https://doi.org/10.1016/j.procs.2017.12.016
5. Akter, O., Moni, M.A., Islam, M.M., Quinn, J.M.W., Kamal, A.H.M.: Lung cancer detection using enhanced segmentation accuracy. Appl. Intell. 51(6), 3391–3404 (2020). https://doi.org/10.1007/s10489-020-02046-y
6. Siegel, R.L., Miller, K.D., Jemal, A.: Cancer statistics. CA Cancer J. Clin. 69(1), 7–34 (2019). https://doi.org/10.3322/caac.21551
7. Cancer facts and figures 2020. Atlanta: American Cancer Society. https://www.cancer.org/research/cancer-facts-statistics/all-cancer-facts-figures/cancer-facts-figures-2020.html. Accessed 02 May 2020
8. Kumar, D., Wong, A., Clausi, D.A.: Lung nodule classification using deep features in CT images. In: Proceedings of the 2015 12th Conference on Computer and Robot Vision, pp. 133–138 (2015). https://doi.org/10.1109/CRV.2015.25
9. Setio, A.A.A., et al.: Validation, comparison, and combination of algorithms for automatic detection of pulmonary nodules in computed tomography images: the LUNA16 challenge. Med. Image Anal. 42, 1–13 (2017). https://doi.org/10.1016/j.media.2017.06.015
10. Shen, W., et al.: Multi-crop convolutional neural networks for lung nodule malignancy suspiciousness classification. Pattern Recogn. 61, 663–673 (2017). https://doi.org/10.1016/j.patcog.2016.05.029
11. Masood, A., et al.: Cloud-based automated clinical decision support system for detection and diagnosis of lung cancer in chest CT. IEEE J. Transl. Eng. Health Med. 8 (2020). https://doi.org/10.1109/JTEHM.2019.2955458. Art. no. 4300113
12. Jiang, H., Ma, H., Qian, W., Gao, M., Li, Y.: An automatic detection system of lung nodule based on multigroup patch-based deep learning network. IEEE J. Biomed. Health Inform. 22(4), 1227–1237 (2018). https://doi.org/10.1109/JBHI.2017.2725903
13. Al-Huseiny, H.F., Mohsen, M., Khalil, F., Zainab, E.H.: Diagnosis of lung cancer based on CT scans using CNN. In: IOP Conference Series: Materials Science and Engineering, vol. 928, art. no. 022035 (2020). https://doi.org/10.1088/1757-899X/928/2/022035
14. Setio, A.A.A., et al.: Pulmonary nodule detection in CT images: false positive reduction using multi-view convolutional networks. IEEE Trans. Med. Imaging 35(5), 1160–1169 (2016). https://doi.org/10.1109/TMI.2016.2536809
15. Xie, H., et al.: Automated pulmonary nodule detection in CT images using deep convolutional neural networks. Pattern Recogn. 85, 109–119 (2019). https://doi.org/10.1016/j.patcog.2018.07.031
A Pixel Dependent Adaptive Gamma Correction Based Image Enhancement Technique

Satyajit Panigrahi, Abhinandan Roul(B), and Rajashree Dash

Department of Computer Science and Engineering, ITER, Siksha 'O' Anusandhan (Deemed to be University), Bhubaneswar, Odisha 751030, India
[email protected], [email protected], [email protected]
Abstract. Real-world photography often produces inaccurate colours when displayed on a digital screen. Most computer systems use gamma correction algorithms to increase colour accuracy, but these have a number of drawbacks. This paper formulates a novel approach to contrast correction that uses the indigenous pixel values of each individual channel. Allowing the gamma correction algorithm to have a larger, pixel-dependent intercept helps evenly balance contrast in relatively dark (low-contrast) and comparatively bright (high-contrast) portions of the subject picture. Comparative studies on Low Dynamic Range (LDR) pictures demonstrate the difference in outcomes obtained using the suggested technique, the Pixel Adaptive Gamma Correction (PAGC) methodology. With our suggested strategy, we achieved clear superiority in the entropy score as well as the colourfulness measure over standard gamma correction and histogram-equalisation contrast-adjustment techniques. Keywords: Gamma correction · Computer vision · LDR · Contrast enhancement
1 Introduction

Image enhancement is a widely used method for improving the quality of medical and natural images. Its main purpose is to make an image more visually appealing by improving certain characteristics. Enhancing images helps viewers understand them better and enables further image analysis. In addition, enhanced images are often needed as input for many image processing systems. The concept of enhancement encompasses various aspects, such as boosting saturation, sharpening, denoising, adjusting tones, improving tonal balance, and enhancing contrast. This paper is mainly concerned with enhancing contrast in different types of images.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. A. K. Das et al. (Eds.): CIPR 2022, LNNS 480, pp. 141–150, 2022. https://doi.org/10.1007/978-981-19-3089-8_14
Various approaches have been proposed to enhance an image's contrast. Gamma correction (GC) is a widely used method for correcting an image's contrast in a non-linear way. GC takes into account the fact that the Human Visual System (HVS) perceives and visualizes light and colour in a non-linear manner, i.e. brighter parts of images are perceived with an intensity that is not proportional to that of darker regions. Brightness plays a role here, as gamma correction affects
the overall global brightness of the image. Consequently, it also influences the contrast of the image, with the most prominent changes visible in the brighter and darker regions. Also referred to as gamma compression or gamma encoding, it was originally introduced to adjust the input-output pixel definitions of cathode ray tube displays. According to studies, human eyes perceive the world in terms of the light reflected from objects and their relative brightness within their surroundings. This can be estimated through a power function with higher perceptual sensitivity towards darker regions than brighter ones, and is accepted to be consistent with the general form of Stevens' power law [1]. GC is used to overcome the mathematical rigidity of allocating overly restrictive, point-precise pixel values, which generate a curve that is less flexible with respect to the HVS and whose effect we often cannot differentiate without targeted observation [2]. The conventional GC approach may be attributed to the power law, which is as follows:
(1)
where Vin is the input channel, Vout is the output channel, A is the constant and α is the power to which the input is raised in Eq. (1) [1]. The non-linear compression of pixel values when α < 1 is called as gamma compression while the counter α > 1 i.e. non-linear expansion when is called as gamma expansion. The proposed approach derives its root ideology from the expression of power law that exponential transformation is one of the earliest and most robust methods of ensuring secure and exhibited pixel transformation and correction. Preserving the visual aspect of the pixel value while at the same time making hairline modification to its values so as to achieve a desirable effect from a cluster of pixel is realized through the idea of making a pixel value go through exponential gamma encoding. The exponent is decided through the steps of channel decomposition and deciding the α for individual pixel. A paper published in 2016, with the goal of procuring visually aesthetic image by making dynamic adjustments to lightness and brightness of the image, approached the process of adaptive GC [3]. The Adaptive GC methodology processed color RGB images according to the value spectrum of Hue Saturation Value (HSV) format. It used a dual processing of logarithmic function and exponential function to set the value. It followed the traditional GC function mentioned in Eq. 1. After defining, the value of K was decided and the processed image was converted back into RGB. Along the lines, the model considers various filters and image classifiers to determine the dynamically adjusted and K values. It classifies images initially based on contrast and the results were further branched out into low contrast images having dark and bright regions while high contrast images having their respective dark and bright regions. By combining contrast limited adaptive histogram equalization with local image contrast, a hybrid contrast enhancement approach is proposed in [4]. 
For any given pixel position in the image, local contrast enhancement is determined by the corresponding local gain parameter, which is calculated by taking into account the edge density of the current pixel neighborhood. To improve contrast while preserving natural colour and visual details, an Adaptive GC with Weighted Histogram Distribution method is presented in [5]. The adaptive GC method is utilized to improve the contrast, whereas the latter is used to preserve the natural colours and details of the image. An illumination-based GC is proposed in [6]. It uses the maximum a posteriori estimator to find the illumination component. In order to suppress the unevenness in the
estimated illumination component, gamma correction is applied after normalization. For low-light images, a new image fusion method combining complementary gamma functions with a new sharpening technique is proposed in [7] as a way to enhance the visibility of dark regions while simultaneously achieving high contrast in bright areas. Although these approaches improve contrast, the brightness is over-enhanced and important details may be lost, resulting in undesirable images. As a solution, this paper presents a method that enhances contrast while preserving maximum information in an image. In the proposed Pixel-dependent Adaptive Gamma Correction (PAGC) approach, the objective is to generate a transformation function that is less inclined towards a constant, externally assigned exponent; instead, the transformation exponent depends on the pixel it is transforming. This intuition was developed to overcome the fact that the conventional GC technique fixes the exponent at a constant value, which restricts its flexibility: each pixel does not require an equal magnitude of transformation, but rather a proportional one. The criteria for verification include the established metrics of colourfulness score, contrast score and entropy score for RGB images over a dataset of 100 images procured from the ADE20K [8] dataset.

2 Proposed PAGC Approach

Dynamic imagery usually consists of a three-layered structure of three distinctive image channels: Red, Green and Blue. From a mathematical viewpoint, we can consider each image channel to be a 2-dimensional matrix of discrete values, which may be 8-bit integers or 32-bit floats depending on whether the image is LDR or High Dynamic Range (HDR), respectively. The aim is to manipulate these image channels through the PAGC corrective function. Individually viewed, the channels appear to be grayscale representations of the image, but they should not be mistaken for the grayscale format, as the individual pixel values may vary. The three spectra layered together in RGB or BGR format give us the final colour image. As the workflow in Fig. 1 demonstrates, instead of the simplistic method of using a lookup table to force-transform each pixel in the 3-dimensional image array (three 2-dimensional colour channels layered one over another), each channel is considered separately and each pixel is corrected according to the dynamic exponent assigned through its own channel values, as demonstrated in Eq. (2), Eq. (3) and Eq. (4). For a [256 × 256] image, each channel is of size [256 × 256], i.e. 65536 pixels. First, the maximum and minimum values of each image channel, to be used in the PAGC function, are determined as follows:
2 Proposed PAGC Approach Dynamic Imagery usually consists of a three layered structure of three distinctive image channels Red, Blue and Green. Looking at a mathematical interpretation, we can consider each of the image channel to be a 2-dimensional matrix of discrete values which may be 8 bit integer or 32 bit float depending on whether the image is a LDR or High Dynamic Range (HDR) respectively. The aim is to manipulate these image channels depending on PAGC corrective function. Individually viewed, the channels appear to be a grayscale representation of the image but they should not be mistaken to be similar to grayscale format as the individual pixel values may vary. The three spectrums layered together in RGB or BGR format gives us the final colour image. On paying close attention to the workflow demonstrated in Fig. 1, instead of the simplistic method of using a lookup table to force transform each pixel in the 6-dimensional image matrix (three 2-dimensional colour channels layered one over another). Each channel was considered separately and each pixel was corrected according to the dynamic exponent assigned through their own channel values as demonstrated through Eq. (2), Eq. (3) and Eq. (4). If there is a [256 × 256] size image, all the channels will be of [256 × 256] size i.e.65536 pixels. At first the maximum and minimum values of each image channel to be used in PAGC function is determined in the follow up. rmax = max(r1, r2, r3, . . . . , ri), rmin = min(r1, r2, r3, . . . ., ri)
(2)
where (r1, r2, r3 . . . .ri) are the pixels belonging to red channel. bmax = max(b1, b2, b3, . . . ., bi), bmin = min(b1, b2, b3, . . . ., bi)
(3)
where (b1, b2, b3, . . . ., bi) are the pixels belonging to blue channel. gmax = max(g1, g2, g3, . . . . , gi), gmin = min(g1, g2, g3, . . . ., gi)
(4)
where g1, g2, …, gi are the pixels belonging to the green channel. Each pixel pi in each layer is then considered sequentially, and the non-linear PAGC correction function of Eq. (5) is applied. Respecting the foundational idea of implementing the power law in gamma correction, the same power law is retained, but with a standardising methodology that yields a more definitive curve as pixel values approach the relatively high-contrast boundary, i.e. 255. The main idea behind this is to impart a proportionate, weighted estimate to each pixel value instead of weighting all pixel values equally by a constant exponent and measure:

Pnew = 255 × (pi / 255)^((pmax − pi) / (pmax − pmin))    (5)
Fig. 1. Flowchart illustrating the proposed PAGC approach
STEP-1: An RGB image whose contrast needs to be adjusted is considered. The image is resized to 512 × 512 pixels to reduce the experimental computational cost; the dimensions of the image are not among the factors influencing the gamma correction.
STEP-2: The stacked RGB image is decomposed into a single-layer grayscale image so as to apply the arithmetic transformation on a 2D matrix of pixel values and obtain the transformation curve for analytic purposes.
STEP-3: Each colour channel, i.e. Red, Green and Blue, is operated on separately, and each individual pixel is considered by the adaptive transformation function. This prevents interference of the transformation exponent amongst the individual channels and achieves a more localized approach to gamma correction.
STEP-4: After separating the channels, the maximum and minimum pixel values of each (two-dimensional) colour matrix are determined, and the new value of each individual pixel of each channel is calculated according to the transformation function in Eq. 5.
STEP-5: After each channel has undergone the transformation procedure and the pixel values have been reconstituted according to the dependent constraints and exponent, the channels are merged back in the original order to form the image.
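The per-channel transform of Eq. (5) and the merge step can be sketched in numpy as follows; function and variable names are our own, and the guard for a flat channel (pmax = pmin) is an assumption added to avoid division by zero:

```python
import numpy as np

def pagc_channel(ch):
    """Apply the pixel-dependent gamma correction of Eq. (5) to one channel."""
    ch = ch.astype(np.float64)
    pmax, pmin = ch.max(), ch.min()
    if pmax == pmin:                      # flat channel: nothing to correct
        return ch.astype(np.uint8)
    exponent = (pmax - ch) / (pmax - pmin)   # per-pixel dynamic exponent
    return (255.0 * (ch / 255.0) ** exponent).astype(np.uint8)

def pagc(image):
    """Split an H x W x 3 image into channels, transform each, and re-merge."""
    return np.stack([pagc_channel(image[..., c]) for c in range(3)], axis=-1)

img = np.random.default_rng(0).integers(0, 256, size=(8, 8, 3), dtype=np.uint8)
out = pagc(img)
print(out.shape, out.dtype)  # (8, 8, 3) uint8
```

Note that a pixel equal to the channel maximum gets exponent 0 and maps to 255, which matches the flattening of the curve towards the bright end described above.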
3 Results and Analysis

We evaluated our proposed approach over a dataset of 100 different images of varying dimensions and contrast features from the ADE20K [8] dataset. Both relative and absolute measures of the evaluation were consistent with the improvements expected from the proposed method over conventional gamma correction.
Fig. 2. Graph depicting the transformation curve of conventional gamma correction.
Fig. 3. Graph depicting the transformation curve of PAGC.
Figures 2 and 3 depict the gamma function in a side-by-side comparison between conventional gamma correction and PAGC, with the x-axis indicating input pixels and the y-axis indicating output pixels. From the graphs pertaining to each gamma correction function, we may notice that in conventional GC the curve remains almost unchanged for higher pixel values (towards the visually white pixels), while in the proposed method the curve follows a more non-linear structure, flattening out for the higher range of pixels in order to maintain contrast coherence between bright and dark regions. The proposed gamma correction thus offers a more flexible weighting of pixel values. Figures 4, 5, and 6 show the RGB channels of the input image over the pixel value range [0, 255]; the x-axis indicates the pixel values and the y-axis the pixel count of each channel. The three channels are plotted on a single graph for easier inference. The RGB histogram of the original image shows a sparse and uneven pixel distribution over the range. Conventional gamma correction makes an attempt at normalising the pixel values but is restricted by its constant exponent following the power law. The graphs provide insightful clarity on the effectiveness of our standardising exponent: the RGB histogram flattens out while keeping the number of pixels approximately similar. From the three graphs it is evident that, for an approximately similar pixel count (about 2000–3000), PAGC exhibits more coherence between the RGB spectrum and the pixel distribution, with an even slope for all pixel values. Further, the performance of PAGC is evaluated based on its entropy, colourfulness and contrast [9, 10] scores. Table 1 presents the three scores for the original image, the GC-corrected image and the PAGC-corrected image.
Fig. 4. Histogram of distribution of RGB pixels in the original image
Fig. 5. Histogram of distribution of RGB pixels in the gamma correction
Fig. 6. Histogram of distribution of RGB pixels in the PAGC approach
3.1 Entropy

Entropy can be understood as a measure of how much information a picture delivers through its pixel values, or the extent of vividness and information contained in the image [9]. It is calculated as per Eq. (6) [5]:

Entropy = − Σi p(i) log p(i)    (6)
where i lies between 0 and 255 and p(i) = n(i)/N is the probability of occurrence of intensity i among the N pixels. Entropy defines the randomness of information in an image, and greater entropy signifies a more even spread of pixel values over a given range, as is evident from Fig. 6. The proposed PAGC approach achieved an entropy score of 7.177545378686857, far surpassing its predecessor, conventional gamma correction. The entropy of the PAGC approach is comparatively much higher than that of the conventional gamma-corrected image, although the original image retains a marginally higher entropy than our proposed method.
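The entropy of Eq. (6) for an 8-bit image can be sketched as below; using log base 2 is our assumption (it makes the maximum over 256 intensities exactly 8, consistent with scores near 7 reported above):

```python
import numpy as np

def entropy(image):
    """Shannon entropy (base 2) of the intensity histogram, per Eq. (6)."""
    counts = np.bincount(np.asarray(image, dtype=np.uint8).ravel(), minlength=256)
    p = counts / counts.sum()
    p = p[p > 0]                      # 0 * log(0) is taken as 0
    return float(-(p * np.log2(p)).sum())

flat = np.zeros((16, 16), dtype=np.uint8)      # a single intensity
uniform = np.arange(256, dtype=np.uint8)       # every intensity once
print(entropy(flat), entropy(uniform))  # 0.0 8.0
```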
3.2 Colorfulness

Colorfulness denotes the richness of the color content of an image; a larger value indicates stronger color. It is calculated according to the following Eq. (7) [5]:

Colorfulness = σ_rgyb + 0.3 · μ_rgyb    (7)

where the standard deviation and mean of the opponent-color pixel components are calculated as per Eq. (8) [10] and Eq. (9) [10]:

σ_rgyb = √(σ_rg² + σ_yb²)    (8)

μ_rgyb = √(μ_rg² + μ_yb²)    (9)

The colorfulness metric is based on the standard deviation of the pixel-wise opponent colour differences, defined according to Eq. (10) [5]:

rg = R − G  and  yb = 0.5 · (R + G) − B    (10)
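Equations (7)–(10) amount to a few lines of NumPy; the sketch below assumes an H × W × 3 array in RGB channel order:

```python
import numpy as np

def colorfulness(img):
    """Hasler-Suesstrunk colorfulness, Eqs. (7)-(10), for an
    H x W x 3 RGB image array."""
    r, g, b = (img[..., k].astype(np.float64) for k in range(3))
    rg = r - g                                        # Eq. (10)
    yb = 0.5 * (r + g) - b
    sigma = np.sqrt(rg.std() ** 2 + yb.std() ** 2)    # Eq. (8)
    mu = np.sqrt(rg.mean() ** 2 + yb.mean() ** 2)     # Eq. (9)
    return sigma + 0.3 * mu                           # Eq. (7)
```

A gray image (all channels equal, with yb = 0) scores 0, while saturated colors score high.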
Colorfulness is a relative measure of how aesthetically pleasing an image can be, depending on lighting, background, foreground and other transient factors in good-quality images [10]. Averaged over 100 images, the PAGC score of 161.639 is slightly lower than that of both the original image and the gamma corrected image. Colorfulness is concerned chiefly with the aesthetic appearance of the subject and is often inconsequential to the actual measure of how good an image is. Nonetheless, there is evidently a minor decrease in the colorfulness score of our proposed model, which is primarily a result of the more even distribution of the RGB histogram.

3.3 Contrast

The primary objective of contrast adjustment through dynamic adaptation of the exponent of the power function in Eq. (1), which depends directly on individual pixel values, has been thoroughly achieved by our proposed PAGC approach. The contrast score of 64.587 that we achieved substantially exceeds the contrast scores of both the original image and the conventional gamma corrected image. Contrast is a definitive aspect of image evaluation, and it is visibly noticeable that the primary purpose of mediating contrast has been well served by the proposed approach. To enable a more generic comparison, a histogram equalized (HE) transformation was also created. Figures 7, 8, 9 and 10 show the original image, its HE transformation, the GC transformation and the PAGC transformation in gray scale, respectively. The motive was to level the field by eliminating colours and comparing the gray-scale transformations of the original image (BASE), the conventionally gamma corrected image (GC) and the proposed approach (PAGC). Further, the PAGC and conventional GC transformations of a colored image are presented in Figs. 11, 12 and 13.
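The gray-scale histogram-equalized (HE) baseline mentioned above can be reproduced with the standard CDF-remapping formulation; a minimal NumPy sketch:

```python
import numpy as np

def equalize_hist(gray):
    """Classic histogram equalization of an 8-bit grayscale image:
    remap each level through the normalised cumulative histogram."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    if cdf[-1] == cdf_min:          # constant image: nothing to equalize
        return gray.copy()
    lut = np.round((cdf - cdf_min) / float(cdf[-1] - cdf_min) * 255.0)
    lut = np.clip(lut, 0, 255)      # guard unused levels below the first bin
    return lut.astype(np.uint8)[gray]
```

After equalization, the occupied intensity levels are spread across the full [0, 255] range, which is what makes HE a useful contrast baseline for the comparison.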
The white sign on the road is more prominent in the PAGC result than in the original image or the conventional gamma correction. Focusing on the foliage and darker regions, PAGC does a much better job of mediating the contrast between the green and black features than the regular gamma correction function; the same holds for the doors and building shadows.
Table 1. Performance evaluation of PAGC

Image                                | Entropy score | Colorfulness score | Contrast score
Original image                       | 7.288         | 162.027            | 58.570
Gamma corrected image                | 7.140         | 163.053            | 51.786
Pixel adaptive gamma corrected image | 7.178         | 161.639            | 64.587
Fig. 7. Original image
Fig. 8. Histogram equalized image
Fig. 9. GC image
Fig. 10. PAGC image
Fig. 11. Original image

Fig. 12. PAGC image

Fig. 13. GC image

4 Conclusion

The analysis of pixel histograms and the comparative studies involving aesthetic properties, cross-referenced with the numerical metrics, i.e. the entropy, colorfulness and contrast scores, establish the robustness and novelty of the highlighted approach, which can be considered the fulfilment of our primary objective. The proposed methodology has been shown to produce improved results over conventional gamma correction techniques. The significant advantage of this method is the greater liberty in calibrating the correcting coefficient, or transformation exponent, leading to versatile experimental results. A more localised approach takes as few foreign coefficients and constants into account as possible. Despite its advantages, there is a significant amount of computational overhead when working with high-DPI images. The influence of neighbouring pixels is directly proportional to pixel density, i.e. a brighter region influences its boundary pixels. Further, the transformation function may demand a more dynamic method for setting the exponent, as well as more concrete evaluation metrics, in order to judge the novel approach more conclusively; this calls for suitable future investigation of the techniques implemented for gamma correction.
References 1. Zwislocki, J.J.: Stevens’ Power Law. Sensory Neuroscience: Four Laws of Psychophysics, pp. 1–80 (2009) 2. Kumar, A., Jha, R.K., Nishchal, N.K.: An improved Gamma correction model for image dehazing in a multi-exposure fusion framework. J. Vis. Commun. Image Represent. 78, 103122 (2021) 3. Rahman, S., Rahman, M.M., Abdullah-Al-Wadud, M., Al-Quaderi, G.D., Shoyaib, M.: An adaptive gamma correction for image enhancement. EURASIP J. Image Video Process. (1), 1–13 (2016) 4. Lee, J., Pant, S.R., Lee, H.S.: An adaptive histogram equalization based local technique for contrast preserving image enhancement. Int. J. Fuzzy Log. Intell. Syst. 15(1), 35–44 (2015) 5. Veluchamy, M., Subramani, B.: Image contrast and color enhancement using adaptive gamma correction and histogram equalization. Optik 183, 329–337 (2019) 6. James, S.P., Chandy, D.A.: Devignetting fundus images via Bayesian estimation of illumination component and gamma correction. Biocybern. Biomed. Eng. 41(3), 1071–1092 (2021) 7. Li, C., Tang, S., Yan, J., Zhou, T.: Low-light image enhancement via pair of complementary gamma functions by fusion. IEEE Access 8, 169887–169896 (2020)
8. Zhou, B., Zhao, H., Puig, X., Fidler, S., Barriuso, A., Torralba, A.: Scene parsing through ADE20K dataset. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 633–641 (2017) 9. Thum, C.: Measurement of the entropy of an image with application to image focusing. Opt. Acta: Int. J. Opt. 31(2), 203–211 (1984) 10. Hasler, D., Suesstrunk, S.E.: Measuring colorfulness in natural images. In: Human vision and electronic imaging, International Society for Optics and Photonics, vol. 5007, pp. 87–95 (2003)
Summarization of Comic Videos Tanushree Das(B) , Arpita Dutta, and Samit Biswas Department of Computer Science and Technology, Indian Institute of Engineering Science and Technology, Shibpur, Howrah 711103, West Bengal, India [email protected], {arpita dutta.rs2018,samit}@cs.iiests.ac.in Abstract. With the increasing amount of video data generated every day, it has become important to summarize videos for faster retrieval and quick surfing, so that users can select the most relevant video for viewing as per their requirement. The importance of video summarization lies in the fact that it helps in efficient storage and allows quick browsing through a large number of videos. We propose a method to generate summaries for videos by utilising the audio component. Our video summarization approach involves generating the audio transcript, using speech recognition, if it is not readily available. It is based on assigning scores to the sentences in the transcript and selecting the ones with the highest scores. Then, segments corresponding to the selected sentences are extracted from the original video and merged to obtain the final summary.
Keywords: Comic video summarization · Speech recognition · Emotion analysis

1 Introduction
Video is a sequential and information-rich medium. Every day, large amounts of video are uploaded to and downloaded from online video hosting, sharing and social media platforms like YouTube, Dailymotion and Vimeo, and entertainment sites like Netflix and Prime, which have huge user bases. With the availability of large amounts of video data, video summarization has become indispensable for quick browsing and for storing videos in a way that presents the central idea in a relatively shorter time. The main objectives behind video summarisation are: (1) it helps convey the plot in a shorter time and lets users choose the video most relevant to their need; (2) the summarized form of a video improves storage space utilization; (3) it reduces transmission time for videos browsed over the Internet and allows quick surfing of a large number of videos; and so on. Applications of video summarization include generating movie trailers, genre identification of videos, producing faster results in video search engines, and generating sports and news highlights. Existing video summarization techniques focus primarily on the identification of keyframes and sub-shots. Such keyframe-based techniques leave out a
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. K. Das et al. (Eds.): CIPR 2022, LNNS 480, pp. 151–162, 2022. https://doi.org/10.1007/978-981-19-3089-8_15
major portion of the information content: the audio. However, aural information plays an important role in attracting human attention. As surveyed by Vivekraj et al. [12], the two main categories of video summarization are static and dynamic video summarization. Haq et al. [2] reviewed and made a comparative analysis of video summarization techniques. They found that the performance of a summarization technique is domain specific, and also highlighted problems faced in video summarization, such as redundancy in key-frame based techniques, the difficulty of clustering under high content change in feature-based techniques, and the training costs and use of optimization algorithms for deep learning models, as also surveyed by Apostolidis et al. [17]. Comic, cartoon or animated videos are created from a series of very bright and colourful illustrations, which capture user attention more effectively than ordinary videos. They are generally used for the entertainment of children, but recently their use in marketing and children's education has also increased. Summarization of comic videos will therefore also aid parental control: a parent can use the video summary to decide whether a video is suitable for their child. This motivates us to work on comic video summarization. The key challenges associated with comic video summarization are: (1) comic videos tend to use more colloquial terms; (2) the repetitive use of many shots causes a great problem in capturing consecutive and similar key-frames [8]. To the best of our knowledge, we are the first to explore the problem of comic video summarization. The key difference between static and dynamic summaries is the presence of motion and/or audio information in the latter. Although static video summaries effectively present the visual contents of videos, they cannot preserve the dynamic and temporal characteristics.
Moreover, static video summaries discard the audio track, which is an important information channel. Another fundamental drawback of static video summaries is that they are hard to grasp for non-experts, especially if the plotline is complex. In this paper, we describe a new approach for video summarization that extracts the audio from the video and generates transcripts using speech recognition tools if the transcript or subtitle is not available. We then perform text summarization on the extracted transcripts and retrieve the corresponding video segments from the original video to create the final video summary.
2 Related Work
Zhang et al. [1] proposed an unsupervised clustering method based on HSV color features for the selection of representative key frames; redundancy among video frames was reduced by utilizing cluster-validity analysis. Zhang et al. [20] used an LSTM to select keyframes. The issue with such keyframe-based methods is that all frames of the input video are treated with the same importance; in addition, the audio information is not taken into account. Liu et al. [3] proposed using perceived motion energy (PME) to extract keyframes at the turning points of the acceleration and deceleration of motion. Elfeki et al. [7] empirically observed that a frame is more likely to be included in human-generated summaries if it
Fig. 1. Flow-chart depicting the proposed approach of video summarization
contains a substantial amount of deliberate motion performed by an agent, which is referred to as actionness. Activity recognition based on motion analysis is a popular summarization technique better suited to egocentric videos, as in the case of Poleg et al. [18], who used a compact 3D CNN for long-term activity recognition. Ryoo and Matthies [19] used global and local motion features for recognizing interaction-level activities. Ma et al. [4] highlighted that attention is a neurobiological conception and that information elements like motion and speech capture human attention better. They proposed a complete user attention model comprising an integrated set of visual, audio and linguistic attentions for video summarisation, including both key-frame extraction and video skimming. Fusing static and dynamic techniques in a user attention model produces more informative summaries. Divakaran et al. [21] used motion activity and audio descriptors utilising low-level audio features to generate the video summary. Andaloussi et al. [22] suggested summarization of soccer videos using audio content analysis; they agreed that the audio track gives semantics to the video, and so their method was based on analysing the audio energy activity in the commentator's speech. Aural saliency is based on the assumption that a sound may catch a user's attention in either of the following cases: (1) an absolutely loud sound or (2) a sudden increase or decrease of loudness, measured by the average energy of the sound and the energy peaks respectively. Furini et al. [6] highlighted the fact that removal of the silent portions reduces the
overall playout time significantly, as the percentage of silent parts is usually very high; this causes temporal reduction without jerkiness. As audio analysis is computationally cheaper than visual analysis, it should definitely be part of video summarization. Alexander Lerch [9] describes audio content analysis approaches ranging from low-level feature extraction of pitch and tempo to the classification of music genre. Audio analysis can help extract features which in turn can help detect moods and identify the genre of the video. Transcribing the audio portion of a video requires dealing with text, so some text summarization work was also surveyed. Text summarization is the process of precisely finding the most important information in a source document [5]. Our text summarization relies heavily on emotion and sentiment analysis, because emotions are a crucial part of narratives, as suggested in the survey by Kim et al. [14]. Zehe et al. [15] proposed that the use of sentiment trajectories could be a viable approach for story representation and that the emotional state over the course of a novel could help predict the story's ending.
3 Proposed Approach
The proposed method is depicted in Fig. 1. The input is a video file, and the first step is to check for a subtitle or transcript. If neither is available, we generate the transcript using speech recognition tools like the Google Web Speech API. To generate the transcript, we first extract the audio and segment it into smaller, manageable chunks to which speech recognition tools can be applied. Segmentation of the audio can be equal-sized or based on silence detection. While segmenting the audio, the timestamps are saved for later use in the extraction of the corresponding video segments. When the transcript is ready, we apply extractive text summarization to it. We remove stop words, as they do not contribute to the central matter but occur frequently in the text, which might result in improper assignment of weights to sentences. Every sentence of the transcript is assigned a score between 0 and 1. We identify the top k sentences based on their scores to generate the summary. This k is variable, depends on the length of the original video, and determines the length of the summary. We sort the sentences in the summary in increasing order of timestamps for correct extraction and merging of the video segments. Once the video segments have been extracted, we merge them to obtain our video summary. The workflow is divided into three main sub-tasks: (1) transcript generation (if the transcript is not already available), (2) transcript summarization and (3) video summary generation.

3.1 Transcript Generation
In the scenario where a transcript is not available, it has to be generated. We use Python's MoviePy module for extracting the audio from the video, and Spleeter, a Deezer source separation library with pre-trained models written in Python, to separate the vocals and background music for better speech recognition. We use the Google Speech Recognition engine of Python's
SpeechRecognition library to transcribe the shorter audio chunks obtained by segmenting the original audio at approximately 10 s intervals, i.e., the speech is converted to text and written to a file.

3.2 Transcript Summarization
Summarization can be broken down into three independent tasks. (1) We construct an intermediate representation of the text. There are two types of intermediate representation: (a) topic representation, which identifies words describing the topic of the input document, and (b) indicator representation, which describes every sentence as a list of features (indicators) of importance, such as sentence length, position in the document, having certain phrases, etc. In our case, it is a list of sentences in order of the corresponding video segment's appearance in the narrative. (2) We then assign an importance score to each sentence; our score depends on the identification of certain factors and on emotion analysis. (3) Eventually, we identify the top k sentences and sort them in increasing order of timestamps for correct extraction and merging of the video segments to generate the summary, where k depends on the length of the original video. The synopsis of any story or movie depends on the following: (1) identification of the major characters, (2) significant theme(s) and (3) turning points in the story. Scores are assigned to each sentence on the basis of the identification of these factors.

A. Identifying Major Characters: In this step, major characters are identified and assigned an importance score on the basis of occurrence. On average, a story contains 3–4 main characters, although this number can vary with the story. To identify characters we use spaCy, a free open-source library for Natural Language Processing in Python. We identify proper nouns as well as nouns, because some characters simply do not have names; to accommodate the fact that this will also include objects or important props in the story, we consider the top 6 most frequent entities. The score is assigned as per Eqs. 1 and 2, where frequency(character_i) is the number of times the i-th character appears in the transcript.
character_sum = ∑_{i=1}^{6} frequency(character_i)    (1)

character_score_i = frequency(character_i) / character_sum    (2)
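Equations (1) and (2) reduce to a frequency count over the extracted entities; a minimal sketch, assuming the entity extraction itself (e.g. with spaCy) has already produced the list of mentions:

```python
from collections import Counter

def character_scores(entities, top_n=6):
    """Eqs. (1)-(2): normalise the frequencies of the top_n most
    frequent entities so that the resulting scores sum to 1."""
    freq = Counter(entities)
    top = freq.most_common(top_n)
    character_sum = sum(count for _, count in top)            # Eq. (1)
    return {name: count / character_sum for name, count in top}  # Eq. (2)
```

Each sentence can then be weighted by the scores of the characters it mentions.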
B. Significant Theme(s): Emotions form a major part of a storyline. During the 1970s, psychologist Paul Ekman identified six basic emotions that he suggested are universally experienced in all human cultures: happiness, sadness, disgust, fear, surprise and anger [13]. Emotion analysis helps in justifying the choices made by people, which applies to the characters involved in any story as well, since stories are written by humans. Now, with
the help of these emotions, themes can be identified. According to Russell et al. [16], the following three independent dimensions are necessary and sufficient to represent emotions: (1) valence (positive, negative or neutral), (2) arousal (active–passive) and (3) dominance (dominant–submissive). We therefore classify the emotions by valence: positive for joy and love, negative for anger, sadness and fear. Surprise cannot be classified as positive or negative. For each sentence, emotions are detected and the overall sentiment is labelled positive or negative; if a surprise element is detected, the sentence is marked for surprise.

Table 1. Evaluation of video segment classification as per selection in the final summary

Video name                         | Accuracy | Precision | Recall | f1-score | Brier score loss | ROC AUC score
Aladdin                            | 0.9403   | 0.7500    | 0.7500 | 0.7500   | 0.0597           | 0.8581
Ali-baba-and-the-forty-40-thieves  | 0.8947   | 0.5000    | 0.5000 | 0.5000   | 0.1053           | 0.7206
Alice in wonderland                | 0.9524   | 0.7778    | 0.7778 | 0.7778   | 0.0476           | 0.8756
Beauty and the beast               | 0.9474   | 0.7500    | 0.7500 | 0.7500   | 0.0526           | 0.8603
Cinderella                         | 0.9070   | 0.5556    | 0.5556 | 0.5556   | 0.0930           | 0.7518
Gulliver's travels                 | 0.9091   | 0.6000    | 0.6000 | 0.6000   | 0.0909           | 0.7744
King-midas-touch                   | 0.9178   | 0.6250    | 0.6250 | 0.6250   | 0.0822           | 0.7894
Little red riding hood             | 0.9429   | 0.7500    | 0.7500 | 0.7500   | 0.0571           | 0.8589
Noddy and the Island adventure     | 0.9767   | 0.9375    | 0.9375 | 0.9375   | 0.0233           | 0.9616
Noddy has a visitor                | 0.9259   | 0.7143    | 0.7143 | 0.7143   | 0.0741           | 0.8359
Pinocchio                          | 0.9487   | 0.7500    | 0.7500 | 0.7500   | 0.0513           | 0.8607
Rapunzel                           | 0.9710   | 0.8750    | 0.8750 | 0.8750   | 0.0290           | 0.9293
Red shoes                          | 0.9692   | 0.8571    | 0.8571 | 0.8571   | 0.0308           | 0.9200
The emperor's new clothes          | 0.9643   | 0.8750    | 0.8750 | 0.8750   | 0.0357           | 0.9271
The frog prince                    | 0.8873   | 0.5000    | 0.5000 | 0.5000   | 0.1127           | 0.7183
The goose that laid the golden egg | 0.8947   | 0.8571    | 0.8571 | 0.8571   | 0.1053           | 0.8869
The lion and the hare story        | 0.9091   | 0.7143    | 0.7143 | 0.7143   | 0.0909           | 0.8301
The merchant of venice             | 0.9286   | 0.6667    | 0.6667 | 0.6667   | 0.0714           | 0.8133
The monkey and the crocodile story | 0.9216   | 0.7143    | 0.7143 | 0.7143   | 0.0784           | 0.8344
The secret garden                  | 0.9876   | 0.8889    | 0.8889 | 0.8889   | 0.0124           | 0.9412
Average score                      | 0.9348   | 0.7329    | 0.7329 | 0.7329   | 0.0652           | 0.8474
C. Turning Points in Story: The surprise factor comes in handy, as it contributes to identifying the turning points in the storyline. We also compare the overall sentiment of each sentence sequentially: if there is a change in overall sentiment between any sentence and the previous one, some weight is attached to the sentence where the change occurred, since a turning point in a story can be reflected in a change of sentiment.

score = 0.4 · character_wt + 0.4 · change + 0.2 · surprise    (3)
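The valence classification and Eq. (3) can be sketched as follows; the per-sentence emotion labels are assumed to come from the trained emotion classifier, and the character weights from Eqs. (1)–(2):

```python
POSITIVE = {"joy", "love"}
NEGATIVE = {"anger", "sadness", "fear"}

def valence(emotion):
    """Map a detected Ekman-style emotion to an overall sentiment label."""
    if emotion in POSITIVE:
        return "positive"
    if emotion in NEGATIVE:
        return "negative"
    return "neutral"   # surprise is handled separately

def sentence_scores(emotions, char_weights):
    """Eq. (3): score = 0.4*character_wt + 0.4*change + 0.2*surprise.
    emotions[i] is the detected emotion of sentence i; char_weights[i]
    is its character-occurrence score from Eqs. (1)-(2)."""
    scores, prev = [], None
    for emo, wt in zip(emotions, char_weights):
        sent = valence(emo)
        change = 1 if prev is not None and sent != prev else 0
        surprise = 1 if emo == "surprise" else 0
        scores.append(0.4 * wt + 0.4 * change + 0.2 * surprise)
        prev = sent
    return scores
```

The top k scoring sentences are then kept, in timestamp order, for the summary.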
The final score of each sentence is calculated on the basis of the character weight, the surprise factor and the change in overall sentiment, as in Eq. 3, where character_wt is the score assigned to the sentence depending on the occurrence of
important characters in the sentence; change is 1 if there is any change in overall sentiment compared to the previous sentence, and 0 otherwise; and surprise is 1 or 0 depending on whether a surprise element was detected during emotion analysis. In this formulation, higher weight is assigned to character occurrence and turning-point identification. After the scores of the sentences have been calculated, the top k sentences with the highest scores are selected, in the order of their appearance in the transcript.

3.3 Video Summary Generation
Finally, video segments are extracted from the original comic video file for the selected k sentences with the help of the timestamps, and merged together to produce the final summary video. This k is decided by the length of the summary video, which in turn depends on the length of the original comic video. Our summary video is like a trailer and, following the general approximate duration of trailers, the summary can be no shorter than a minute and no longer than 3 min. We therefore choose the summary length to be 10% of the original video length.

Table 2. Text emotion labeling sample from dataset

Text          | Emotion
Oh No!        | Sadness
How dare you? | Anger
Oh my god!    | Surprise
Wow           | Joy
Huh           | Surprise

4 Dataset and Experimental Results
In this section, we describe the preparation of our dataset and the evaluation of the results obtained after applying our algorithm to it. We have used standard metrics for the analysis of the results. For fairness, we have also analysed the results with the help of different user subjects, who were asked to rate the video summaries.

4.1 Development of Dataset
As we could not find any dataset of only animated videos, we prepared our own. The dataset of 500 videos was prepared by downloading animated videos from YouTube, of lengths ranging from 3 min 24 s for the shortest video to 1 h 8 min 41 s for the longest. Since direct downloading from YouTube is not easy, we used pytube, a lightweight, dependency-free Python library for downloading videos from the web. Along with these videos we have used the HuggingFace dataset [11], a dataset of English Twitter messages with six basic emotions (anger, fear, joy, love, sadness and surprise), with which we merged a text-emotion dataset of our own (see Table 2) for the inclusion of colloquial terms. This dataset was required for training the model for the identification of emotions, for which we used logistic regression. Ground truth for each of the 500 videos was prepared by manual selection of relevant video segments of equal size.

Table 3. User ratings on informativeness and enjoyability on a scale of 1–10

Video name                         | Informativeness (r1) | Enjoyability (r1) | Informativeness (r2) | Enjoyability (r2)
Aladdin                            | 8    | 8    | 7    | 8
Ali-baba-and-the-forty-40-thieves  | 8    | 7    | 7    | 7
Alice in wonderland                | 6    | 6    | 7    | 8
Beauty and the beast               | 6    | 8    | 5    | 8
Cinderella                         | 6    | 7    | 6    | 8
Gulliver's travels                 | 7    | 8    | 8    | 8
King-midas-touch                   | 6    | 9    | 10   | 9
Lion and the hare                  | 8    | 9    | 9    | 9
Little red riding hood             | 8    | 9    | 8    | 9
Noddy and the Island adventure     | 10   | 10   | 10   | 10
Noddy has a visitor                | 7    | 9    | 10   | 10
Pinocchio                          | 8    | 7    | 6    | 6
Rapunzel                           | 10   | 10   | 10   | 10
Red shoes                          | 7    | 8    | 7    | 7
The emperor's new clothes          | 7    | 7    | 10   | 10
The frog prince                    | 9    | 10   | 8    | 9
The goose that laid the golden egg | 10   | 10   | 10   | 10
The merchant of venice             | 6    | 8    | 9    | 10
The monkey and the crocodile story | 9    | 9    | 9    | 10
The secret garden                  | 8    | 8    | 8    | 9
Average                            | 7.7  | 8.35 | 8.2  | 8.75

Average informativeness: 7.95
Average enjoyability: 8.55

4.2 Evaluation Parameters
Analysis is done by comparing which video segments are selected by the algorithm against those selected in the ground truth. As we intended the output summary to be trailer-like, the length of the summary can be no less than a minute and no more than 3 min, so only 10% of the original video length is included in the summary. Thus, only a few segments get selected in the final summary; these constitute the positive class, which is also the minority class. This makes it an imbalanced dataset, which is further confirmed by the fact that we get equal precision, recall and f1-scores, as indicated in Table 1. Because of this, we require evaluation metrics which can work with such datasets. For this we have used two metrics: ROC AUC score and Brier Score Loss. The benefit of the Brier score is that it is focused on the positive class, which for imbalanced
Table 4. Categorical user ratings on informativeness and enjoyability for the videos

Video name                         | Informativeness (r1) | Enjoyability (r1) | Informativeness (r2) | Enjoyability (r2)
Aladdin                            | B | B | B | B
Ali-baba-and-the-forty-40-thieves  | B | B | B | B
Alice in wonderland                | C | C | B | B
Beauty and the beast               | C | B | C | B
Cinderella                         | C | B | C | B
Gulliver's travels                 | B | B | B | B
King-midas-touch                   | C | A | A | A
Lion and the hare                  | B | A | A | A
Little red riding hood             | B | A | B | A
Noddy and the Island adventure     | A | A | A | A
Noddy has a visitor                | B | A | A | A
Pinocchio                          | B | B | C | C
Rapunzel                           | A | A | A | A
Red shoes                          | B | B | B | B
The emperor's new clothes          | B | B | B | A
The frog prince                    | A | A | B | A
The goose that laid the golden egg | A | A | A | A
The merchant of venice             | C | B | B | A
The monkey and the crocodile story | A | A | A | A
The secret garden                  | B | B | B | A
classification is the minority class. The values range between 0 and 1; a perfect classifier has a Brier loss score of 0 [10]. To prevent bias in the analysis of the results, the summary outputs of these 20 videos were also analysed on the basis of ratings by two individual subjects, who were also shown the original videos. The ratings were given on the basis of two factors: 1. Informativeness - how much of the story could be grasped from the trailer-like summary. 2. Enjoyability - whether the summary video was enjoyable, and whether the subjects found it interesting enough to want to watch the original video; this factor also takes into account how smooth or jerky the summary video felt to the users. Each user was individually shown first the video summary and then the original video, to ensure that the Enjoyability score was given reliably. After watching both videos they were allowed to give the Informativeness score.

4.3 Result Analysis
We have randomly selected 20 videos from the dataset and report the results evaluated on them. As depicted in Table 1, the average Brier loss score is 0.0652, which is quite low; this means that our algorithm is quite efficient, if not perfect, at identifying the relevant video segments. The average ROC AUC score is 0.8474, which suggests that the algorithm can identify the important video segments for the summary with moderate accuracy.
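Both metrics are simple to state; the sketch below gives minimal pure-Python equivalents of scikit-learn's `brier_score_loss` and `roc_auc_score` for per-segment labels (1 = segment selected in the ground-truth summary) and predicted selection scores (the toy values below are illustrative, not from the experiments):

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and label
    (equivalent to sklearn.metrics.brier_score_loss)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def roc_auc(y_true, y_score):
    """ROC AUC via the Mann-Whitney formulation: the probability that a
    randomly chosen positive segment is scored above a randomly chosen
    negative one (equivalent to sklearn.metrics.roc_auc_score)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A Brier score near 0 and an AUC near 1 both indicate that selected segments are separated well from unselected ones, which is why these two metrics suit the imbalanced positive class here.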
Table 5. Categories mapping to user ratings

A | Informative          | 9–10
B | Got the storyline    | 7–8
C | Not very informative | ≤6
As depicted in the Table 3, the ratings were given on a scale of 1–10 and then categorical ratings were tabulated as shown in Table 4 as per the mapping of nominal scores to categories provided in Table 5, of which system the users were informed prior to submission of their ratings. This categorization was done for calculation purpose of obtaining the Cohen’s Kappa to determine the inter-rater reliability, that is how much agreement these two raters had, so as to conclude the ratings as reliable and to show that the analysis is agreeable. A Cohen’s Kappa value is interpreted as follows: values 30ºC
Cancel delivery and do not pay to X
Do delivery and transfer money to X.
Fig. 8. Temperature sensor enabled smart contract based product delivery
Figure 8 shows the smart contract condition; the transaction happens only when the condition is met. The condition says that if the temperature sensor detects that the temperature of the vehicle container is less than 30 °C, then person Y needs to pay; otherwise the transaction aborts.
A. K. Gaur and J. A. Valan
It is entirely based on the smart contract policy: if the condition is met, money is automatically deducted from person Y's account and credited to person X's account, and no one is able to stop the transaction in between. In this way, the above-mentioned issue may be solved by a smart contract.
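The settlement logic of Fig. 8 can be sketched as ordinary code. A real smart contract would be written in a contract language such as Solidity and executed on-chain, so the function, account names and amount below are purely illustrative:

```python
def settle_delivery(container_temp_c, balances, buyer="Y", seller="X", price=100):
    """Mimics the Fig. 8 contract: pay the seller only if the container
    temperature stayed below 30 degrees C; otherwise cancel the delivery
    and move no money."""
    if container_temp_c < 30:
        balances[buyer] -= price
        balances[seller] += price
        return "delivered: payment transferred from Y to X"
    return "cancelled: temperature condition violated, no payment"
```

On a blockchain this function would run deterministically on every node, which is what makes the outcome tamper-proof once the sensor reading is recorded.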
6 Conclusion

Blockchain is a disruptive technology which provides trust among its users, and it has been observed in recent years that Internet technology lacks such trust-building among its users. The core intention behind blockchain technology is to remove the traditional approach of concentrating control and power in one hand. Blockchain applications are no longer limited to cryptocurrencies; implementations have begun in different domains such as healthcare, education, product tracking, supply chains and financial organizations. Future generations will rely heavily on smart contract based transactions, where the settlement of issues will be handled by consensus protocols in a distributed manner instead of by a few centralized organizations. In the future, we plan to work on the implementation of IoT enabled smart contract based solutions for food delivery and to develop consensus algorithms to counter hackers' attacks on a blockchain network.
An Evaluative Review on Various Tele-Health Systems Proposed in COVID Phase
Tanima Bhowmik1(B), Rohan Mojumder2, Dibyendu Ghosh2, and Indrajit Banerjee1
1 Department of IT, IIEST Shibpur, Shibpur, Howrah 711103, India
[email protected], [email protected]
2 Department of ECE, C.I.E.M Kolkata, Kolkata 700040, India
Abstract. With the rise of COVID-19, the importance of health monitoring has reached a new peak. Keeping a check on COVID symptoms is now an integral part of our lifestyle, and Tele-Health systems can achieve this quickly. The Tele-Health field has improved greatly over the course of the pandemic and has provided both medical and non-medical individuals with the help they require. Much work has been done in this field, integrating IoT with the medical field to monitor an individual's physical parameters remotely, efficiently, and safely. We present a systematic review of the works that have helped develop this field during the pandemic. By bringing forward the pros and cons of these systems, we aim to draw a clear picture of the systems that have improved our daily lifestyle over this pandemic period. Keywords: COVID-19 pandemic · IoT · TeleHealth · Reviewed frameworks · Mobile application · Portable · Fuzzy logic · Energy efficient
1 Introduction
The Internet of Things (IoT) has a vast application area in modern society. Whether in home automation, smart-city development, or the modernization of the medical field, IoT is integrated into the core of everything. During the rise of the COVID-19 pandemic, medical facilities and evaluations were developed to a great extent via IoT. The first case of COVID-19 was reported in December 2019, and everything around us has changed since then. The global explosion of COVID-19 and its several mutated variants has drastically affected individual lifestyles. Living amidst lockdowns and quarantine, maintaining healthy habits while confined within four walls was a challenge for everyone. Telehealth is an alternative to the traditional treatment processes of a medical institution and provides solutions to many of the problems mentioned above. It includes advanced medical methodologies to run diagnosis
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. K. Das et al. (Eds.): CIPR 2022, LNNS 480, pp. 201–210, 2022. https://doi.org/10.1007/978-981-19-3089-8_20
202
T. Bhowmik et al.
and treatment without either the doctor or the patient being physically present in the medical institution. Tele-health provides improved communication between patients and physicians and has changed the pattern of face-to-face medical consultancy during the COVID-19 pandemic. Tele-health monitoring is a widely accepted means of helping both medical workers and affected individuals survive the horrors of the pandemic. Developments in Tele-health have changed how the spread of disease and illness is monitored over a large population [1]. In our previous work [2], we proposed a model for in-home parameter monitoring along with a smart doorbell to ensure proper social measures during the pandemic. These data can be captured and accessed via a cloud platform [3]. Many works have been published that further develop this field in the context of the current pandemic. Despite the progress in remote monitoring, a few drawbacks cannot be overlooked: issues such as data security and data accuracy have also arisen, affecting medical diagnosis.
2 Some Proposed Works in COVID Period
Kaaviya Baskaran et al. [4] (2020) proposed a facial recognition system to replace the manual biometric system in an organization. The system detected masks on faces and marked employee attendance. A non-contact temperature sensor was installed to determine an individual's temperature. In addition, the person's Aarogya Setu QR code was used to determine that individual's medical history, i.e., COVID positive or not. The higher authorities of the organization, as well as the scanned person, were notified of abnormal health data. The proposed system is shown in Fig. 1.
Key Features of [4]
1. Combined use of an IR temperature sensor with a Raspberry Pi [21] and Aarogya Setu app based medical history to determine health status.
2. Facial recognition for mask detection.
Limitations of [4]
1. No security features discussed.
2. Not portable.
3. Not discussed whether energy-efficient or not.
Md. Mashrur Sakib Choyon et al. [5] (2020) proposed a combination of IoT and machine learning algorithms to monitor a patient's health remotely. ML algorithms and computer vision techniques were used to train the system on datasets of COVID symptoms from different countries; the datasets helped the system gauge the intensity of the sensor data. The monitored sensor data were transferred to a cloud database. Medical professionals could access the monitored data and send instructions to the patient over the same network. The proposed model is shown in Fig. 2.
An Eval. Review on Various TH Systems Proposed in COVID Phase
203
Fig. 1. Practical setup of [4]
Key Features of [5]
1. ML and computer vision to train the system.
Limitations of [5]
1. High cost.
2. No security features discussed.
3. Not discussed whether energy-efficient or not.
Fig. 2. Practical setup of [5]
Nurazamiroz Bin Kamarozaman et al. [6] designed a wearable vest with multiple sensors installed for remote monitoring of patients during the COVID pandemic. A patient wearing the vest could move easily, and the system's wires caused no obstruction. Sensor data were stored in ThingSpeak via an MQTT broker and Node-RED software. Doctors and parents could monitor patients from a distance using smartphones and web pages. A Twitter account needed to be linked to ThingSpeak to receive notifications about abnormal readings. The proposed model is shown in Fig. 3.
Key Features of [6]
1. Portable vest.
Fig. 3. Practical setup of [6]
Limitations of [6]
1. No alert notification without a Twitter account.
2. Only privacy regarding alert notifications is discussed.
3. Not discussed whether energy-efficient or not.
Mohit P. Sathyaseelan et al. [7] proposed a Bluetooth-based module for tracking COVID victims, along with a COVID RADAR app. When a user opens the app for the first time, some personal information (including COVID status) must be entered. The Bluetooth-based modules could access this data from the database and retrieve Bluetooth addresses from other nearby instances of the application. On this basis, if the module identified a person whose app reported a COVID-positive status, other people in range were notified through the app. The system architecture of the proposed work is shown in Fig. 4.
Key Features of [7]
1. Remote identification of COVID victims.
2. Portable system.
Limitations of [7]
1. Bluetooth range is limited to 10–100 m.
2. The Bluetooth module will not detect an app user if a COVID victim does not enter the correct COVID status.
Previously, we proposed an IoT-based non-contact thermal screening system and two mobile applications for remote health monitoring [8]. The screening device had a non-contact IR heat-sensor camera: whenever a person stood in front of the camera, it could detect their body temperature, with no need to touch the system. Since social distancing is crucial in the COVID epidemic, we designed the system to warn if a 1-m gap between the individual and the system was not maintained. Two separate mobile apps were designed, one for doctors and another for patients. A doctor can check the patient's health
Fig. 4. System architecture of the proposed setup [7]
report and consult via the app. The doctor's app could also track the places visited by a COVID-affected patient by tracking the patient's app. The system architecture of our previous model is shown in Fig. 5.
Key Features of [8]
1. Non-contact screening.
2. Two different mobile applications to facilitate remote monitoring.
3. Mobile apps facilitate social awareness.
Limitations of [8]
1. No proper security feature has been proposed.
2. Not discussed whether energy-efficient or not.
Fig. 5. System architecture of the proposed setup
Seyed Shahim Vedaei et al. [9] proposed a wearable e-Health system to monitor body parameters as well as to maintain social distancing. The wearable sensor
node would send the monitored data to a smartphone app via Bluetooth, and then from the smartphone to a fog server via 4G/5G/WiFi or LoRa; LoRa would be used only when 4G/5G/WiFi was unavailable, especially in rural areas. A fuzzy Mamdani system was applied on the fog server to determine the risk of spreading infection based on the health data. The Khorshid COVID Cohort [10], the Sugeno architecture [11], and various other architectures were utilized to train the fuzzy logic [12–14]. The smartphone app included a radar that displayed the other sensor nodes (along with their associated phones) within a 3-m range and would send an alert notification if any other node crossed a threshold range of 2 m or less. Other features of the app included displaying the sensors' readings, obtaining the risk score from the server, showing real-time COVID hot-spots on a map, and notifications about updated government safety guidelines. The system architecture of the COVID-SAFE platform is shown in Fig. 6.
Key Features of [9]
1. Portable system.
2. Fuzzy logic to identify the risk score.
3. Smartphone framework to maintain social distancing and spread awareness.
Limitations of [9]
1. No proper security feature has been proposed to maintain privacy.
2. The range of the radar is low.
Fig. 6. System architecture of the proposed setup
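A Mamdani-style fuzzy risk score of the kind described above can be sketched as follows (a minimal illustration; the membership functions, rules, and output levels are invented for this sketch and are not those of [9]):

```python
# Minimal Mamdani-style fuzzy risk scoring, loosely in the spirit of [9].
# Membership functions, rules, and output levels are invented here.

def tri(x, a, b, c):
    """Triangular membership function rising from a, peaking at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def risk_score(temp_c: float, spo2: float) -> float:
    # Fuzzify the inputs (hypothetical ranges)
    fever = tri(temp_c, 37.0, 39.0, 42.0)
    normal_temp = tri(temp_c, 35.0, 36.8, 38.0)
    low_spo2 = tri(spo2, 80.0, 88.0, 95.0)
    ok_spo2 = tri(spo2, 93.0, 98.0, 101.0)
    # Rules: min = AND; each rule fires towards a crisp risk level
    rules = [
        (min(fever, low_spo2), 90.0),        # fever AND low SpO2 -> high risk
        (min(fever, ok_spo2), 60.0),         # fever alone -> medium risk
        (min(normal_temp, low_spo2), 70.0),  # low SpO2 alone -> raised risk
        (min(normal_temp, ok_spo2), 10.0),   # all normal -> low risk
    ]
    # Defuzzify with a weighted average (a common simplification of the centroid)
    num = sum(w * level for w, level in rules)
    den = sum(w for w, _ in rules)
    return num / den if den > 0 else 0.0

low = risk_score(36.8, 98.0)   # 10.0 -> low risk
high = risk_score(39.5, 86.0)  # 90.0 -> high risk
```

A production system would instead run the full Mamdani pipeline (fuzzy output sets plus true centroid defuzzification), but the rule-firing structure is the same.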
Itamir de Morais Barroca Filho et al. extended their previous work, "Proposing an IoT-based healthcare platform to integrate patients, physicians and ambulance services" [15], in [16] by implementing wearable and unobtrusive sensors
in ICUs to monitor COVID-affected victims in Brazil. The proposed system, known as PAR, was developed to fulfill the duties of hospital operators, nurses, and physicians, thereby reducing direct contact with the victims. The system supported a hospital operator's tasks, i.e., registering all the necessary patient details and the doctors and nurses appointed to the patient, as well as the tasks of nurses and physicians, i.e., monitoring the patients, managing patients' data, and alerting when threshold values were crossed; all these operations were executed virtually and digitally. PAR included multi-parameter monitoring sensors attached to the patient's body and environment sensors attached to the ICU beds. A mobile application was connected to the system to display sensor data and notify the physicians of anomalous readings. Data security and privacy, which are crucial, were also taken care of. The practical implementation of the proposed system is shown in Fig. 7.
Key Features of [16]
1. An intelligent ICU monitoring system.
2. A smart software setup considering [17,18].
3. Data privacy and authenticity using the IEC 60601-1-11:2015 standard, HL7, and OAuth V2.
4. A mobile application for observation and alert purposes.
Limitations of [16]
1. High installation cost.
Fig. 7. Implementation of the system in ICUs
Mubashir Rehman et al. [19] suggested a non-contact COVID detection platform for symptoms such as shortness of breath, coughing, and hand movements. They used a Universal Software Radio Peripheral (USRP), a software-defined radio (SDR) device, placed half a meter away from the subject, to observe the mentioned parameters. The channel frequency response (CFR) amplitude was then used to detect the real-time channel response to these parameters. Hand movements were considered only to test the system's extensive body-movement monitoring
capabilities. Human cough was considered both because it can be a symptom of COVID and to test the platform's ability to observe small body movements. The system showed a change in CFR amplitude whenever a cough occurred. Regular, slow, and quick respiratory rates were considered in this study, measured using the zero-cross detection method, the peak detection method, and the Fourier transform [20] to test the system's accuracy.
Key Features of [19]
1. USRP to monitor the parameters.
2. CFR amplitude to detect the real-time channel response.
3. Three separate techniques to verify the breathing-rate accuracy.
Limitations of [19]
1. No security feature has been proposed to maintain privacy.
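The zero-cross detection method mentioned above can be illustrated with a short sketch on a synthetic breathing signal (a toy example under stated assumptions, not the authors' implementation):

```python
# Zero-crossing estimate of respiratory rate from a breathing waveform --
# one of the three techniques named above. The signal here is synthetic.
import math

def zero_cross_rate_bpm(signal: list, fs: float) -> float:
    """Breaths/min counted from rising zero crossings of a mean-removed signal."""
    mean = sum(signal) / len(signal)
    x = [s - mean for s in signal]
    crossings = sum(1 for a, b in zip(x, x[1:]) if a < 0 <= b)
    duration_min = len(signal) / fs / 60.0
    return crossings / duration_min

# Synthetic 0.25 Hz breathing signal (true rate 15 breaths/min), 10 Hz, 60 s
fs = 10.0
sig = [math.sin(2 * math.pi * 0.25 * n / fs + 0.1) for n in range(600)]
rate = zero_cross_rate_bpm(sig, fs)  # ~14-15; a +/-1 count at the window edge is inherent
```

Peak detection and the Fourier transform trade that edge-count error for different assumptions (peak prominence thresholds, stationarity over the window), which is presumably why [19] compares all three.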
3 Conclusion
IoT is now applied practically across a vast area of the clinical field. While the Tele-Health structure has strengthened significantly during the pandemic, room for development remains to be explored. When dealing with medical data, issues such as data security, power consumption, and the cost-efficiency of the system matter a great deal. Only authorized persons should access the data, because medical data is sensitive and holds significant value for an individual during the COVID period. Besides this, the longevity of the system holds great value too: failure to provide long battery backup may make observation difficult while an individual is in quarantine. Other issues, such as portability and data-transfer ranges, have also been brought to light in our paper. IoT-based medical monitoring has helped many people of both medical and non-medical backgrounds. The COVID-19 pandemic brought dismay and setbacks, but the development of the Tele-Health field gives us hope to stand our ground. Tele-Health will only grow to help more and more people and to help us deal with any upcoming medical setbacks in the future.
References
1. Wang, W., Sun, L., Liu, T., Lai, T.: The use of E-health during the COVID-19 pandemic: a case study in China's Hubei province. Health Sociol. Rev. 1–17 (2021). https://doi.org/10.1080/14461242.2021.1941184. Epub ahead of print. PMID: 34161186
2. Bhowmik, T., Mojumder, R., Ghosh, D., Banerjee, I.: IoT based smart home-health monitoring system using Dempster-Shafer evidence theory for pandemic situation. In: 23rd International Conference on Distributed Computing and Networking (ICDCN 2022), pp. 260–265. Association for Computing Machinery, New York (2022). https://doi.org/10.1145/3491003.3493232
3. Bhowmik, T., Mojumder, R., Banerjee, I., Bhattacharya, A., Das, G.: IoT based data aggregation method for E-health monitoring system. In: International Conference on Computing Communication and Networking Technologies (ICCCNT) (2021). https://doi.org/10.1109/ICCCNT51525.2021.9579885
4. Baskaran, K., Baskaran, P., Rajaram, V., Kumaratharan, N.: IoT based COVID preventive system for work environment. In: Fourth International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), pp. 65–71 (2020). https://doi.org/10.1109/I-SMAC49090.2020.9243471
5. Choyon, M.M.S., Rahman, M., Kabir, Md.M., Mridha, M.F.: IoT based health monitoring & automated predictive system to confront COVID-19. In: 17th International Conference on Smart Communities: Improving Quality of Life Using ICT, IoT and AI (HONET), pp. 189–193 (2020). https://doi.org/10.1109/HONET50430.2020.9322811
6. Kamarozaman, N.B., Awang, A.H.: IoT COVID-19 portable health monitoring system using Raspberry Pi, Node-RED and ThingSpeak. In: IEEE Symposium on Wireless Technology & Applications, pp. 107–112 (2021). https://doi.org/10.1109/ISWTA52208.2021.9587444
7. Sathyaseelan, M.P., Chakravarthi, M.K., Sathyaseelan, A.P., Sudipta, S.: IoT based COVID de-escalation system using Bluetooth low level energy. In: Proceedings of the Sixth International Conference on Inventive Computation Technologies, pp. 174–177 (2021). https://doi.org/10.1109/ICICT50816.2021.9358718
8. Bhowmik, T., Mojumder, R., Banerjee, I., Bhattacharya, A., Das, G.: IoT based non-contact portable thermal scanner for COVID patient screening. In: IEEE 17th India Council International Conference (2020). https://doi.org/10.1109/INDICON49873.2020.9342203
9. Vedaei, S.S., et al.: COVID-SAFE: an IoT-based system for automated health monitoring and surveillance. IEEE Access 8, 188538–188551 (2020). https://doi.org/10.1109/ACCESS.2020.3030194
10. Sami, R., et al.: A one-year hospital-based prospective COVID-19 open-cohort in the eastern Mediterranean region: the Khorshid COVID Cohort (KCC) study (2020). https://www.medrxiv.org/content/10.1101/2020.05.11.20096727v2
11. Mamdani, E.H., Assilian, S.: An experiment in linguistic synthesis with a fuzzy logic controller. Int. J. Hum.-Comput. Stud. 51(2), 135–147 (1999). https://doi.org/10.1006/ijhc.1973.0303
12. Karaboga, D., Kaya, E.: Adaptive network based fuzzy inference system (ANFIS) training approaches: a comprehensive survey. Artif. Intell. Rev. 52(4), 2263–2293 (2019). https://doi.org/10.1007/s10462-017-9610-2
13. Burges, C.J.C.: A tutorial on support vector machines for pattern recognition. Data Min. Knowl. Discov. 2(2), 121–167 (1998). https://doi.org/10.1023/A:1009715923555
14. Farid, D.M., Zhang, L., Rahman, C.M., Hossain, M.A., Strachan, R.: Hybrid decision tree and naïve Bayes classifiers for multi-class classification tasks. Expert Syst. Appl. 41(4), 1937–1946 (2014). https://doi.org/10.1016/j.eswa.2013.08.089
15. Barroca Filho, I.M., de Aquino Junior, G.S.: Proposing an IoT-based healthcare platform to integrate patients, physicians and ambulance services. In: Gervasi, O., et al. (eds.) ICCSA 2017. LNCS, vol. 10409, pp. 188–202. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-62407-5_13
16. De Morais Barroca Filho, I., Aquino, G., Malaquias, R., Girão, G., Melo, S.R.M.: IoT-based healthcare platform for patients in ICU beds during the COVID-19 outbreak. IEEE Access 9, 27262–27277 (2021). https://doi.org/10.1109/ACCESS.2021.3058448
17. de Morais Barroca Filho, I., Aquino Junior, G.S., Vasconcelos, T.B.: Extending and instantiating a software reference architecture for IoT-based healthcare applications. In: Misra, S., et al. (eds.) ICCSA 2019. LNCS, vol. 11623, pp. 203–218. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-24308-1_17
18. de Morais Barroca Filho, I.: Architectural design of IoT-based healthcare applications. Ph.D. dissertation, Department of Applied Mathematics, Federal University of Rio Grande do Norte, Natal, Brazil (2005). https://repositorio.ufrn.br/jspui/handle/123456789/26767
19. Rehman, M., et al.: Contactless small-scale movement monitoring system using software defined radio for early diagnosis of COVID-19. IEEE Sens. J. 21(15), 17180–17188 (2021). https://doi.org/10.1109/JSEN.2021.3077530
20. Schrumpf, F., Sturm, M., Bausch, G., Fuchs, M.: Derivation of the respiratory rate from directly and indirectly measured respiratory signals using autocorrelation. Curr. Direct. Biomed. Eng. 2(1), 241–245 (2016). https://doi.org/10.1515/cdbme-2016-0054
21. Raspberry Pi: Raspberry Pi 3 Model B (2015). https://www.raspberrypi.org. Accessed 12 Jan 2022
Efficient Scheduling Algorithm Based on Duty-Cycle for e-Health Monitoring System
Tanima Bhowmik1(B), Rohan Mojumder2, Dibyendu Ghosh2, and Indrajit Banerjee1
1 Department of IT, IIEST Shibpur, Shibpur, Howrah 711103, India
[email protected], [email protected]
2 Department of ECE, C.I.E.M Kolkata, Kolkata 700040, India
Abstract. Modern society has largely transformed into an IoT-based society. The main issue in an IoT-based network is that it consumes excessive energy, diminishing battery lifetime. This paper proposes an efficient duty-cycle-based scheduling algorithm for an e-Health monitoring system. The system controls each node, switching it between a wake-up state and a sleep state to conserve energy and reduce data packet latency. We apply a modified game theory to optimize the wake-up/sleep scheduling of the sensors. The duty-cycle-based scheduling algorithm considers the wake-up probability of a sensor node, the maximum allowable traffic volume, and the expected data packet latency. Each sensor node continuously adapts its activation strategy (wake-up or sleep state). The key objectives of the scheduling are to diminish the network's energy consumption and the delay in data packet transmission, thereby increasing the lifetime of the sensor nodes. We have developed a mobile application to control the duration of the sleep state, display the sensor readings, and notify the user of any abnormal readings. Simulation results are presented to show the effectiveness of the proposed approaches. Keywords: Internet of Things (IoT) · e-Health monitoring · Duty-cycle · Game theory · Mobile application
1 Introduction
The Internet connects wireless sensor networks (WSNs) and other software technologies to form advanced IoT networks. Data aggregation and scheduling play a crucial part in IoT, as they diminish energy consumption, eliminate redundant data, and reduce processing time [1]. Sensor nodes are commonly powered by batteries, but their energy is bounded, and it is difficult or even dangerous to replace the batteries, especially in critical medical situations [2]. In the static duty-cycling method, sensor nodes do not have a sleep schedule optimized for the traffic load [3–5]. However, in the dynamic duty cycle method [6],
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. K. Das et al. (Eds.): CIPR 2022, LNNS 480, pp. 211–220, 2022. https://doi.org/10.1007/978-981-19-3089-8_21
each sensor can optimize its sleep scheduling independently when required to lessen energy consumption [7]. However, IoT-based sensor nodes aggregate data only in the wake-up state, and sleep latency is extended [8] because the sensor nodes need additional time to switch to the wake-up state under the duty-cycling approach, which degrades the efficiency of the whole network. The main contributions of the paper are as follows:
– Designed a mathematical model to analyze efficient duty-cycle-based scheduling for an e-Health monitoring system.
– Applied a modified game theory in the duty-cycling model to find an optimal wake-up/sleep scheduling strategy that improves the network's energy efficiency.
– Each sensor node activates independently and follows a non-cooperative, repeated game model.
– As each sensor is an independent rational player, it selects a strategy to wake up or sleep that minimizes the network's energy consumption and prolongs the network lifetime.
– A mobile application monitors the clinical data and notifies the individual of anomalous readings.
In Sect. 2, we discuss related work on other e-Health systems and how it differs from the proposed method. Section 3 and Sect. 4 explain the modified game theory and the outcomes of our proposed method, respectively. Future recommendations and conclusions are elaborated in Sects. 5 and 6, respectively.
2 Literature Review
Alpi Tomar et al. [9] proposed a game-theory-based clustering approach in which non-cluster-head nodes picked targets for data transmission, reducing energy consumption. Earlier, we proposed e-Health monitoring models [10,14] that did not have any strategy to minimize energy consumption. Zheng et al. [11] applied a game theory model to energy harvesting (EH) wireless sensor networks (WSNs), incorporating learning mechanisms to optimize energy and predict the condition of neighbouring nodes. M. S. Kordafshari et al. [12] proposed an evolutionary game theory approach to collective energy awareness and duty-cycle-scheduled routing in WSNs, in which a sensor switches between the wake-up and sleep states when its utility is less than the average utility of all sensors, reducing energy consumption. Sensors do not receive data packets during the sleep state, so those data packets are dropped. Hongseok Yoo et al. [13] proposed Duty-cycle Scheduling based on Residual energy (DSR) and Duty-cycle Scheduling based on Prospective increase in residual energy (DSP) to reduce sleep latency and maintain equal energy consumption across all sensor nodes in WSNs. In our proposed e-Health monitoring system, we apply a duty-cycle-based game theory model to reduce energy consumption during the wake-up state of the sensor node. It also avoids excessive data collection from individual sensors.
Efficient Scheduling Algo Based on D-C for e-Health Monitoring Sys.
3 Proposed System
A detailed description of the model is given below.
3.1 Proposed Approach
We model the interaction between sensor nodes as a non-cooperative, repeated game: each node activates autonomously and prioritizes its own communication over its neighbours' communications. A node is egocentric, blind to its neighbours' strategies, and plays the game repeatedly [15–17]. Dynamic changes in the environment can change the system model, and the decision strategies change with the next iteration of the game. We call the system an efficient duty-cycle-based scheduling game algorithm. The game model contains three segments: the set of players (sensors), the set of moves for each sensor, and the set of utility functions. Using these, decisions can be predicted and improved with proper design, and the utility function updated accordingly. In our proposed system, we correlate the game theory with some restrictions. The game model is expressed in Eq. 1 as:

$$G_m = [X, (A_u)_{u \in X}, (U_u)_{u \in X}] \tag{1}$$

where $X$ represents the set of players or sensors, $X = \{X_1, X_2, \ldots, X_n\}$; $A_u = \{S, P\}$ denotes the activation strategies, with $S$ the sleep state and $P$ the wake-up state of each sensor; and $U_u$ is the state-based utility function. In our practical model, we have utilized the NodeMCU ESP8266-12E microcontroller to handle the sensors as well as to aggregate the sensor readings to the cloud. Here, the LM35 and MAX30100 sensors are the players of the game and observe body temperature, pulse rate, and SpO2. Following the game theory, the system is set up so that the remaining sensors are off while one sensor is monitoring. The duty-cycle-based scheduling algorithm considers the wake-up probability of a sensor node, the maximum allowable traffic volume, and the expected data packet latency. In an IoT-based network, it is impossible to predict the wake-up probabilities exactly, and the data packet latency depends on them. A sensor dynamically modifies its wake-up probability $A^P_u$ with respect to the data packet latency $L_u$: the next wake-up probability is a function of the current wake-up probability and the packet latency. The mathematical model is expressed in Eq. 2 as:

$$A^P_u(\tau + 1) = F_u[A^P_u(\tau), L_u(\tau)], \qquad L_u(\tau) \propto [A^P_u(\tau)] \tag{2}$$

where $A^P_u$ and $L_u$ are the wake-up probability and packet latency of node $u$ at time $\tau$. Packet latency also changes proportionally with the sensor's wake-up
and sleep state. Packet latency also depends on the traffic volume, which can be predicted from the buffer state and event state, as in Eq. 3:

$$A^P_u = F_u[A^P_u, L_u] \tag{3}$$

If $F_u$ is differentiable and $\partial F_u / \partial L_u \neq 0$, then the latency can be expressed as in Eq. 4:

$$L_u(\tau) = \delta_u[A^P_u(\tau)] \tag{4}$$

where $\delta_u$ denotes a unique differentiable decreasing function. Thus, if the wake-up probability increases, the latency decreases; when there is no collision, packet latency is inversely proportional to the wake-up probability. Each sensor's utility function is designed as

$$U_u = \begin{cases} D_u(\tau) - \omega \dfrac{E^a_u(\tau)}{E_u(\tau)}, & \text{wake-up state} \\ 0, & \text{sleep state} \end{cases} \tag{5}$$

where, in Eq. 5, $D_u(\tau)$ denotes the value of the data captured by sensor $u$ at time $\tau$, $E^a_u(\tau)$ represents the amount of energy consumed for activation, $E_u(\tau)$ denotes the current energy state of sensor $u$, and $\omega$ represents the energy cost. The sensor adjusts its activation strategy (wake-up or sleep state) to maximize the utility. $D_u(\tau)$ comprises minimum, maximum, and mean values, so the utility function and the sensor's strategy determine the overall performance of the monitoring system. Each sensor continuously plays a game in which its actions are the wake-up and sleep states, and it receives a utility accordingly. There is no pre-defined strategy; the sensor nodes obtain their activation strategy through continuous play, repeatedly adapting their strategies to maximize utility. Following the proposed game model, the game starts when the NodeMCU is activated. In the wake-up state, one sensor turns on, monitors its parameter(s), and turns off after capturing the required data; the other sensor is then immediately activated and follows the same procedure. This ends the game and completes a cycle. We have set up the NodeMCU to identify each sensor's maximum, minimum, and mean readings. These values are then uploaded to the cloud database, but before uploading, they are compared with the results uploaded when the NodeMCU was previously active; the NodeMCU reads the previously uploaded results directly from the cloud database. For example, suppose the system monitors at an interval of 1 h and records readings at noon; these results would be compared with the results of 11 am. If a newly obtained reading differs from the previously aggregated reading, only the new readings are aggregated to the cloud, to avoid storing the same results repeatedly. A mobile app notifies the user when the sensor node becomes active and observes the cloud data; it generates alert notification(s) for abnormal reading(s), and a buzzer also sounds. After that, the NodeMCU sleeps for the user-defined duration using its deep-sleep feature. The duration of the deep-sleep state can be changed from the mobile application, and the NodeMCU retrieves the deep-sleep duration from the
Efficient Scheduling Algo Based on D-C for e-Health Monitoring Sys.
cloud database. In the sleep state, the sensor node performs only nominal operations and no data are collected; hence the sleep state consumes less energy than the wake-up state.

3.2 Algorithm
This algorithm describes the duty-cycle-based game model.

Algorithm 1: Duty-cycle based game model strategy
Input: Set of sensors X; activation strategies A (sleep state & wake-up state)
Output: Utility function
Data:
– Tmin and Tmax represent the minimum and maximum readings.
– Tmean stores the mean of the four obtained readings of the sensor.
– T = [T1, T2, T3, T4] are the obtained sensor values.

1   Initialize τ = τinitial
2   X = {X1, X2, ..., Xn}
3   Compute model Gm and packet latency L
4   Packet strategies: Ap = wake-up state, As = sleep state
5   Ap(τ) = Fu[Ap(τ), L(τ)]
6   L(τ) = δ[Ap]   // where δ ranges from −1 to 0
7   Compute U = D − ωE · E^a(τ)/E(τ) and wait for time τ
8   if U is lowest among all neighbour nodes then
9       put the node to sleep
10  if the node is activated then
11      update τ(τ, u)
12      upload the sensor data (D) to the cloud
        /* Considering LM36 for further steps */
13  if mean ≠ meanprev then
14      mean is aggregated
15  if Tmin ≠ Tminprev then
16      if Tmin < 36.1 then
17          user will be alerted
18      Tmin is aggregated
19  if Tmax ≠ Tmaxprev then
20      if Tmax > 37.2 then
21          user will be alerted
22      Tmax is aggregated
23  User is notified via app for abnormal readings
The sensors continuously adapt their activation strategies through repeated play of the game. The model also exploits the concept of data comparison: the maximum, minimum, and mean values are stored in separate variables, and the comparison and data-uploading procedures are described in lines 13 to 23.
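The decision logic of Algorithm 1 can be sketched in Python as follows. The function names, the neighbour-utility comparison, and the dictionary layout are illustrative assumptions based on the description above (the temperature bounds 36.1 and 37.2 are taken from the algorithm); this is a sketch, not the authors' implementation.

```python
# Sketch of the duty-cycle game decision and data-aggregation logic
# (names and structure are assumptions, not the authors' code).

def utility(D, omega_E, E_act, E_cur):
    """Eq. 5: utility in the wake-up state; the sleep state yields 0."""
    return D - omega_E * (E_act / E_cur)

def should_sleep(own_utility, neighbour_utilities):
    """A node goes to sleep when its utility is lowest among its neighbours."""
    return all(own_utility <= u for u in neighbour_utilities)

def aggregate(readings, previous, t_low=36.1, t_high=37.2):
    """Upload only readings that changed, and flag abnormal temperatures."""
    t_min, t_max = min(readings), max(readings)
    mean = sum(readings) / len(readings)
    uploads, alerts = {}, []
    if mean != previous.get("mean"):
        uploads["mean"] = mean
    if t_min != previous.get("min"):
        uploads["min"] = t_min
        if t_min < t_low:
            alerts.append("low temperature")
    if t_max != previous.get("max"):
        uploads["max"] = t_max
        if t_max > t_high:
            alerts.append("high temperature")
    return uploads, alerts

uploads, alerts = aggregate([36.5, 36.8, 37.4, 36.9],
                            {"mean": 36.6, "min": 36.5, "max": 37.0})
```

In this example the minimum reading equals the previously uploaded one, so only the changed mean and maximum are uploaded, and the maximum triggers an alert because it exceeds 37.2.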
T. Bhowmik et al.

4 Experimental Results
The outcomes from the framework after running the tests are described below.

4.1 Energy Conservation
A constant source of power is connected to the setup. A current flow of 39.1 mA is observed during the wake-up state and a drop in current consumption to 10.3 mA is observed in the deep-sleep state. The difference in the observed current flow during the wake-up state and sleep state is demonstrated in Fig. 1 and Fig. 2. The calculation given below brings forward the observed change in battery life during the two states.
Fig. 1. Wake-up state
Fig. 2. Sleep state
Eq. 6 shows the calculation for battery life:

Battery Life = Battery Capacity / Current Consumption of the Device        (6)
Herein we show the difference in battery life with and without game-theory-based deep sleep. There is no fixed duration of the wake-up state for the sensor node, as the game theory executes during this state. To compare the battery life for different wake-up durations of the sensor node, we consider a 12 V, 800 mAh battery. First, suppose the sensor node is in the wake-up state for 30 s (0.0083 h) and in the deep-sleep state for 59 min 30 s (0.9917 h).

Without Game Theory Based Deep-Sleep Application
The overall current consumption of the device for 1 h is

(39.1 × 1) mA = 39.1 mA        (7)
Thus the battery will last for

800 / 39.1 = 20.46 h        (8)
With Game Theory Based Deep-Sleep Application
The overall current consumption of the device for 1 h is divided into 39.1 mA for 0.0083 h and 10.3 mA for 0.9917 h, so the overall consumption is as shown in Eq. 9:

(39.1 × 0.0083 + 10.3 × 0.9917) mA = 10.53 mA        (9)
Thus the battery will last for

Battery Life = 800 / 10.53 = 75.97 h        (10)
From Eqs. 8 and 10 we can observe an approximately 3.7-fold increase in battery life compared with the always-awake case. Similarly, suppose the sensor node is in the wake-up state for 24 s (0.0067 h) and in the deep-sleep state for 59 min 36 s (0.99 h).

Without Game Theory Based Deep-Sleep Application
Without the game-theory-based deep-sleep application, the overall current consumption and battery life remain the same as in Eqs. 7 and 8, respectively.

With Game Theory Based Deep-Sleep Application
The overall current consumption of the device for 1 h is divided into 39.1 mA for 0.0067 h and 10.3 mA for 0.99 h, giving a total consumption of

(39.1 × 0.0067 + 10.3 × 0.99) mA = 10.46 mA        (11)
Thus the battery will last for

Battery Life = 800 / 10.46 = 76.5 h        (12)

From Eqs. 8 and 12 we again observe an approximately 3.7-fold increase in battery life over the always-awake case. Hence we can conclude that the duty-cycle-based game model yields an approximately 3.7-fold increase in battery life.
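The battery-life arithmetic of Eqs. 6 to 12 can be reproduced with a few lines of Python; the function name and signature are illustrative:

```python
def battery_life_hours(capacity_mah, awake_ma, sleep_ma, awake_h, sleep_h=None):
    """Battery life (Eq. 6) for a duty-cycled node.

    The average current is the time-weighted mix of the wake-up and
    sleep currents over a one-hour cycle (Eqs. 9 and 11).
    """
    if sleep_h is None:
        sleep_h = 1.0 - awake_h
    avg_ma = awake_ma * awake_h + sleep_ma * sleep_h
    return capacity_mah / avg_ma

# Always-awake baseline (Eqs. 7-8): 800 / 39.1 is about 20.46 h.
baseline = battery_life_hours(800, 39.1, 39.1, 1.0, 0.0)

# 30 s awake per hour with deep sleep (Eqs. 9-10): about 75.9 h.
duty_cycled = battery_life_hours(800, 39.1, 10.3, 0.0083, 0.9917)
```

The ratio `duty_cycled / baseline` comes out to roughly 3.7, matching the improvement factor reported above (small differences arise from the intermediate rounding of 10.53 mA in the text).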
4.2 Mobile Application
We have designed a mobile application to monitor the sleep state's timespan and fetch information from the cloud database. The application offers three options: 1. Scheduler, 2. Cloud, and 3. Change Language, as shown in Fig. 3. The Scheduler option is provided to monitor and modify the timespan of the sleep state; it also displays the status of the sensor node, i.e., wake-up or sleep state. The interface of the scheduler page is shown in Fig. 4.
The Cloud option gives the user access to the real-time sensor readings in the cloud database; its interface is shown in Fig. 5. The user receives a notification if the sensor node records any anomalous readings. A call button pops up on this page, giving the user access to an emergency contact number, which is provided when the user installs the application for the first time. Using the Change Language option, the user can switch the application's language to other available languages (Hindi, Bengali, etc.); English is the default language.
Fig. 3. Options
Fig. 4. Scheduler page
Fig. 5. Cloud database with call feature
4.3 Simulation Outcomes
This part provides simulation results of the proposed duty-cycle-based game model. The monitoring sensors are randomly distributed and remain static during a time slot. Here we consider three monitoring systems, each containing an identical number of sensors. Figures 6 and 7 show the performance comparison with and without the duty-cycle-based game model in terms of total network utility and network energy consumption. The duty-cycle-based game model improves the utility as the number of monitoring systems increases. Because the proposed algorithm consumes less energy, the sensor nodes' lifetime increases.
Fig. 6. Total network utility
Fig. 7. Avg. energy consumption

5 Future Work
In the future, we plan to further reduce the current consumption during the sleep state to the microampere range. We also plan to increase the number of monitoring parameters by adding sensors. Extending the application to other platforms, such as iOS and Windows, will broaden the range of supported devices and improve patient monitoring in remote regions.
6 Conclusion
This paper strives to achieve an energy-efficient e-Health monitoring system using a duty-cycle based game model. Our proposed system reduces the current consumption during the sleep state down to 10.3 mA, increasing the sensor node’s battery life and overall network lifetime. We have constructed a mobile application to get the aggregated information, recognize the present status of the sensor node, and monitor the duration of sleep state. As a result, our system is energy-efficient, cost-effective, user-friendly, handy, and supports mobility.
References

1. Fitzgerald, E., Pióro, M., Tomaszewski, A.: Energy-optimal data aggregation and dissemination for the internet of things. IEEE Internet Things J. 5(2), 955–969 (2018). https://doi.org/10.1109/JIOT.2018.2803792
2. Jiao, X., Lou, W., Feng, X., Wang, X., Yang, L., Chen, G.: Delay efficient data aggregation scheduling in multi-channel duty-cycled WSNs. In: 15th International Conference on Mobile Ad-hoc and Sensor Systems (2018). https://doi.org/10.1109/MASS.2018.00055
3. Bai, J., Zeng, Z., Abualnaja, K.M., Xiong, N.N.: ADCC: an effective adaptive duty cycle control scheme for real-time big data in Green IoT. Alex. Eng. J. 61, 5959–5975 (2022). https://doi.org/10.1016/j.aej.2021.11.026
4. Panda, S.K., Nanda, S.S., Bhoi, S.K.: A pair-based task scheduling algorithm for cloud computing environment. J. King Saud Univ. - Comput. Inf. Sci. 34, 1434–1445 (2022). https://doi.org/10.1016/j.jksuci.2018.10.001
5. Han, G., Dong, Y., Guo, H., Shu, L., Wu, D.: Cross-layer optimized routing in wireless sensor networks with duty cycle and energy harvesting. Wirel. Commun. Mobile Comput. 15(16), 1957–1981 (2015). https://doi.org/10.1002/wcm.2468
6. Kang, B., Nguyen, P.K.H., Zalyubovskiy, V., Choo, H.: A distributed delay-efficient data aggregation scheduling for duty-cycled WSNs. IEEE Sens. J. 17(11), 3422–3437 (2017). https://doi.org/10.1109/JSEN.2017.2692246
7. Wan, P.-J., Wang, Z., Wan, Z., Huang, S.C.-H., Liu, H.: Minimum-latency schedulings for group communications in multi-channel multihop wireless networks. In: Liu, B., Bestavros, A., Du, D.-Z., Wang, J. (eds.) WASA 2009. LNCS, vol. 5682, pp. 469–478. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-03417-6_46
8. Gu, Y., He, T.: Dynamic switching-based data forwarding for low duty cycle wireless sensor networks. IEEE Trans. Mobile Comput. 10(12), 1741–1754 (2011). https://doi.org/10.1109/TMC.2010.266
9. Tomar, A., Chanak, P.: A game theory based fault tolerance routing scheme for wireless sensor networks. In: IEEE International Students' Conference on Electrical, Electronics and Computer Science (2020). https://doi.org/10.1109/SCEECS48394.2020.132
10. Bhowmik, T., Mojumder, R., Banerjee, I., Bhattacharya, A., Das, G.: IoT based non-contact portable thermal scanner for COVID patient screening. In: IEEE 17th India Council International Conference (INDICON 2020) (2020). https://doi.org/10.1109/INDICON49873.2020.9342203
11. Zheng, J., Cai, Y., Shen, X., Zheng, Z., Yang, W.: Green energy optimization in energy harvesting wireless sensor networks. IEEE Commun. Mag. 53(11), 150–157 (2015). https://doi.org/10.1109/MCOM.2015.7321985
12. Kordafshari, M.S., Movaghar, A., Meybodi, M.R.: A joint duty cycle scheduling and energy aware routing approach based on evolutionary game for wireless sensor networks. Iran. J. Fuzzy Syst. 14(2), 23–44 (2017). https://doi.org/10.22111/IJFS.2017.3132
13. Yoo, H., Shim, M., Kim, D.: Dynamic duty-cycle scheduling schemes for energy-harvesting wireless sensor networks. IEEE Commun. Lett. 16(2), 202–204 (2012). https://doi.org/10.1109/LCOMM.2011.120211.111501
14. Bhowmik, T., Mojumder, R., Banerjee, I., Bhattacharya, A., Das, G.: IoT based data aggregation method for E-health monitoring system. In: International Conference on Computing Communication and Networking Technologies (ICCCNT) (2021). https://doi.org/10.1109/ICCCNT51525.2021.9579885
15. Fudenberg, D., Tirole, J.: Game Theory, 1st edn. MIT Press, Cambridge (1991). ISBN 0-262-06141-4
16. Kashtriya, P., Kumar, R., Singh, G.: Energy optimization using game theory in energy-harvesting wireless sensor networks. In: International Conference on Secure Cyber Computing and Communication (ICSCCC) (2018). https://doi.org/10.1109/ICSCCC.2018.8703336
17. Gharehshiran, O.N., Krishnamurthy, V.: Dynamic coalition formation for efficient sleep time allocation in wireless sensor networks using cooperative game theory. In: 12th International Conference on Information Fusion, Seattle, WA, USA, 6–9 July 2009
Image Splicing Detection Using Feature Based Machine Learning Methods and Deep Learning Mechanisms

Debjit Das(B) and Ruchira Naskar

Department of Information Technology, Indian Institute of Engineering Science and Technology, Shibpur 711103, West Bengal, India
[email protected], [email protected]
Abstract. Digital image forgery has become widespread nowadays. Fraudsters can intentionally use forged images for various illegitimate and malicious purposes. One of the most severe forms of image forgery is image splicing, which can be defined as making an artificial image by joining multiple parts of different source images to form a natural-looking forged image. Image splicing detection is of fundamental importance in digital forensics and cyber security. Image splicing detection mechanisms are of two types: feature-based schemes with machine learning, and methods based on deep learning. In this paper, different machine learning-based and deep learning-based research works for detecting image splicing are briefly discussed, along with their experimental results. A discussion of the datasets explored, the different performance metrics, and a comparative study of the methods is also presented. Finally, we conclude with a discussion of possible future research directions in this area.

Keywords: Classification · Convolutional neural network · Deep learning · Feature extraction · Image splicing · Machine learning
1 Introduction

Digital image forgery is the act of tampering with an image to conceal or hide important information. With the easy availability of various image tampering tools nowadays, image forgery has become widespread. Hence, the identification, analysis, and detection of digital image forgery play an essential role in the forensics and national security domains. Among the different types of image forgery, the two most important are Copy-Move Forgery [1, 2] and Image Splicing [3, 4]. Image splicing is the act of artificially generating a composite image by merging parts of different images from different sources to form a single forged image that appears to be an original natural image. An example of image splicing is given in Fig. 1. Since the spliced image is formed from portions of different source images, the image textures and features undergo abrupt changes from region to region, making it more challenging to detect image splicing successfully. Several research works have addressed the problem of image splicing detection, as it
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
A. K. Das et al. (Eds.): CIPR 2022, LNNS 480, pp. 221–232, 2022. https://doi.org/10.1007/978-981-19-3089-8_22
Fig. 1. Example of image splicing: (a) and (b) two authentic images, (c) spliced image composed of authentic images (a) and (b)
has fundamental importance in the domain of digital forensics and cyber security. The organization of this paper is as follows: Sect. 2 presents different image splicing detection approaches based on feature extraction and machine learning. Section 3 presents different deep learning-based mechanisms to detect splicing attacks in images automatically. A discussion of some of the commonly used datasets is given in Sect. 4, and a description of the different performance metrics used for classification is given in Sect. 5. A comparative study among the discussed methods is provided in Sect. 6. Finally, Sect. 7 concludes with a discussion of possible future extensions of this work.
2 Image Splicing Detection Based on Feature Extraction and Machine Learning

In the case of feature engineering and machine learning-based splicing detection, after pre-processing, essential features are extracted from the input images, a feature vector is created, and a machine learning-based model is trained. Different feature selection methods, e.g., Sequential Forward Selection (SFS) [13], can be used to select the best features. Dimensionality reduction techniques like Principal Component Analysis (PCA) [14, 15] can also be applied to reduce the feature vector dimension further. Finally, a classifier is used to distinguish the spliced images from the authentic images. In this paper, we discuss the following mechanisms in this regard.

2.1 Markov Features Based Image Splicing Detection

Pham et al. [5] have developed an effective image splicing detection method based on coefficient-wise and block-wise Markov features extracted in the Discrete Cosine Transform (DCT) domain, where Markov features perform comparatively better than in the Discrete Wavelet Transform (DWT), spatial, or other domains. First, the input color image was converted into a grayscale image and divided into non-overlapping blocks of size 8 × 8, upon which the 2D-DCT was applied. The corresponding image containing the rounded-off DCT coefficients was labeled I. The coefficient-wise difference arrays were calculated in four directions, i.e., horizontal, vertical, main diagonal, and anti-diagonal, and termed Ch, Cv, Cd, and Ca respectively, where Ch was computed as shown in Eq. 1:

Ch(u, v) = I(u, v) − I(u, v + 1)        (1)
If the initial size of the image was H × W, then u = 1, 2, …, H and v = 1, 2, …, W−1. Next, the transition probability matrices were calculated using these four arrays, and the final coefficient-wise Markov feature vector was formed. Similarly, the block-wise difference arrays Bh and Bv for the horizontal and vertical directions were calculated, and the corresponding block-wise Markov feature vector was created. A block diagram of this process is given in Fig. 2.
Fig. 2. Block diagram of Markov feature extraction process
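The coefficient-wise difference array of Eq. 1 and the associated transition probability matrix can be sketched as follows. This is a simplified NumPy illustration: the threshold value, the function names, and the random test array are assumptions, and the block-wise 2D-DCT step is taken as already applied.

```python
import numpy as np

def horizontal_difference(I):
    """Eq. 1: Ch(u, v) = I(u, v) - I(u, v + 1)."""
    return I[:, :-1] - I[:, 1:]

def transition_probability_matrix(C, T=4):
    """Markov transition probabilities over clipped difference values.

    Values are clipped to [-T, T], giving a (2T+1) x (2T+1) matrix whose
    entry (i, j) estimates P(next = j - T | current = i - T) along rows.
    """
    C = np.clip(C, -T, T).astype(int) + T          # shift range to 0..2T
    size = 2 * T + 1
    counts = np.zeros((size, size))
    np.add.at(counts, (C[:, :-1].ravel(), C[:, 1:].ravel()), 1)
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row_sums, out=np.zeros_like(counts),
                     where=row_sums > 0)

# Stand-in for one block of rounded DCT coefficients.
I = np.round(np.random.default_rng(0).normal(0, 3, (8, 8)))
Ch = horizontal_difference(I)
P = transition_probability_matrix(Ch, T=4)
```

Concatenating the probabilities from all four directional arrays (and likewise from the block-wise arrays) yields the Markov feature vector described above.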
Finally, all the features were fed to a Support Vector Machine (SVM) classifier. The authors used the CASIA TIDE v2.0 dataset [5, 7, 23] and successfully detected splicing with an accuracy of 96.9% using a feature vector of dimension 566.

2.2 Image Splicing Detection Based on Hybrid Feature Set

The method proposed by Jaiswal et al. [3] is based on a hybrid feature set, a combination of different texture- and shape-based features. From each input image, four types of features were extracted: Histogram of Oriented Gradients (HOG) [3], DWT [3, 16], Laws Texture Energy (LTE) [3], and Local Binary Pattern (LBP) [3, 21]. A combined feature vector was then formed from all these features. HOG is an image feature descriptor, and here the authors extracted 36 HOG features. DWT is widely used for image denoising; common wavelets in DWT are the Haar wavelet [17, 18], the Daubechies wavelet [18], and the Dual-tree Complex Wavelet Transform (DCWT) [19]. The authors took a total of 32 DWT features and also selected 15 LTE features. LBP is a texture-based feature descriptor, and in this work a total of 59 LBP features were extracted. Finally, all four feature sets were combined to form the final feature vector of dimension (36 + 32 + 15 + 59) = 142, which was fed to a logistic regression classifier. They explored three datasets, CASIA v1.0 [3, 7], CASIA v2.0 [3, 23], and the Columbia dataset [22], and obtained accuracies of 98.3% on CASIA v1.0, 99.5% on CASIA v2.0, and 98.8% on the Columbia dataset.

2.3 Singular Value Decomposition and DCT Based Splicing Detection

Moghaddasi et al. [6] have developed a splicing detection technique based on Singular Value Decomposition (SVD) and DCT. The SVD of a matrix M can be expressed as the product of three matrices, U, Σ, and the transpose of V, i.e., M = UΣVᵀ. In this method, the input image was sub-divided into n × n non-overlapping blocks, where n = 3, 4, …, 27. The singular value vector of each sub-block was calculated, and the natural logarithm of the inverse of each singular value was computed and merged for each block size n. The authors then calculated the variance, average, skewness, and kurtosis for each block size. Since there are 25 block sizes in total (3 to 27), and four values are calculated for each, the feature dimension becomes 25 × 4 = 100. Kernel-PCA was applied for dimensionality reduction, and the result was fed to an SVM classifier. In another approach, after sub-dividing, the 2D-DCT was used to extract DCT-based features from each block; the SVD was then calculated, and here also a 100-dimensional feature vector was extracted. This work explored the Columbia Image Splicing Detection Evaluation Dataset [22]. Experimental results showed that with the original 100-dimensional feature vector, the SVD-based features gave 77.65% accuracy and SVD-DCT gave 80.79% accuracy. After applying Kernel-PCA, the SVD-DCT-based feature vector achieved an accuracy of 98.78% with a feature set dimension of only 50.

2.4 Image Splicing Detection Based on Markov Features in QDCT Domain

In the work suggested by Li et al. [7], a color input image was first sub-divided into multiple non-overlapping blocks of size 8 × 8. The Red (R), Green (G), and Blue (B) color components of each block were used to construct a quaternion matrix, which was processed with the Quaternion DCT (QDCT) transform. After that, by reassembling all the matrices, the required 8 × 8 block QDCT matrix G of the input image was obtained.
The matrix G was modified by rounding off the values to integers and taking absolute values to form the QDCT coefficient array F, from which the intra-block difference 2-D arrays for the horizontal, vertical, main diagonal, and minor diagonal directions were calculated, denoted Fh, Fv, Fd, and F−d. Similarly, the inter-block difference 2-D arrays in the four directions were calculated and denoted Gh, Gv, Gd, and G−d. Next, the authors applied a thresholding technique and calculated the required transition probability matrices, in which the intra-block and inter-block correlations among the QDCT coefficients were stored. The expanded Markov features in the QDCT domain were extracted and fed to an SVM classifier. The authors explored the CASIA v1.0 [3, 7] and CASIA v2.0 [3, 7, 12] color image datasets and used a Primal SVM classifier with a Markov threshold value of 4. This model achieved an accuracy of 92.38% on the CASIA v2.0 dataset with feature vector dimension 972. The block diagram of generating the 8 × 8 block QDCT of a color image is shown in Fig. 3.

Fig. 3. Block diagram of 8 × 8 block QDCT generation of RGB color image

2.5 Image Forgery Detection in Color Images Based on LBP and DCT

A passive image manipulation detection scheme was proposed by Alahmadi et al. [8], based on the 2-D DCT and LBP [3, 8, 21]. This method first transforms the input RGB color image to the YCbCr color model, in which the image is represented by three channels: luminance (Y), blue-difference chroma (Cb), and red-difference chroma (Cr). The chroma components of each input image were divided into overlapping blocks of size 16 × 16. For each image block, the LBP was calculated, and the 2D-DCT was then applied to transform it into the frequency domain. Afterward, the standard deviation of each DCT coefficient across all blocks was calculated, and these values formed the feature vector, which was finally sent to an SVM classifier. This scheme achieved a detection accuracy of 79.65% using the Y channel, 96.5% using Cb, and 95.8% using Cr individually. The results show that the chromatic components are more capable of detecting forgery. When features from Cb and Cr are selected jointly to form the feature vector, the detection accuracy becomes optimal at 97%.
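The per-block pipeline of this scheme (LBP, then 2D-DCT, then per-coefficient standard deviation over all blocks) can be sketched as follows. The helper names, the block stride, the basic 3 × 3 LBP variant, and the naive unnormalized DCT-II are illustrative assumptions, not the authors' code:

```python
import numpy as np

def lbp(block):
    """Basic 3x3 Local Binary Pattern codes for the interior pixels."""
    c = block[1:-1, 1:-1]
    neighbours = [block[0:-2, 0:-2], block[0:-2, 1:-1], block[0:-2, 2:],
                  block[1:-1, 2:],   block[2:, 2:],     block[2:, 1:-1],
                  block[2:, 0:-2],   block[1:-1, 0:-2]]
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, n in enumerate(neighbours):
        code |= ((n >= c).astype(np.uint8) << bit)
    return code

def dct2(x):
    """Naive 2D DCT-II (fine for small blocks)."""
    N = x.shape[0]
    k = np.arange(N)
    basis = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * N))
    return basis @ x @ basis.T

def chroma_features(channel, size=16, stride=8):
    """Std of each DCT coefficient over overlapping LBP blocks."""
    coeffs = []
    H, W = channel.shape
    for i in range(0, H - size + 1, stride):
        for j in range(0, W - size + 1, stride):
            block = channel[i:i + size, j:j + size]
            coeffs.append(dct2(lbp(block).astype(float)))
    return np.std(np.stack(coeffs), axis=0).ravel()

# Stand-in for a Cb chroma channel.
cb = np.random.default_rng(1).integers(0, 256, (64, 64))
features = chroma_features(cb)
```

In practice the feature vectors extracted from the Cb and Cr channels would be concatenated before classification, mirroring the joint Cb+Cr configuration that gave the best accuracy above.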
3 Image Splicing Detection Based on Deep Learning

In this section, different schemes based on deep learning [9–12] for detecting image splicing are discussed. Deep learning-based approaches do not require manual feature extraction from images; they perform automatic feature extraction, primarily based on Convolutional Neural Networks (CNNs). A CNN consists of many convolutional and pooling layers, followed at the end by one or more fully connected layers. Deep learning-based methods are more robust and provide much better detection accuracy than feature extraction-based machine learning approaches, but they require a very large amount of data and time for training, and they are also costly because of the complex model structure. Some deep learning-based mechanisms for detecting image splicing are discussed here.

3.1 Image Splicing Detection Based on Mask-RCNN and ResNet-Conv

Ahmed et al. [9] have proposed an image splicing detection method based on deep learning, built on a backbone architecture called ResNet-Conv. In this approach, they trained a supervised Mask-Regional CNN (Mask-RCNN) to learn the hierarchical features generated by image splicing, using a residual network (ResNet). In the ResNet-Feature Pyramid Network (ResNet-FPN), the FPN was replaced by convolution layers to construct ResNet-Conv. This deep learning-based residual network was used to generate the initial feature map to train the supervised Mask-RCNN. The ResNet-50 and ResNet-101 architectures were used as backbones, of which ResNet-50 is faster. To initialize the developed network, they implemented a transfer learning strategy. They
used a large number of computer-generated tampered images, created using the COCO dataset [20] and some random objects, for training. Experimental results showed that this scheme achieved an Area Under the Curve (AUC) score of 0.967.

3.2 Image Forgery Detection Based on WPIC and CNN

Stanton et al. [10] have proposed an image forgery detection method based on a color phenomenology-based algorithm, White Point Illuminant Consistency (WPIC). In this approach, each input image is first segmented and converted into chromaticity coordinates to be compared with the white points from the camera's EXIF file. For classification, the white point shifting must be indicated, and for that the authors used a CNN. A block diagram of the WPIC algorithm is given in Fig. 4.
Fig. 4. Block diagram of WPIC algorithm
In the WPIC algorithm, the corresponding XYZ coordinates are first estimated from the input RGB color image, and the required chromaticity coordinates are computed. Next, test statistics are computed from the superpixels. Then, from the correlation coefficient, PCA [14, 15], and the white point balance, the required histogram is generated, from which the authors derive the Illuminant Error Histogram (IEH). Finally, based on this IEH, they designed a CNN to detect image splicing. A 101 × 101 dimensional IEH was fed to the CNN as input. The CNN used Conv-ReLU followed by a MaxPool with stride 2 for four consecutive levels, and finally a fully connected layer whose output determined whether an input image was spliced or not. They used a forensic dataset called NIST Media Forensics Challenge 18 (NIST MFC18) to train the network and the DSO-1 [4, 24] dataset for testing. Experimental results showed that this scheme achieved an AUC score of 76%.

3.3 Image Splicing Detection Based on Illuminant Maps and CNN

Image splicing creates inconsistencies that can be highlighted using illuminant maps. An image splicing detection and localization scheme was proposed in [4] based on a CNN and illuminant maps. Here, the authors used the Residual Network (ResNet) architecture, which skips blocks of convolutional layers using residual blocks that can be expressed as shown in Eqs. 2 and 3:

yl = h(xl) + F(xl, Wl)        (2)
xl+1 = f(yl)        (3)
where xl is the input and xl+1 is the output of the l-th block, F is the residual mapping function, f is the ReLU activation function, and h(xl) = xl is the identity mapping function. The authors used the ResNet-50 architecture and adopted the transfer learning technique, with an SVM classifier as the top layer of ResNet-50. This method can also locate the spliced region based on an RGB-gradient map, which is then converted into the HSV color space. They explored three splicing datasets, DSO [4, 24], DSI [4], and the Columbia dataset, and scored a classification accuracy of 96% for splicing detection.

3.4 Image Splicing Detection and Localization Based on Local Feature Descriptor Using Deep CNN

Rao et al. [12] have introduced a local feature descriptor-based method to detect and localize image splicing with the help of a deep CNN. The first convolutional layer of the developed CNN was initialized with 30 optimized linear high-pass filters. First, the proposed CNN was pre-trained with labeled patch samples taken from the training dataset. The input image was segmented into small patch-sized blocks, and the local feature descriptor was applied to each small image block. By applying feature fusion, all these features were integrated to form the final discriminative feature vector; a block pooling technique was also used to obtain the final feature sets. The final feature vector was fed to an SVM classifier, and this model was combined with a fully connected Conditional Random Field (CRF) model for splicing localization. The authors conducted their experiments on DSO-1 [4, 24], Columbia gray DVMM [12, 22], and CASIA v2.0 [3, 7, 12], and obtained an accuracy of 97% and an AUC score of 99%.
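The residual update of Eqs. 2 and 3, with the identity mapping h(xl) = xl, can be illustrated numerically. The toy linear residual function and weights below are assumptions used only to demonstrate the update rule, standing in for a real block's convolutional layers:

```python
import numpy as np

def relu(y):
    return np.maximum(y, 0.0)

def residual_block(x, W):
    """One residual block: y_l = h(x_l) + F(x_l, W_l); x_{l+1} = f(y_l).

    h is the identity mapping, f is ReLU, and F is a toy linear residual
    mapping standing in for the block's convolutional layers.
    """
    F = W @ x            # residual branch F(x_l, W_l)
    y = x + F            # Eq. 2 with h(x_l) = x_l
    return relu(y)       # Eq. 3

x = np.array([1.0, -2.0, 3.0])
W = 0.1 * np.eye(3)              # toy weights
x_next = residual_block(x, W)    # -> [1.1, 0.0, 3.3]
```

Because the identity path carries x forward unchanged, gradients can flow through many stacked blocks without vanishing, which is what lets ResNet-50 backbones train deep feature extractors for the schemes above.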
4 Datasets

A brief description of some of the most popular datasets explored in different research works for splicing detection is given below.

4.1 CASIA Version 1.0

This dataset [3, 7] contains 1721 color images, of which 800 are authentic and the remaining 921 are forged. Of the 921 tampered images, 462 are spliced and 459 are copy-move forged. The image format is JPEG or TIFF, and the image dimension is 384 × 256 or 256 × 384.

4.2 CASIA Version 2.0

This dataset [7, 12, 23] contains 7491 authentic images and 5123 tampered images. All images are in RGB format, and the tampered images consist of both spliced and copy-move forged samples: 3295 are copy-move forged and 1828 are spliced. The tampered images are post-processed to make detection difficult for forensic algorithms. The image dimension ranges from 240 × 160 to 900 × 600.
4.3 Columbia Image Splicing Detection Evaluation Dataset

This dataset [22] is composed of a total of 1845 grayscale images in BMP format with an image dimension of 128 × 128. Of these, 933 images are authentic and 912 are spliced. Each of the two types is further subdivided into five subcategories: homogeneous textured region, homogeneous smooth region, object boundary between a textured and a smooth region, boundary between two textured regions, and boundary between two smooth regions.

4.4 DSO-1 and DSI-1

The DSO-1 dataset [4, 24] consists of 200 outdoor and indoor images, of which 100 are authentic and 100 are spliced, with image dimension 2048 × 1536 and JPEG or TIFF format. The set of tampered images is created by adding one or more persons to each source image that already contains at least one person; post-processing is then applied. DSI-1 [4] consists of 50 images of various resolutions, of which 25 are authentic and 25 are spliced; the image format is PNG.

4.5 COCO

COCO [20] is a vast dataset of nearly 330,000 high-quality images, primarily used for training neural networks for object detection, splicing detection, computer vision, and segmentation purposes.
5 Performance Metrics

This section presents a brief description of the different performance evaluation metrics [25]. The performance of a classifier model is represented by the confusion matrix, a 2 × 2 matrix consisting of four fields: True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN). The following performance metrics can be computed from the confusion matrix.

5.1 Accuracy

It measures the overall correctness of the model, i.e., how often it is correct in predicting positive or negative. It is computed as shown in Eq. 4:

Accuracy = (TP + TN) / (TP + TN + FP + FN)        (4)
5.2 Precision

Precision specifies how often a sample is actually positive when the model predicts it as positive; it is the ratio of true positives to all predicted positives. It is computed as shown in Eq. 5.

Precision = TP / (TP + FP)    (5)
Image Splicing Detection Using Feature Based Machine Learning
229
5.3 Recall or Sensitivity

Recall is the ratio of true positives to all actual positives; it tells us how often the prediction is correct when a sample is positive. It is computed as shown in Eq. 6.

Recall = TP / (TP + FN)    (6)
5.4 ROC-AUC Score

A Receiver Operating Characteristic (ROC) curve [25] is a graphical plot that describes the performance of a binary classification model at various classification thresholds. It is plotted with two parameters: the True Positive Rate (TPR) along the vertical axis and the False Positive Rate (FPR) along the horizontal axis. The AUC score measures the total two-dimensional area under the ROC curve from (0, 0) to (1, 1).

5.5 F-1 Score

The F-1 score is the harmonic mean of precision and recall. It ranges between 0 and 1 and is often a more informative evaluation metric than accuracy, especially on imbalanced data. It is calculated as shown in Eq. 7.

F-1 Score = 2 × (Precision × Recall) / (Precision + Recall)    (7)
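The four metrics above follow directly from the confusion-matrix counts; a minimal sketch (the function name and example counts are our own illustration, not from the surveyed works):

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute Eqs. 4-7 from the four confusion-matrix fields."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)           # Eq. 4
    precision = tp / (tp + fp)                           # Eq. 5
    recall = tp / (tp + fn)                              # Eq. 6
    f1 = 2 * precision * recall / (precision + recall)   # Eq. 7
    return accuracy, precision, recall, f1

# Example: 90 spliced images caught, 80 authentic kept, 10 false alarms, 20 missed
acc, prec, rec, f1 = classification_metrics(tp=90, tn=80, fp=10, fn=20)
print(round(acc, 3), round(prec, 3), round(rec, 3), round(f1, 3))
# prints: 0.85 0.9 0.818 0.857
```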
6 Comparative Study

A comparative study of all the schemes discussed above is given in Table 1.

Table 1. Comparative study of the discussed schemes.

| Proposed Scheme | Basis of Operation | Datasets Explored | Experimental Result |
|---|---|---|---|
| Markov feature-based [5] | Coefficient-wise and block-wise Markov features in the DCT domain | CASIA TIDE v2.0 | Accuracy 96.9% |
| Hybrid feature set [3] | Hybrid feature set composed of HOG, DWT, LTE and LBP | CASIA v1.0, CASIA v2.0 and Columbia | Accuracy 98.3%, 99.5% and 98.8% |
| SVD and DCT-based [6] | SVD, DCT and Kernel-PCA | Columbia | 98.78% accuracy with SVD-DCT |
| Markov features in QDCT [7] | Expanded Markov features in the QDCT domain | CASIA v1.0, CASIA v2.0 | Accuracy 92.38% |
| LBP and DCT-based [8] | LBP, 2D-DCT and chroma components | CASIA TIDE v1.0, v2.0, Columbia | Accuracy 97%, 97.5% and 96.6% |
| Mask-RCNN and ResNet-Conv-based [9] | Mask-RCNN, ResNet-Conv, transfer learning | COCO | AUC score 96.7% |
| WPIC and CNN-based [10] | WPIC and CNN | NIST MFC18 and DSO-1 | AUC score 76% |
| Illuminant maps and CNN-based [4] | Illuminant maps, CNN, ResNet-50, RGB-gradient | DSO-1, DSI-1, Columbia | Accuracy 96% |
| Local feature descriptor and deep CNN-based [12] | Local feature descriptor, deep CNN, CRF, feature fusion | DSO-1, Columbia gray DVMM and CASIA v2.0 | Accuracy 97% and AUC score 99% |
7 Conclusion and Future Scope

This paper discusses different image splicing detection schemes, based either on feature extraction and machine learning or on deep convolutional neural networks. While deep learning-based approaches usually provide better accuracy than feature extraction-based methods, they require vast amounts of training data and are comparatively expensive; training a deep learning model is also time-consuming and complex. Each detection approach has been explained in brief, along with the required formulation, and the experimental results of each work are discussed together with the datasets explored. The future scope of research in this direction includes experiments on more datasets. Forged datasets with color images of different sizes that may have undergone various forms of post-processing, such as sharpening, cropping, compression, blurring, and noise reduction, need to be explored. The performance of splicing detection models can be further improved with more fine-tuning, and transfer learning from pre-trained models can improve the accuracy of the deep learning-based approaches. Acknowledgment. This work is partially funded by the Department of Science and Technology (DST), Govt. of India, Grant No: DST/ICPS/Cluster/CS Research/2018 (General), dated: 13.03.2019.
References 1. Mahmood, T., Nawaz, T., Irtaza, A., Ashraf, R., Shah, M., Mahmood, M.T.: Copy-move forgery detection technique for forensic analysis in digital images. Math. Prob. Eng. 2016, 13 (2016). Article ID 8713202 2. Shahroudnejad, A., Rahmati, M.: Copy-move forgery detection in digital images using affine-SIFT. In: 2016 2nd International Conference of Signal Processing and Intelligent Systems (ICSPIS), pp. 1–5. IEEE (2016) 3. Jaiswal, A.K., Srivastava, R.: A technique for image splicing detection using hybrid feature set. Multimedia Tools and Applications 79(17–18), 11837–11860 (2020). https://doi.org/10.1007/s11042-019-08480-6 4. Pomari, T., Ruppert, G., Rezende, E., Rocha, A., Carvalho, T.: Image splicing detection through illumination inconsistencies and deep learning. In: 2018 25th IEEE International Conference on Image Processing (ICIP), pp. 3788–3792. IEEE (2018) 5. Pham, N.T., Lee, J.-W., Kwon, G.-R., Park, C.-S.: Efficient image splicing detection algorithm based on Markov features. Multimedia Tools and Applications 78(9), 12405–12419 (2018). https://doi.org/10.1007/s11042-018-6792-9 6. Moghaddasi, Z., Jalab, H.A., Noor, R.M.: Image splicing detection using singular value decomposition. In: Proceedings of the Second International Conference on Internet of Things, Data and Cloud Computing, pp. 1–5 (2017) 7. Li, C., Ma, Q., Xiao, L., Li, M., Zhang, A.: Image splicing detection based on Markov features in QDCT domain. Neurocomputing 228, 29–36 (2017) 8. Alahmadi, A.A., Hussain, M., Aboalsamh, H., Muhammad, G., Bebis, G.: Splicing image forgery detection based on DCT and local binary pattern. In: 2013 IEEE Global Conference on Signal and Information Processing, pp. 253–256. IEEE (2013) 9. Ahmed, B., Gulliver, T.A.: Image splicing detection using Mask-RCNN. Signal, Image and Video Processing, pp. 1–8 (2020) 10. Stanton, J., Hirakawa, K., McCloskey, S.: Detecting image forgery based on color phenomenology. In: CVPR Workshops, pp. 138–145 (2019) 11.
Wu, Y., Abd-Almageed, W., Natarajan, P.: Deep matching and validation network: an end-to-end solution to constrained image splicing localization and detection. In: Proceedings of the 25th ACM International Conference on Multimedia, pp. 1480–1502 (2017) 12. Rao, Y., Ni, J., Zhao, H.: Deep learning local descriptor for image splicing detection and localization. IEEE Access 8, 25611–25625 (2020) 13. Marcano-Cedeño, A., Quintanilla-Domínguez, J., Cortina-Januchs, M.G., Andina, D.: Feature selection using sequential forward selection and classification applying artificial metaplasticity neural network. In: IECON 2010–36th Annual Conference on IEEE Industrial Electronics Society, pp. 2845–2850. IEEE (2010) 14. Wold, S., Esbensen, K., Geladi, P.: Principal component analysis. Chemom. Intell. Lab. Syst. 2(1–3), 37–52 (1987) 15. Xiao, B.: Principal component analysis for feature extraction of image sequence. In: 2010 International Conference on Computer and Communication Technologies in Agriculture Engineering, vol. 1, pp. 250–253. IEEE (2010) 16. Rinky, B.P., Mondal, P., Manikantan, K., Ramachandran, S.: DWT based feature extraction using edge tracked scale normalization for enhanced face recognition. Procedia Technol. 6, 344–353 (2012) 17. Porwik, P., Lisowska, A.: The Haar-wavelet transform in digital image processing: its status and achievements. Machine Graphics and Vision 13(1/2), 79–98 (2004) 18. Sharif, I., Khare, S.: Comparative analysis of Haar and Daubechies wavelet for hyperspectral image classification. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 40(8), 937 (2014)
19. Hadi, S.J., Tombul, M.: Streamflow forecasting using four wavelet transformation combinations approaches with data-driven models: a comparative study. Water Resour. Manage. 32(14), 4661–4679 (2018) 20. Lin, T.Y., et al.: Microsoft COCO: common objects in context. In: European Conference on Computer Vision, pp. 740–755. Springer, Cham (2014) 21. Guo, Z., Zhang, L., Zhang, D.: A completed modeling of local binary pattern operator for texture classification. IEEE Trans. Image Process. 19(6), 1657–1663 (2010) 22. Ng, T.T., Hsu, J., Chang, S.F.: Columbia image splicing detection evaluation dataset. DVMM Lab, Columbia Univ., CalPhotos Digit. Libr. (2009). https://www.ee.columbia.edu/ln/dvmm/downloads/AuthSplicedDataSet/dlform.html 23. Dong, J., Wang, W., Tan, T.: CASIA image tampering detection evaluation database. In: 2013 IEEE China Summit and International Conference on Signal and Information Processing, pp. 422–426. IEEE (2013) 24. De Carvalho, T.J., Riess, C., Angelopoulou, E., Pedrini, H., de Rezende Rocha, A.: Exposing digital image forgeries by illumination color classification. IEEE Trans. Inf. Forensics Secur. 8(7), 1182–1194 (2013) 25. Seliya, N., Khoshgoftaar, T.M., Van Hulse, J.: A study on the relationships of classifier performance metrics. In: 2009 21st IEEE International Conference on Tools with Artificial Intelligence, pp. 59–66. IEEE (2009)
Audio Driven Artificial Video Face Synthesis Using GAN and Machine Learning Approaches Arnab Kumar Das(B) and Ruchira Naskar Indian Institute of Engineering Science and Technology, Shibpur, Department of Information Technology, Howrah, West Bengal 711103, India [email protected], [email protected]
Abstract. Nowadays a large number of people share their opinions in either audio or video format over the internet. Some of these videos are real and some are fake, so we need to distinguish between the two with the help of digital forensics. In this paper, the authors discuss the different types of artificial face synthesis methods and then analyze deepfake videos using machine learning methods. In artificial face synthesis, based on an incoming audio stream in any language, a face image or source video of a single person is animated with full lip synchronization and synthesized expression. For full lip synchronization, GANs can also be used to train the generative models. Keywords: Digital forensics · Artificial · Deepfake videos · Machine learning · Lip synchronization · Generative adversarial network
1 Introduction

In recent times, video forgery detection has been a major problem in video forensics. With the advancement of new technology and the widespread availability of editing software, video forgery and video tampering have become common. Rapid software advances have made it possible not only to produce convincing forged or tampered video but also to synthesize artificial face videos. Artificial video synthesis is a category of deepfake video: it is the process of making a video by replacing the face of the actual person with someone else's using the latest technology. Deepfake videos have garnered worldwide attention for their application in fake news, financial fraud, celebrity pornographic videos, and so on. In the process of making deepfake videos, the images or videos of the target person are first morphed and merged; voice overlapping and lip synchronization are then applied. As a result, the viewer believes it is a genuine video of the target person. Some works also synthesize facial expressions along with lip movements to make the animated faces look more natural [1, 2].

1.1 Overview of Artificial Video Synthesis and Its Detection

The popularity of artificial video synthesis is increasing day by day among research and industrial communities for its wide range of applications. Many artificial neural networks © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. K. Das et al. (Eds.): CIPR 2022, LNNS 480, pp. 233–244, 2022. https://doi.org/10.1007/978-981-19-3089-8_23
234
A. Kumar Das and R. Naskar
are used in the video synthesis process. Early efforts on video synthesis typically generated low-resolution short video frames via a recurrent neural network [3], a class of artificial neural network. With artificial video synthesis methods, it is even possible to generate a customized dancing video of a subject who has no idea how to dance. Much research is still ongoing on video-to-video synthesis [4] methods. To transfer a pose from a source video to a target video, the pose must first be identified or estimated; OpenPose skeleton estimators [5] are utilized to extract key points of the human pose and represent them as multi-channel label maps for the Supervised Video-to-Video Synthesis [6] model. In Fig. 1 the authors describe the workflow of artificial video synthesis.
Fig. 1. Workflow diagram of artificial video synthesis
Video synthesis can start from any kind of input, such as an image or a video. Generating the video is the most challenging task in the artificial video synthesis method, because it involves not only video generation but also synthesizing and constructing a complex scene with multiple agents.

1.2 State-of-the-Art Video Synthesis Tools

Different types of datasets are used to find the optimal outcome in video synthesis. Authors use different datasets for facial movement analysis and model training; these datasets are not suitable for full-body analysis, pose detection, and video synthesis, for which another set of datasets is used. In Supervised Video-to-Video Synthesis [6], the authors use a dataset consisting of ballet, jazz, and street dance videos and also collect a high-quality indoor person video
dataset with 1920 × 1080 resolution. Vougioukas et al. [7] proposed a deep learning architecture for speech-driven artificial face synthesis using GANs that takes into account the facial expression dynamics in the course of speech. Zhou et al. [8] proposed a novel framework called the Disentangled Audio-Visual System (DAVS), which addresses the two main aspects of video synthesis: preserving identity and enhancing lip synchronization quality. To maintain these two aspects, the authors use two different datasets to train the model. Song et al. [9] created a framework that improves over the state of the art in terms of visual quality, lip sync, and smooth transitions; the authors also apply a conditional recurrent generation network applicable to both images and video. Another challenging task is talking face generation, because it must cope with many visual dynamics (e.g., camera angles, camera shaking, head movements). For this reason, Chen et al. [10] proposed a novel GAN structure consisting of a multi-modal convolutional-RNN-based (MMCRNN) generator and a novel regression-based discriminator; this model also uses an LSTM network for audio input. In this paper, the authors also focus on how video synthesis tools work when audio, images, and videos are taken as input.
2 Related Literature

In this section, the authors discuss the work related to each technique. Artificial video synthesis is done in various ways: one is face synthesis with lip movement, and another is full-body synthesis with posture analysis. Many relevant works have already generated realistic facial animation [11–13] from various audio, images, and video, but these works did not address human expression and emotion while the person delivers the speech. Recent research has carried out artificial audio- or speech-driven synthesis with proper lip movements [14] along with face synthesis. Earlier approaches include person-specific facial video synthesis from audio [15]. In the early decade, Ofli et al. [16] proposed audio-driven full human body analysis and synthesis using an HMM (Hidden Markov Model), but their training set consists of only two kinds of dancing videos. Some approaches also add a text-to-speech synthesizer [17] to obtain a full text-to-video conversion. In ObamaNet [18], the visibility of the teeth appears awkward while Obama is speaking; on the other hand, the teeth appear much better in the work done by Yu et al. [19].

2.1 Detection Techniques - An Overview

The detection of fake news and fake information poses a challenge for researchers because this type of information mainly comes from the internet, and it can mislead a person or change a person's opinion on a particular topic. Due to rapid advances in technology, deepfake videos and artificial video synthesis are increasing day by day. Such artificial videos can be manipulated with the help of deep learning methods and look like genuine videos, so researchers need to identify them. Li et al.
[20] created a framework to detect deepfake videos by measuring the eye blinking rate
using recurrent convolutional networks and an LSTM network. Different types of CNN-based methods [21] are also used for video forgery detection. In Table 1 the authors summarize the different types of approaches for fake detection.

Table 1. Summary of different types of approaches for fake detection

| Sl no | Works | Contribution | Dataset | Future Scope/Limitation |
|---|---|---|---|---|
| 1 | Feng et al. [22] | A feature extraction module (CNN structures such as VGGNet, GoogLeNet, ResNet, with Euclidean distance) and a classification module using Support Vector Machines (SVMs), a supervised learning method | Celeb-DF | Eye recognition at different angles needs to be developed; not applicable to long videos |
| 2 | Ivanov et al. [23] | Four stages: a face recognition module; preprocessed frames passed through ResNet and a super-resolution algorithm for better classification; a head pose estimator; and a decision maker | UADFV | Compared with head direction vectors, the classifier did not give good results |
| 3 | Nasar et al. [24] | Four modules: data preparation (takes input as audio, video, or images); image enhancement (noise removal); CNN model generation (training and testing phases); and a testing phase that checks whether the input is genuine | CelebA | The system was tested with only one dataset; lack of high-quality original data for training purposes |
| 4 | Pan et al. [25] | First, the input video is divided into frames (30 frames per second); second, a cascade classifier detects the exact face regions; third, the frames are tested with deep learning models (Xception and MobileNet) | FaceForensics++ | In future, new detection cues such as eye blinking, eye color, and special identification marks on the face can be used for person identification |
3 Artificial Video Synthesis - Approaches, Tools and Techniques

In this work we also discuss different types of existing approaches, tools, and techniques for artificial video synthesis. Artificial video face synthesis has both advantages and limitations.

Advantages:
• It is useful for media purposes (e.g., advertisements, dummy fight scenes).
• It is very useful for news reading when the newsreader is not present at that moment.
• With its help, researchers can make a person who is no longer alive appear to say something.

Limitations:
• It is very hard to produce realistic synthesized videos.
• There is a lack of training samples, as well as a shortage of target images as input.

3.1 Facial Expression Analysis and Capturing Emotions

While creating facial video synthesis, analyzing the expression in facial movement is also a very important task: when a person speaks, the speaker's emotion shows on his or her face. The authors detect the emotion and apply it to the face in the target image or video. CNN architectures come in three forms: 2D CNNs, where 2-dimensional kernels are used for convolution; 3D CNNs, which work the same way but with 3-dimensional kernels; and bi-stream CNNs, where a 2D CNN extracts spatial features from the video frames and the flow computed from adjacent frames is processed by a temporal CNN. In Recurrent Neural Networks [26], motion features are captured from the input video frames over a short time interval. Recurrent networks such as LSTMs (Long Short-Term Memory) also give good results on audio and text analysis, where the incoming temporal information is critical. There are basically seven types of facial expressions created by a person: calm, happy, sad, angry, fear, surprise, and disgust.
The researchers have to identify these types of emotions to validate their systems for visual facial expression analysis. For this purpose, many researchers use the RAVDESS (Ryerson Audio-Visual Database of Emotional Speech and Song) dataset [27]. Nowadays, Random Forest (RF) and K-Nearest Neighbor (KNN) classifiers are also used to classify the seven different types of emotion. Random Forest classifiers [28] follow a tree-like structure in which conditions are used to separate the data; building multiple decision trees increases prediction power and accuracy. K-Nearest Neighbor (KNN) classifies an input sample by finding its K closest training samples, using the Euclidean distance between points.
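KNN classification as described above can be sketched in a few lines; the feature vectors and emotion labels below are made-up toy data, not RAVDESS features:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Label a query by majority vote among the k training samples
    closest to it in Euclidean distance."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy 2-D "expression features": (feature vector, emotion label)
train = [((0.1, 0.2), "calm"), ((0.2, 0.1), "calm"),
         ((0.9, 0.8), "happy"), ((0.8, 0.9), "happy"), ((0.85, 0.85), "happy")]
print(knn_predict(train, (0.15, 0.15)))  # a point near the "calm" cluster
```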
3.2 Facial (Chin, Eye Blink, Teeth) Movement Analysis

Facial landmark detection and different types of facial movement analysis are common components of artificial video face synthesis. For mouth shape features, the face in each video frame is detected by the dlib face detector, which gives the key points of the face; these key points from any input video frames are made compatible with any target output. While doing facial video synthesis, head movement, lip synchronization, eye blink generation, and chin movement are crucial, so there are different factors in facial motion analysis: chin, upper lip, lower lip, lip corner, and cheek movement. The authors need to find the correlation coefficients between the measured and the recovered facial motion. A more careful examination indicates that neural networks produce better results in many cases [29]. Mainly the chin and the lower lip are well tracked, whereas the upper lip is more difficult to track. If the distance between the chin and upper lip is minimal, it may denote that the speaker is silent. Sinha et al. [12] generate landmarks that define the speech-driven facial motion together with realistic eye blinks, using MMD loss minimization to learn the eye blinks on facial landmarks. Different types of words were used to analyze eye movement [30] and blinking [31]. Eye blinking is a simple physiological signal, and it is often absent in fake videos, so some researchers detect eye blinking in videos to determine whether a video is genuine or fake [32]. Teeth movement analysis and identifying the dimples around the mouth and chin are very challenging tasks; in some low-resolution input videos, the mouth and teeth often appear blurry.

3.3 GANs in Artificial Video Synthesis

A Generative Adversarial Network (GAN) is a class of machine learning algorithms.
It has two parts: a generator and a discriminator. With the help of a given training set, this technique generates new data with the same statistics, characteristics, and behavior. It can be exploited to create entirely new images, videos, and voices in a highly realistic format, and it can also enhance the quality of images or videos. First, the generator takes a random input and produces a synthesized output; the generator's goal is to create output identical to the real data. The discriminator is the classification component that checks the loss, i.e., how much the generator's output differs from the targeted result; its loss function is given in Eq. (2). In a conditional GAN, a class, text, or other condition is added to both the generator and the discriminator for better output. For regularization, spectral normalization [33] is used in the discriminator; it divides the weight matrix (W) by its largest singular value. The first GAN framework was introduced by Goodfellow et al. [34]. It consists of two competing networks, a generator (G) and a discriminator (D). The objective of the generator (G) is to synthesize fake data that looks the same as real data, while the goal of the discriminator (D) network is to distinguish between real samples and generated output samples. Here X denotes the generator's input and Y a real data sample. The
cost function of a GAN can be represented as in Eq. (1):

min_G max_D F(D, G) = E_{X∼P(X)}[log(1 − D(G(X)))] + E_{Y∼P(Y)}[log D(Y)]    (1)

Different types of loss functions exist in the GAN model. The adversarial loss for the generator is the basic GAN loss; it can be represented as in Eq. (2):

Loss_adv_G = E_{X∼P(X)}[(D(G(X)) − 1)^2]    (2)
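Assuming discriminator outputs in (0, 1), the two quantities above can be evaluated numerically; this is a toy illustration with hand-picked scores (not a training loop, and the variable names are our own):

```python
import math

def gan_value(d_fake, d_real):
    """Eq. 1: E[log(1 - D(G(X)))] + E[log D(Y)], estimated over two batches
    of discriminator scores."""
    return (sum(math.log(1 - d) for d in d_fake) / len(d_fake)
            + sum(math.log(d) for d in d_real) / len(d_real))

def adv_loss_g(d_fake):
    """Eq. 2: least-squares adversarial loss E[(D(G(X)) - 1)^2] for the
    generator; minimized when the discriminator scores fakes as 1."""
    return sum((d - 1) ** 2 for d in d_fake) / len(d_fake)

# Discriminator scores on a batch of generated and a batch of real frames
d_fake, d_real = [0.1, 0.2], [0.9, 0.8]
print(gan_value(d_fake, d_real))
print(adv_loss_g(d_fake))
```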
Other than this loss, GAN models also have detail loss, style loss [35], etc. Mirza et al. (2014) [36] proposed conditional GANs, in which the researcher can specify which digit the GAN should create. The algorithm of the GAN model for full lip synchronization is as follows:
Step 1: Take the input video and the target output video.
Step 2: Divide the target video into target frames.
Step 3: Use the generator part of the CGAN to compute the audio encoding from the input video and the target image encoding from the target frame.
Step 4: From the audio encoding and the target image encoding, the generator creates the target image with lip synchronization with the help of the training data.
Step 5: The discriminator part then works on the input video and the lip-synchronized target image to predict fake versus real using Eq. (2).
Step 6: Check the prediction; if it is not up to the mark, repeat Steps 3 to 5.
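The control flow of the steps above can be sketched structurally; every component below is a hypothetical stub (random numbers in place of trained networks), shown only to make the Step 2–6 loop concrete, not an implementation of any surveyed system:

```python
import random

# Hypothetical stand-ins for trained networks (Steps 3-5)
def audio_encoder(audio_frame):
    return [random.random() for _ in range(8)]

def image_encoder(target_frame):
    return [random.random() for _ in range(8)]

def generator(audio_code, image_code):
    # Step 4: combine the two encodings into a "lip-synced" frame vector
    return [a + i for a, i in zip(audio_code, image_code)]

def discriminator(frame):
    # Step 5: realism score in (0, 1)
    return random.random()

def synthesize(audio_frames, target_frames, threshold=0.5, max_rounds=10):
    out = []
    for audio, target in zip(audio_frames, target_frames):   # Step 2: per frame
        for _ in range(max_rounds):                          # Step 6: retry loop
            frame = generator(audio_encoder(audio), image_encoder(target))
            if discriminator(frame) >= threshold:            # accepted as "real"
                break
        out.append(frame)
    return out

frames = synthesize(["a1", "a2"], ["t1", "t2"])
print(len(frames))  # one synthesized frame per target frame
```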
Fig. 2. Block diagram of CGAN model
Chung et al. [37] proposed a generator model framework that has a face encoder, an audio encoder, and a face decoder. In this approach, some reconstruction loss occurs for completely different poses; this problem was overcome by the model proposed by Prajwal et al. [38]. Zhu et al. [39] proposed Cycle-consistent adversarial networks (CycleGAN) to perform image-to-image translation without any labels or paired input data.
4 Results and Analysis

In this work, the authors survey different types of datasets and their usage. For artificial face video synthesis with lip synchronization, researchers use the GRID [40], LRS2 [41], and TCD-TIMIT [42] datasets to train models. The GRID dataset consists of high-quality audio and video recordings of 33 speakers, each speaking 1000 sentences. These sentences are easy to understand, and with their help anyone can easily identify the actual content. The Lip Reading Sentences 2 (LRS2) dataset is one of the largest and most easily accessible datasets for researchers; each sentence is up to 100 characters, and the dataset contains 1000 speakers without speaker labels. The TCD-TIMIT dataset consists of 59 speakers, each speaking 100 long sentences. LRW [43] is also an audiovisual database that contains 500 different words, each spoken by 1000 speakers; it also provides training, validation, and test sets for training and testing the module. Some authors use YouTube videos for training. Yu et al. [19] focused on generating videos of President Donald John Trump from his voice or video captions; in this case, the authors used 20 weekly presidential addresses, each lasting about 3 min.
Fig. 3. Result of the synthesized target video using CGAN, where (a) is the input image and (b) is the output
In Fig. 3 the authors show the result of the CGAN in artificial video synthesis. RMSE, SSIM, PSNR, and LPIPS are the metrics used for comparing the results of different approaches to artificial face synthesis and deepfake videos. Root Mean Square Error (RMSE) measures the error of the target output. The Structural Similarity Index Measure (SSIM) is a method for predicting the perceived quality of pictures and videos; it measures the similarity between images or videos. The Peak Signal-to-Noise Ratio (PSNR) is the ratio between the power of a signal and the power of the corrupting noise present in the picture. The Learned Perceptual Image Patch Similarity (LPIPS) [44] metric initializes a network from random Gaussian weights and trains it completely; these metrics are also trained on CNN models. Table 2 lists PSNR and SSIM scores reported by various approaches.
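RMSE and PSNR, for instance, can be computed directly from pixel values; a toy example on a few 8-bit grayscale values (SSIM and LPIPS need considerably more machinery):

```python
import math

def rmse(ref, test):
    """Root mean square error between two equal-size images (flat lists)."""
    return math.sqrt(sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref))

def psnr(ref, test, max_val=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    mse = sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref)
    return float("inf") if mse == 0 else 10 * math.log10(max_val ** 2 / mse)

ref  = [52, 55, 61, 59]   # reference pixels
test = [54, 55, 60, 57]   # synthesized pixels
print(round(rmse(ref, test), 3), round(psnr(ref, test), 2))
# prints: 1.5 44.61
```

Higher PSNR and SSIM mean the synthesized frames are closer to the reference, which is how the scores in Table 2 should be read.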
Table 2. PSNR and SSIM scores of different face synthesis approaches

| Method/Contribution By | Dataset | PSNR (dB) | SSIM |
|---|---|---|---|
| ObamaNet [18] | Different types of datasets | 22.659 | 0.754 |
| Vougioukas et al. [7] | TCD-TIMIT | 24.243 | 0.730 |
| Vougioukas et al. [7] | GRID | 27.10 | 0.818 |
| Sinha et al. [12] | TCD-TIMIT | 26.153 | 0.818 |
| Sinha et al. [12] | GRID | 29.305 | 0.878 |
| Yu et al. [19] | Different datasets including Trump videos from the internet and the FaceForensics dataset | 30.182 | 0.899 |
5 Conclusion and Future Scope

In this work, the authors have discussed the procedure of synthesizing photorealistic video portraits from an input audio stream or image to produce a target video. The authors survey the state of the art in GANs and synthesis models applied to human video synthesis, and conduct a comprehensive ablation study of the contributions of the different video synthesis models, their features, and their outcomes in detail. Another open area is detecting multiple objects in a single video and synthesizing one person's image into that video. Eye blink generation and teeth movement are also important along with lip-syncing. From the authors' perspective, detecting the emotion (happy, sad, angry, neutral, surprised) of one sentence and applying that emotion to different sentences could be adopted in the near future. Going forward, the image quality and the background of the target image also need to be improved.
References 1. Brand, M.: Voice puppetry. In: Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques, pp. 21–28 (1999) 2. Li, Y., Shum, H.-Y.: Learning dynamic audio-visual mapping with input-output hidden Markov models. IEEE Trans. Multimedia 8(3), 542–549 (2006) 3. Vondrick, C., Pirsiavash, H., Torralba, A.: Generating videos with scene dynamics. Advances in Neural Information Processing Systems 29 (2016) 4. Wang, T.C., et al.: Video-to-video synthesis. In: Proc. Adv. Neural Inf. Process. Syst. (NeurIPS), pp. 1144–1156 (2018) 5. Cao, Z., Hidalgo Martinez, G., Simon, T., Wei, S.-E., Sheikh, Y.A.: OpenPose: realtime multi-person 2D pose estimation using part affinity fields. IEEE Trans. Pattern Anal. Mach. Intell. 43(1), 172–186 (2021). https://doi.org/10.1109/TPAMI.2019.2929257 6. Wang, H., et al.: Supervised video-to-video synthesis for single human pose transfer. IEEE Access 9, 17544–17556 (2021) 7. Vougioukas, K., Petridis, S., Pantic, M.: Realistic speech-driven facial animation with GANs. Int. J. Comput. Vision 128(5), 1398–1413 (2020)
Audio Driven Artificial Video Face Synthesis Using GAN and Machine
A. Kumar Das and R. Naskar
Design of an Elevator Traffic System Using MATLAB Simulation Ibidun Christiana Obagbuwa(B) and Morapedi Tshepang Duncan Department of Computer Science and Information Technology, Sol Plaatje University, Kimberley, South Africa {ibidun.obagbuwa,201802331}@spu.ac.za
Abstract. This paper presents a discrete-event simulation of an elevator traffic system. A network of queues is made up of service centers representing system resources and consumers representing users or transactions. The elevator traffic system can be recognized as a queue problem: one must wait to be delivered to one's desired floor level, and service is rendered according to that floor level. We built a multi-server queuing model of the elevator traffic system, with an exponentially distributed inter-arrival time for the arrival process and an exponentially distributed service time for each of the c servers, using the SimEvents blocks from MATLAB Simulink. The outputs of the simulation were analyzed to obtain the average waiting time. Keywords: Queue problem · Queue system · Elevator traffic system · Deterministic queue · Probabilistic queue · Modelling · Simulation · MATLAB Simulink
1 Introduction In this paper, the queue problem was investigated: how a queue problem can be modeled and simulated. Our research area is the modeling and simulation of real-world problems, which focuses on how natural systems can be represented using models and software in order to understand and predict the systems' behavior. This work was motivated by the waiting lines for elevators that have grown since the Covid-19 pandemic started, especially in work buildings with many floor levels, which make workers not opt for the stairs. The increased waiting lines are due to social distancing, under which only a few workers are allowed in an elevator at a time. The objective of our research is to find the optimum service rate and number of elevators so that the waiting time in the queue and the service costs are minimized. The system is modelled as a multi-server queuing system (M/M/c), where every lift is represented as a server. The M/M/c model of the elevator traffic system was built with an exponentially distributed inter-arrival time for the arrival process and an exponentially distributed service time for each of the c servers, using the SimEvents blocks from MATLAB Simulink. This paper has the following structure: Sect. 2 presents related works, Sect. 3 shows the methodology for the work, Sect. 4 presents the operational model, and Sect. 5 concludes. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. K. Das et al. (Eds.): CIPR 2022, LNNS 480, pp. 245–254, 2022. https://doi.org/10.1007/978-981-19-3089-8_24
246
I. C. Obagbuwa and M. T. Duncan
2 Literature Review Mojsky and Achimsky [1] conducted a study on the development of an algorithm to simulate an M/M/n queuing system with infinite queues using Markov-type queuing theory models. Their goal was to develop a method for obtaining statistical data when simulating a real-world queuing system [1]. Balaji et al. [2] conducted a study introducing an essential aspect of frustrated consumers and desertion in telephone contact center queuing systems. To deliver better solutions in complicated circumstances such as contact centers, they employed MATLAB-Simulink multi-queue structured models [2]. A study on the multi-server queue system at a major fuel station was conducted by Balaji [3]. The emphasis of the study was on a MATLAB-Simulink model of a petrol bunk. In the fuel pump station, queueing theory was applied to forecast certain critical metrics such as average waiting time and queue length [3]. With the experimental tests and outstanding results from simulation, their model allowed the prediction of the queue's behavior under various physical and time circumstances. It also specifies the queuing systems so as to reduce customer waiting time [3]. Jah and Shukla [4] conducted a study that aimed to model and improve traffic control algorithms to better meet rising demand. They created a fuzzy optimization with MATLAB, which is concerned with determining the values of input parameters in a complicated simulated system that result in the desired output [4]. Harahap et al. [5] researched a queuing model for a road intersection with traffic lights so that the appropriate traffic light duration could be determined depending on vehicle arrivals [5]. The SimEvents MATLAB-Simulink program was used to show the computation from the queue waiting time model that was constructed. Oumaima et al. [6] conducted a study whose goal was to find a way to alleviate the rising strain on transportation infrastructure.
They used modern communication, information, computer, and control technologies in the transportation area to relieve traffic congestion, improve public safety, and improve the efficiency of the road network [6]. Research done by Ahmad et al. [7] focuses on the use of discrete-event simulation as a means of modeling and studying elevator placement strategies. Witness simulation software was used as a test environment for model building, simulation, and some experiments [7]. In this paper, SimEvents blocks from MATLAB Simulink were utilized to simulate a multi-server elevator queuing system.
3 Methodology MATLAB 2021a software by The MathWorks Inc. was utilized to develop and build a queue system model using SimEvents blocks in MATLAB Simulink. We studied the SimEvents documentation on modelling and simulating discrete-event systems [8] for a better understanding of simulation events in MATLAB Simulink. The simulation steps in Fig. 1 were adopted for this work. The elevator traffic system is a queue problem, and the following subsections give a description of the queue problem.
Fig. 1. Simulation steps [9]
3.1 Queue Problem A queue forms when there are more clients than servers to attend to them, which implies that clients must wait for their turn. 3.1.1 Queuing Models There are two major types of queue models, namely deterministic and probabilistic models. Deterministic Queue Model: a deterministic model is a finite type of model in which the service time is fixed. The number of arrivals per unit of time, or the time gap between arrivals, is set.
Probabilistic Queue Model: an infinite type of model in which the service time is not fixed. The number of arrivals per unit time, or the length of the interval between arrivals, is stochastic. 3.1.2 Queuing System Components Queuing system components include the Arrival Process, the Service Mechanism, the Queue Discipline, and the Servers. Arrival Process: describes how the customers arrive in a system; the calling population it draws from can be limited or unlimited. Service Mechanism: defined by the number of servers, each with its own queue, and the probability distribution of client service time. Calling Population: the population from which customers/jobs originate. Queue System: specifies the number of queues. Figure 2 depicts a typical queue system with its different components.
Fig. 2. A queue system
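The two model types above differ in how inter-arrival times are produced. The short sketch below is our own illustration (not part of the paper's model): a deterministic stream uses a fixed gap, while a probabilistic stream draws exponentially distributed gaps with the same mean, as in an M/M/c arrival process.

```python
import random

def deterministic_arrivals(n, gap):
    # Fixed inter-arrival gap: the whole arrival stream is predetermined.
    return [i * gap for i in range(1, n + 1)]

def probabilistic_arrivals(n, rate, seed=42):
    # Exponentially distributed gaps with mean 1/rate (Poisson arrivals).
    rng = random.Random(seed)
    t, times = 0.0, []
    for _ in range(n):
        t += rng.expovariate(rate)
        times.append(t)
    return times

det = deterministic_arrivals(5, gap=2.0)   # [2.0, 4.0, 6.0, 8.0, 10.0]
sto = probabilistic_arrivals(5, rate=0.5)  # random times, mean gap 2.0
```

Both streams have the same mean gap (2.0), but only the second exhibits the variability that makes waiting lines form even when the average capacity is sufficient.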
3.1.3 Some Queue Configurations Examples of queue configurations include One Queue, One Server; One Queue, Multiple Servers; Multiple Queues, One Server; and Multiple Queues, Multiple Servers. 3.1.4 Measures of Performance for Queuing Systems The performance measures for a queuing system are the delay in the queue, the system waiting time, the total number of queuing customers, and the total number of customers in the system. Other performance measures include the likelihood that there will be a delay; the likelihood that the overall delay will exceed a certain threshold; the likelihood of all service facilities being idle; the overall facility's projected idle time; and the likelihood of being turned away owing to a lack of waiting space.
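For the M/M/c model used later in this paper, several of these measures have closed-form expressions through the standard Erlang C formula. The sketch below is our own illustration of those textbook formulas, not the paper's simulation: it returns the probability that an arrival must wait, the mean waiting time in the queue, and the mean queue length.

```python
from math import factorial

def mmc_measures(lam, mu, c):
    """Erlang C measures for an M/M/c queue with arrival rate lam,
    per-server service rate mu and c servers."""
    a = lam / mu            # offered load in Erlangs
    rho = a / c             # server utilisation; must be < 1 for stability
    if rho >= 1:
        raise ValueError("unstable: arrivals exceed total service capacity")
    top = a ** c / factorial(c)
    p_wait = top / (top + (1 - rho) * sum(a ** k / factorial(k) for k in range(c)))
    wq = p_wait / (c * mu - lam)   # mean waiting time in the queue
    lq = lam * wq                  # mean queue length, by Little's law
    return p_wait, wq, lq

# e.g. 3 elevators, 2 passengers/min arriving, 1 trip served per minute each:
p, wq, lq = mmc_measures(lam=2.0, mu=1.0, c=3)
```

For c = 1 this reduces to the familiar M/M/1 result Wq = ρ/(μ − λ), which gives a quick sanity check.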
3.1.5 General Notation for Queues Arrival, service, and queue discipline are the three characteristics that define queues. The general notation is shown in Eq. (1):

[A/B/s] : {d/e/f}  (1)
where: ‘A’ = Arrivals probability distribution; ‘B’ = Departures (service) probability distribution; ‘s’ = Total servers; ‘d’ = Queue capacity; ‘e’ = Calling population size; ‘f’ = Queue ordering. 3.1.6 Applications of Queuing in Daily Life In our day-to-day life queuing has many applications, including the Traffic System, Banking System, Toll Plaza System, Railway Station System, Computer System, Construction Management System, Health Care System, and Telecommunication System. 3.2 Elevator Queue System as a Queue Problem A queue system can be used in the elevators to assess the values of the waiting time and the queue length in the lobby. Random input, in the form of the distribution of lift users, drives the model. As a result, we think of the model as deterministic but the simulation as stochastic, because we are more interested in how people use lifts than in the lifts themselves. The goals are to track the performance of various elevator dispatching algorithms, to investigate techniques for reducing the average waiting time and average system time, and to increase the efficiency of the present operational system while keeping an appropriate degree of operating ease and comfort. 3.3 Assumptions of the Model The questions underlying the assumptions of this model are listed below:
• What rationale should be applied to the elevator's movement?
• How can I be sure that the elevator will respond to a passenger who is waiting on a specified floor?
• How do I make sure that the elevator only transports passengers up to its maximum capacity?
• How can I be confident that the elevator will follow FCFS (First Come, First Served) logic in governing the passengers?
• How can I make sure the elevator stays on the floor while people are entering or exiting?
The modelling procedure is improved by the ability to abstract the fundamental characteristics of the problem, to select and alter the fundamental assumptions that characterize the system, and to enlarge and refine the model until a usable approximation is obtained. Based on the assumptions of the model and the abstraction of its basic features, the conceptual model of the elevator system shown in Fig. 3 was created.
Fig. 3. Conceptual model flow chart of elevator system
4 Operational Model of the Elevator System In our research we used MATLAB 2021a software to build our queue system model. Our model is a probabilistic queue model. The queue discipline of our model is FIFO (First In, First Out). For the service mechanism we have multiple servers (elevators), each with its own queue; therefore the number of queues is the same as the number of elevators. The population of workers is infinite. Our model's queue configuration is multiple queues, multiple servers. Assumed data were used, as no real dataset was found. The model was tested by specifying different numbers of passengers, servers, and simulation times. Figure 4 illustrates the operational model of the elevator system. Most real systems create models that need to store and process large amounts of information; therefore, the elevator model must be expressed in a computer-recognizable format.
Fig. 4. MATLAB Simulink simulation of the elevator system
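Because the Simulink model in Fig. 4 is graphical, a rough text-based analogue can help convey its logic. The sketch below is our own simplified approximation, not the authors' model: passengers arrive as a Poisson process, join the elevator that will be free soonest, and receive exponentially distributed service; the waiting time is then averaged over all passengers.

```python
import random

def simulate_elevators(n_passengers=10000, lam=2.0, mu=1.0, c=3, seed=7):
    """Toy event-driven sketch of c elevators, each with its own FIFO queue.
    Returns the average time a passenger waits before boarding."""
    rng = random.Random(seed)
    free_at = [0.0] * c       # time at which each elevator next becomes free
    t, total_wait = 0.0, 0.0
    for _ in range(n_passengers):
        t += rng.expovariate(lam)                     # next Poisson arrival
        k = min(range(c), key=lambda i: free_at[i])   # soonest-free elevator
        start = max(t, free_at[k])                    # wait if it is busy
        total_wait += start - t
        free_at[k] = start + rng.expovariate(mu)      # exponential trip time
    return total_wait / n_passengers

avg_wait = simulate_elevators()
```

Because passengers always pick the elevator that becomes free soonest, this toy version behaves like a pooled M/M/c queue, so its estimated mean wait can be sanity-checked against the Erlang C value for the same λ, μ and c.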
MATLAB Simulink was utilized to observe, over the simulation duration, the average waiting time and the average queue length. The simulation outputs were analysed, and the outputs of the first and second runs, shown in Figs. 5 and 6, were compared. The average waiting time shown in Fig. 7 was obtained from this analysis; a perfectly linear average waiting time was achieved.
Fig. 5. Results of simulation run 1
Fig. 6. Results of simulation run 2
Fig. 7. Linear (average waiting time run 1 vs run 2)
5 Conclusion Queue problems arise in almost all real-world applications. In this work, SimEvents blocks from MATLAB Simulink were used to build an M/M/c model of the elevator traffic system, with an exponentially distributed inter-arrival time for the arrival process, an exponentially distributed service time for each server, and c servers. The classification of various elevator dispatching techniques within queueing theory has progressed. Simulations of exponentially distributed elevator performance revealed the number of elevators needed in various scenarios, and the average waiting time was computed. The model was tested with assumed data, as no dataset was found. In the future, the model can be improved so that it can be tested with a relevant dataset. The model can also be modified and applied to similar activities involving traffic, railways, barber shops, fast food restaurants, toll gates, etc.
References 1. Mojský, V., Achimský, K.: The use of Matlab in creating M/M/n/∞ queuing theory model. Transp. Commun. 8(2), 13–22 (2020)
2. Balaji, N., Siva, E.P., Chandrasekaran, A.D., Tamilazhagan, V.: Optimal service using Matlab simulink controlled queuing system at call centers. J. Phys.: Conf. Series 1000, 012167 (2018) 3. Balaji, N.: Optimal resource model using Matlab/Simulink controlled queuing system using multi-server at major fuel stations. Int. J. Pure Appl. Math. 113(12), 221–229 (2017) 4. Jah, M., Shukla, S.: Design of fuzzy logic traffic controller for isolated intersections with emergency vehicle priority system using MATLAB simulation. Int. J. Eng. Res. Appl. 4(6) (2014) 5. Harahap, E., Darmawan, D., Fajar, Y., Ceha, R., Rachmiatie, A.: Modeling and simulation of queue waiting time at traffic light intersection. J. Phys.: Conf. Series 1188, 012001 (2019) 6. Oumaima, E.J., Jale, B.O., Véronique, V.: Queuing theory-based simulation model for vehicular mobility. In: ICC 2021 - IEEE International Conference on Communications, pp. 1–6 (2021). https://doi.org/10.1109/ICC42927.2021.9500977 7. Jamil, A., Muhammad, L., Petr, H.: Investigation of an elevator dispatcher system. MM Sci. J. 12, 2593–2600 (2018) 8. SimEvents: Model and simulate discrete-event systems. MathWorks (2018). https://www.mathworks.com/products/simevents.html. Accessed 20 Aug 2021 9. Banks, J., Carson, J.S., Nelson, B.L., Nicol, D.: Discrete-Event System Simulation, 5th edn. Prentice-Hall, Upper Saddle River, NJ (2010). http://www.bcnn.net
A Simple Strategy for Handling ‘NOT’ Can Improve the Performance of Sentiment Analysis Ranit Kumar Dey(B) and Asit Kumar Das Department of Computer Science and Technology, Indian Institute of Engineering Science and Technology, Shibpur, Howrah 711103, West Bengal, India [email protected], [email protected]
Abstract. A major subset of natural language processing, at the intersection of computational linguistics and text mining, is sentiment analysis, which attempts to encapsulate the broader emotion in the accessible public opinions. Negations play a crucial role in linguistics in the context of sentiment analysis, as they can affect the polarity of other textual constituents. In a sentence, the presence of a negation does not impact only the word after the negation; its scope can extend over a series of words following the negation, depending on certain criteria. Considering the broadest level of structural formation, negations exist primarily in two ways, namely syntactic negation and morphological negation. Some common methods follow the same strategy for negation handling without taking into consideration the differences among the various kinds of negation. This work suggests a technique for handling 'NOT', which is a specific case of syntactic negation. Different linguistic characteristics are taken into account in designing and developing the technique. First, sentiment analysis is done using some common classification models on several datasets. Then the datasets are processed with the 'NOT' handling technique, and the same classification models are employed again. Performance in both cases is compared with respect to different performance metrics using 10-fold cross validation to show the efficacy of the 'NOT' handling strategy. Keywords: Natural language processing · Sentiment analysis · Syntactic negation · 'NOT' handling · Linguistics
1 Introduction
New technological advancements have greatly impacted the way our society functions [11]. The Internet has become the medium for e-commerce, service booking, blogging, social connectivity and many other utility services. Social media such as Twitter and Facebook are being targeted for business promotion, the spreading of political ideologies, the marketing of different products and the sharing of views [29]. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. K. Das et al. (Eds.): CIPR 2022, LNNS 480, pp. 255–267, 2022. https://doi.org/10.1007/978-981-19-3089-8_25 The users and customers put their comments on those
256
R. K. Dey and A. K. Das
places and categorize the services and the products either by star rating [5] or by thumbs down or up [10,28]. As a consequence, these places have become very useful resources of views expressed by consumers. Precisely exploited, consumer-generated content provides a cost-effective and fruitful way of tackling typical business problems, monitoring user satisfaction and acquiring broad perception. Analysis of customer opinions helps service providers resolve critical problems and improve goods and services. Moreover, mining user opinions from social media content helps in other applications such as customization of web advertisements [19] and identification of promoters and detractors in electoral campaigns [8]. However, due to the vast quantity of items available on internet platforms and their constant expansion, analysing such massive amounts of data is a tough task. Considering this need, Sentiment Analysis (SA) has grown into an emerging field of research with the target of effectively analysing user opinions through automatic extraction of information and categorization. SA, a valuable subset of Natural Language Processing (NLP), is a broad area in the domain of Artificial Intelligence (AI) [3]. SA is also popularly referred to as opinion mining, as it involves the process of determining the emotions or opinions hidden in textual data [6]. SA assesses text data to classify it into positive, negative or neutral categories, and in addition it helps in quantifying the degree of polarization [7,24,27]. SA tools are applied to different human languages for the identification of sentiment [5,20]. SA is a broad research spectrum at the intersection of NLP, text mining and computational linguistics. Scientists have explored SA extensively in recent years [9,23,26]. Negations play a very important role in linguistics, as negations may affect the polarization of other constituents.
The appearance of a negation within a sentence can affect not only the word next to the negation; its range can be broadened to include a series of phrases following the negative term. Consider, for example, the sentence "the desktop is not perfect but it functions properly": the scope of the negation is restricted to the word after the negation. On the contrary, in the sentence "the desktop is not working for a longer duration", the scope of the negation extends to the end of the sentence. These kinds of examples reveal that the scope of a negation is not range-bound; it fluctuates based on various linguistic aspects such as punctuation marks, conjunctions, and part-of-speech (POS) tags. Additionally, the appearance of a negation term does not mean that the polarities of all other words present within the sentence will be reversed [17]. However, the majority of existing sentiment determination systems use traditional methodologies such as the statically sized window strategy [15,31] and the punctuation-based strategy [24] for dealing with complex linguistic cases. In static-window strategies, a fixed number of textual elements are negated, which means a static scoping of negation is used. In punctuation-based strategies, the polarities of words between the negation term and the next punctuation mark are inverted. Negations may appear in different forms such as implicit negations, explicit negations (explicit clues like no, not, etc.), diminishers and various restrained linguistic patterns. When the highest level of structural formation is considered, negations can appear in text in basically two forms,
morphological negations and syntactic negations [12]. In morphological negations, a specific structure is followed in which the root word is modified with either a prefix such as 'un-' or 'non-' or a suffix such as '-less'. In syntactic negations, explicit cues invert an individual word or a sequence of words. Some of the existing works [15,17] followed a single strategy for determining the scope of negation for different types of negations, without considering the distinctions among the negation types. This paper focuses on the handling of syntactic negation, more specifically on the handling of 'NOT', for the betterment of results. Instead of using general techniques that handle all kinds of negations, a distinct approach is incorporated in this paper focusing on the particular case of 'NOT' handling, taking care of the effect of various linguistic components. The proposed technique is validated by experimental results. The remainder of the paper is organised in the following manner. The work associated with negation handling is discussed in Sect. 2. Section 3 outlines the suggested 'NOT' handling method as well as the various parameters that influence the algorithm layout. Section 4 describes the experimental setup and examines the results based on empirical observation. Finally, Sect. 5 draws the conclusion and briefly discusses the future scope of the paper.
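The morphological/syntactic distinction above can be made concrete with a small sketch. The cue and affix lists below are our own illustrative assumptions, not the resources used by the proposed algorithm:

```python
# Illustrative affix and cue lists (our assumptions, not the paper's resources).
NEG_PREFIXES = ("un", "non", "dis", "in")
NEG_SUFFIXES = ("less",)
SYNTACTIC_CUES = {"not", "no", "never", "neither", "nor"}

def negation_type(token, vocabulary):
    """Classify a token as a syntactic negation cue, a morphological
    negation (negating affix attached to a known root word), or neither."""
    token = token.lower()
    if token in SYNTACTIC_CUES:
        return "syntactic"
    for p in NEG_PREFIXES:
        if token.startswith(p) and token[len(p):] in vocabulary:
            return "morphological"
    for s in NEG_SUFFIXES:
        if token.endswith(s) and token[:-len(s)] in vocabulary:
            return "morphological"
    return "none"

vocab = {"happy", "sense", "use", "care"}
negation_type("unhappy", vocab)    # "morphological"
negation_type("not", vocab)        # "syntactic"
```

A real system would need a proper vocabulary and exception handling (for instance, "invaluable" is not a sentiment negation of "valuable"), which is exactly why treating all negation types with a single strategy falls short.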
2 Related Work
In this section, the works related to the identification of scope and handling of negation while determining the polarity of a sentence are discussed. A broad sketch of the algorithms, improvements and practical applications of SA is given in [21]; it also briefly describes transfer learning and the resource-building process in SA, and overall makes an effort to present a broad picture of SA and its allied fields. In the sentiment analysis literature, several methodologies have been put forward for the identification and handling of negation scope during polarity categorization of text data. Most of the proposed methods are grounded either on a static window or on punctuation symbols. In earlier studies, the polarity of the complete sentence is reversed when a negation term appears within the sentence [25]. Some works used punctuation symbols for scope identification of negation, where the polarities of all textual elements between the negation term and the next punctuation symbol were reversed [24]. On the contrary, some studies are based on a static window of terms following or around the negation. The simplest way is to reverse the polarity of the word directly after the negation term [14]. In [22], the negation term is searched for within a window of a fixed number of words (three or six) before an opinionated textual element, and if the negation term exists within the window, the polarity of the terms within the window is reversed. Similarly, a window size of five is considered in [13,16], where the polarities of five words after the negation term are reversed. In the presence of a conjunction in the phrase, however, the negation effect may end even after a single
word after the negation term. Another important matter is that many of these methodologies only considered words having specific POS types, such as adjectives [16] or adjectives and adverbs [4], where only the polarities of such words are reversed. But other kinds of words, such as verbs, can also carry polarity, and it has to be checked whether they are affected by the negation. A comparison of different methods with various static window sizes is performed in [15], and it shows that the methodologies which consider a window size of two after the negation term perform the best. But considering comprehensive performance, these methodologies do not perform satisfactorily. So the use of only a static window strategy is not sufficient to cope with complex situations in natural language processing. The strategies may be appropriate for basic sentences, but they do not perform well on compound sentences. Moreover, in some situations even conjunctions or punctuation marks do not limit the negation scope. Sometimes the negation term does not reverse the polarization at all, as in the case of the negation "not only". As a result, a negation handling strategy must identify these kinds of circumstances, so that the polarities of only those words are reversed which are practically reversed by the negation term. A method for determining the scope and handling of negation is presented in [17], where three concepts, namely dynamic delimiters, static delimiters and heuristic rules formulated on the polarities of words, are combined for determining the scope of the negation. The clause in which the negation term appears is found by the static delimiters and called the candidate scope, which is then processed by heuristic rules and dynamic delimiters (depending upon the polarities of the words) to remove some of the words within the candidate scope. However, the scope can extend to more clauses in some situations.
Another concern is that in some cases heuristic rules do not work properly and also require appropriate word-sense disambiguation depending on the context [30]. A survey of the role of negation in sentiment categorization is presented in [30], where different issues with the commonly applied methods are highlighted. In this paper, we present a technique for handling 'NOT', which belongs to the class of syntactic negation, while determining the polarity of text. According to experimental observations, the sentiment analysis results are significantly enhanced after applying the approach. Evaluation uses 10-fold cross validation: the fitting procedure is carried out ten times, each time training on a randomly selected 90% of the total dataset and testing on the remaining 10%.
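The 10-fold protocol described above can be sketched with the standard library alone; this is a generic illustration of the split (function and seed names are our own), not the authors' code.

```python
import random

def ten_fold_splits(n_samples, seed=0):
    """Yield (train_idx, test_idx) pairs for 10-fold cross validation:
    indices are shuffled once, partitioned into 10 folds, and each fold
    in turn serves as the 10% test set while the remaining 90% trains."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::10] for k in range(10)]
    for k in range(10):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test

sizes = [(len(tr), len(te)) for tr, te in ten_fold_splits(1000)]
print(sizes[0])   # (900, 100)
```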
3 Handling of ‘NOT’
The presence of negation in a sentence can influence the sentiment orientation of several other words. Handling the negation means automatically determining its scope, i.e., the phrase fragment impacted by the negation, and reversing the polarities of the opinionated words within that scope. In simple sentences, which consist of only one clause, the presence of negation might cause all
Sentiment Analysis
other words in the sentence to reverse their polarity. A sentence containing numerous clauses is called a compound sentence; there, generally only some of the words have their polarities reversed, and the positions at which reversal occurs depend on various linguistic characteristics. Handling negation is therefore a relatively tricky task. In our work, the spotlight is on handling a specific case of syntactic negation. The syntactic negation group consists of negations that entirely reverse the polarities of the affected words; it is the most common and best-known negation class in user-generated textual data. One very important simple syntactic negation is ‘NOT’, which is a constituent of many compound syntactic negations: when the negation "couldn't" is expanded, for example, it becomes "could not". Similar compound syntactic negations containing ‘NOT’ include "didn't", "wasn't", "weren't", "shouldn't", "don't", "haven't", "wouldn't", "doesn't", "hasn't", "won't", "hadn't", "can't", "isn't", "shan't", "mightn't", "needn't", "mustn't", etc. Proper handling of ‘NOT’ is therefore very important in the context of Sentiment Analysis. Our technique for handling ‘NOT’ is depicted in Algorithm 1; the linguistic features considered in the algorithm are discussed below.

3.1 Presence of Conjunction
Conjunctions are the linking bridges among the clauses of a sentence; the most important are 'and', 'or' and 'but'. The presence of a conjunction affects the scope of negation in compound sentences. Consider, for example, the sentence "the webcam is not extraordinary but it is sufficient for making video call". Here the 'not' in the left-hand clause reverses the polarity of only one opinionated term, 'extraordinary': the presence of 'but' does not permit the scope of negation to extend into the second clause. The conjunctions 'or' and 'and', on the other hand, do permit the scope of negation to extend into the following clause, as in the sentences "I have not cooked chicken or any other non-veg item" and "the tv screen is not clear and bright". In both of these examples the scope of negation extends beyond the first clause.

3.2 Presence of Punctuation Marks
Punctuation marks such as the question mark (?), semicolon (;), comma (,), colon (:), full stop (.), exclamation mark (!), braces and double or single quotation marks do not permit the scope of negation to extend into the next clause. The comma, however, is an exception in some cases: when it is employed in place of 'and' or 'or' and the same part of speech appears on both sides of it, it does not limit the scope.
3.3 Dynamic Window
Instead of a static window in which a fixed number of words is taken as the scope of negation, a dynamic window size is implemented: the scope of negation is allowed to extend until a limiting factor, such as a punctuation mark or the conjunction 'but', comes in the way.

3.4 Presence of Diminishers
Diminishers, also known as reducers, mainly diminish the polarities of the words affected by the negation instead of completely reversing them; they bring down the strength of the negation. The most common diminishers include "less", "hardly", "rarely", "little", "seldom" and "scarcely".

3.5 POS Tag
Adjectives, verbs and adverbs play the most dominant role in sentiment determination. Consequently, tokens with these POS tags are examined closely while handling the negation.
4 Experimental Analysis
We first describe the setup under which the experiments have been conducted and the datasets gathered from various resources for training and testing. We then introduce the classification models and the performance metrics used for the comparative study after employing the ‘NOT’ handling technique. Finally, we show the improvement in classification results with respect to these metrics on the gathered datasets after applying the ‘NOT’ handling technique.

4.1 Dataset Gathering and Experiment Setup
For the experimental analysis, the sentiment categorization datasets from the UCI Machine Learning Repository [1,18] have been utilized, comprising product reviews from Amazon (Amazon), movie reviews from IMDB (IMDB) and restaurant reviews from Yelp (Yelp), each consisting of one thousand review instances. The experiments have been performed on an HP PC with a Core i7 processor, 32 GB of RAM and an NVIDIA GeForce GTX 1080 GPU. Python has been used as the primary language for the development and implementation of the ‘NOT’ handling technique, and the Weka [2] toolkit is employed for the classification models.
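Each file in the UCI "Sentiment Labelled Sentences" collection stores one review per line as a tab-separated sentence and binary label (0 = negative, 1 = positive). A minimal loader might look as follows; the file names in the comment are the ones commonly distributed with the archive and should be verified against a local copy.

```python
# Loader for the UCI sentiment labelled sentences format: "<sentence>\t<label>".
# Distributed file names (verify locally): amazon_cells_labelled.txt,
# imdb_labelled.txt, yelp_labelled.txt.
def load_labelled_sentences(path):
    """Return parallel lists of review texts and integer labels."""
    texts, labels = [], []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            sentence, label = line.rsplit("\t", 1)
            texts.append(sentence)
            labels.append(int(label))
    return texts, labels
```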
Algorithm 1: NOT HANDLING ALGORITHM
Input: d (text)
Output: dnp (text after processing with 'NOT' handling technique)
begin
    /* text is transformed to lower case */
    dl ← d.lower()
    /* tokenization of the text */
    T ← dl.split()
    /* slang substitution – e.g. 'ttyl' is converted to "talk to you later" */
    keys_s ← dict_slang.keys()
    T_sl ← φ
    for each t ∈ T do
        if t ∈ keys_s then
            /* updation of token within the text */
            t ← dict_slang.value(t)
        end
        T_sl ← T_sl.append(t)
    end
    /* apostrophe substitution – e.g. "couldn't" is converted to "could not" */
    keys_a ← dict_apostrophe.keys()
    T_apsl ← φ
    for each t ∈ T_sl do
        if t ∈ keys_a then
            /* updation of token within the text */
            t ← dict_apostrophe.value(t)
        end
        T_apsl ← T_apsl.append(t)
    end
    /* parts-of-speech tagging */
    pos_list ← φ
    for each t ∈ T_apsl do
        pos_list.append(t, pos(t))
    end
    /* 'NOT' handling strategy */
    len ← length(T_apsl)
    for i ← 1 to len do
        if T_apsl[i] = 'not' then
            j ← len − i
            for k ← 1 to (j − 1) do
                if T_apsl[i + k] ∈ {'but', ';', '.', ':', '!', '(', ')', '?'}
                        ∨ T_apsl[i + k] ∈ diminisher
                        ∨ (T_apsl[i + k] = ',' ∧ pos_list[i + k − 1] = pos_list[i + k + 1]) then
                    break
                end
                if pos_list[i + k] ∈ {'ADJ', 'ADV', 'VERB'} then
                    T_apsl[i + k] ← 'not' + T_apsl[i + k]
                end
            end
        end
    end
    for i ← 1 to len do
        if count_not(T_apsl[i]) mod 2 = 0 then
            T_apsl[i] ← remove_all_not(T_apsl[i])
        else
            T_apsl[i] ← keep_one_not(T_apsl[i])
        end
    end
    return T_apsl
end
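A runnable Python rendering of Algorithm 1 might look as follows. The slang and apostrophe dictionaries and the POS lookup here are tiny illustrative stand-ins; the paper's implementation would use full dictionaries and a real POS tagger.

```python
# Sketch of Algorithm 1. DICT_SLANG, DICT_APOSTROPHE and POS are tiny
# illustrative stand-ins for the full resources used in the paper.
DICT_SLANG = {"ttyl": "talk to you later"}
DICT_APOSTROPHE = {"couldn't": "could not", "isn't": "is not", "don't": "do not"}
POS = {"clear": "ADJ", "bright": "ADJ", "extraordinary": "ADJ",
       "sufficient": "ADJ", "good": "ADJ"}          # toy POS lookup
DELIMITERS = {"but", ";", ".", ":", "!", "(", ")", "?"}
DIMINISHERS = {"less", "hardly", "rarely", "little", "seldom", "scarcely"}

def handle_not(text):
    tokens = []
    for t in text.lower().split():
        t = DICT_SLANG.get(t, t)           # slang substitution
        t = DICT_APOSTROPHE.get(t, t)      # apostrophe substitution
        tokens.extend(t.split())           # expanded forms add extra tokens
    pos = [POS.get(t, "OTHER") for t in tokens]
    # dynamic window: from each 'not', tag ADJ/ADV/VERB tokens until a
    # delimiter, a diminisher, or a list-style comma stops the scope
    for i, t in enumerate(tokens):
        if t != "not":
            continue
        for k in range(i + 1, len(tokens)):
            w = tokens[k]
            list_comma = (w == "," and 0 < k < len(tokens) - 1
                          and pos[k - 1] == pos[k + 1])
            if w in DELIMITERS or w in DIMINISHERS or list_comma:
                break
            if pos[k] in ("ADJ", "ADV", "VERB"):
                tokens[k] = "not_" + tokens[k]
    # an even number of accumulated not_ prefixes cancels out
    out = []
    for t in tokens:
        n = 0
        while t.startswith("not_"):
            t, n = t[4:], n + 1
        out.append(("not_" if n % 2 else "") + t)
    return " ".join(out)

print(handle_not("the screen isn't clear but it is sufficient"))
# → the screen is not not_clear but it is sufficient
```

Note how 'but' stops the scope, so 'sufficient' keeps its polarity while 'clear' is marked as negated.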
4.2 Classification Models
A short introduction to the classification models used to demonstrate the effectiveness of the ‘NOT’ handling technique is given below.
1. The IB-k classification model is based on the instance-based strategy of k-nearest neighbors. When little knowledge of the data distribution is available, IB-k is a preferable option for categorization.
2. J48 is a decision tree classifier that uses entropy-based information gain as the measure for splitting the data at each stage; the attribute with the highest information gain is picked for the next split.
3. JRip is a propositional rule-based learner that reduces error incrementally using pruning.
4. The Logistic classifier is based on logistic regression; the posterior probability is estimated using the sigmoid function.
5. The NB classifier follows Bayes' principle under the assumption that the features of the objects are mutually independent.
6. PART is a rule-based classifier that builds its rule base following the divide-and-conquer principle; the rule base is then used for categorization.
7. SMO stands for Sequential Minimal Optimization, an approach employed to solve the Quadratic Programming Problem (QPP) that arises while training Support Vector Machines (SVM).

4.3 Performance Metrics
For the analysis of performance, four metrics have been considered, namely Accuracy, Precision, Recall and F-measure. Their mathematical expressions are given in Eqs. (1), (2), (3) and (4) respectively.

    Accuracy = (|TP| + |TN|) / (|TP| + |FP| + |TN| + |FN|)    (1)

    Precision = |TP| / (|TP| + |FP|)    (2)

    Recall = |TP| / (|TP| + |FN|)    (3)

    F-measure = (2 × Precision × Recall) / (Precision + Recall)    (4)

TP denotes the set of true positives: test instances classified as the positive class whose actual class is also positive. TN is the set of true negatives: test instances classified as negative whose actual class is also negative. FP is the set of false positives: test instances classified as positive whose actual class is negative. FN is the set of false negatives: test instances classified as negative whose actual class is positive.
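Equations (1) through (4) can be computed directly from the four confusion-matrix counts; the counts in the usage line are made-up numbers for illustration.

```python
# Metrics (1)-(4) computed from the four confusion-matrix counts.
def metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_measure

# e.g. 80 true positives, 70 true negatives, 20 false positives, 30 false negatives
acc, p, r, f = metrics(80, 70, 20, 30)
print(round(acc, 3), round(p, 3), round(r, 3), round(f, 3))  # 0.75 0.8 0.727 0.762
```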
4.4 Comparison of Performance
First, sentiment categorization is performed with the previously discussed classification models. Then the ‘NOT’ handling technique is applied to the data, sentiment analysis is performed again with the same models, and it is shown that overall performance improves after applying the ‘NOT’ handling technique.

Comparison of Accuracy. Accuracy specifies what proportion of test instances is classified correctly. The comparison is given in Table 1: accuracy is first computed on the datasets by direct application of the classification models; the data is then processed with the ‘NOT’ handling technique and the models are applied again, and the two cases are tabulated side by side. Table 1 shows an improvement in accuracy on all the datasets for the various classification models after implementing the ‘NOT’ handling technique.

Table 1. Comparison of accuracy (%)

          Without handling NOT        With handling of NOT
          Amazon   IMDB   Yelp        Amazon   IMDB   Yelp
IB-k       66.9    59.1   63.8         68.3    63.3   68.5
J48        67.3    60.7   66.6         70.0    62.0   67.2
Jrip       64.1    57.4   62.2         69.7    60.4   67.6
Logistic   78.2    77.6   76.0         78.9    79.4   77.8
NB         73.3    67.7   68.7         79.9    71.5   72.7
PART       68.8    61.8   64.8         72.5    62.7   66.7
SMO        79.7    75.5   77.8         81.8    79.1   80.4
Comparison of Precision. Precision specifies what fraction of positive-class predictions is actually correct. The comparison is given in Table 2: precision is first computed on the datasets by direct application of the classification models; the data is then pre-processed with the ‘NOT’ handling technique and the classifiers are applied once more, and the two cases are tabulated side by side. Table 2 shows an improvement in precision on the benchmark datasets when the ‘NOT’ handling technique is employed in the pre-processing of the data.

Table 2. Comparison of precision

          Without handling NOT        With handling of NOT
          Amazon   IMDB   Yelp        Amazon   IMDB   Yelp
IB-k      0.673    0.593  0.639       0.683    0.633  0.686
J48       0.675    0.607  0.667       0.701    0.620  0.688
Jrip      0.642    0.576  0.623       0.692    0.600  0.670
Logistic  0.783    0.776  0.761       0.789    0.795  0.779
NB        0.733    0.678  0.688       0.798    0.715  0.727
PART      0.690    0.618  0.649       0.725    0.622  0.667
SMO       0.797    0.755  0.778       0.818    0.791  0.804

Comparison of Recall. Recall specifies what percentage of ground-truth positives is recognized correctly. The comparison is given in Table 3: recall is first calculated on the datasets by applying the classifiers directly; the data is then processed with the ‘NOT’ handling technique and the classifiers are applied again, and the two cases are tabulated side by side. Table 3 shows an enhancement in recall on the datasets for the classifiers after employing the ‘NOT’ handling technique.

Table 3. Comparison of recall

          Without handling NOT        With handling of NOT
          Amazon   IMDB   Yelp        Amazon   IMDB   Yelp
IB-k      0.669    0.591  0.639       0.719    0.636  0.688
J48       0.673    0.607  0.667       0.722    0.609  0.715
Jrip      0.641    0.574  0.622       0.753    0.631  0.686
Logistic  0.782    0.776  0.760       0.792    0.796  0.780
NB        0.733    0.677  0.688       0.799    0.716  0.745
PART      0.688    0.618  0.649       0.725    0.627  0.706
SMO       0.797    0.755  0.778       0.819    0.791  0.804

Comparison of F-Measure. The F-measure is the harmonic mean of precision and recall. The comparison is given in Table 4: the F-measure is first computed on the datasets by applying the classification models directly; the datasets are then processed with the ‘NOT’ handling technique and the models are applied again, and the two cases are tabulated side by side. Table 4 shows an improvement in F-measure on all the datasets for the various classification models after implementing the ‘NOT’ handling technique.
Table 4. Comparison of F-measure

          Without handling NOT        With handling of NOT
          Amazon   IMDB   Yelp        Amazon   IMDB   Yelp
IB-k      0.670    0.591  0.639       0.710    0.634  0.686
J48       0.673    0.607  0.667       0.720    0.612  0.697
Jrip      0.641    0.574  0.622       0.747    0.609  0.673
Logistic  0.782    0.776  0.760       0.791    0.795  0.779
NB        0.733    0.677  0.688       0.798    0.715  0.732
PART      0.688    0.618  0.649       0.725    0.624  0.680
SMO       0.797    0.755  0.778       0.818    0.791  0.804

5 Conclusion and Future Scope
In this work a technique for handling ‘NOT’, a special case of syntactic negation, is proposed. Various linguistic characteristics have been taken into consideration while designing and developing the ‘NOT’ handling algorithm. Experiments have been carried out using 10-fold cross validation: first, sentiment analysis is performed on the different datasets by various classification models without handling the negation; next, sentiment analysis is performed by the same classifiers after processing the datasets with the ‘NOT’ handling technique. The performance in both cases is then compared against the different metrics, which shows that overall sentiment analysis performance of the classifiers on the datasets improves when the ‘NOT’ handling technique is utilized. It is therefore concluded that this technique adds value to the specific area of syntactic negation handling in the context of sentiment analysis. In future, the technique can be generalized to handle syntactic negation as a whole, or broadened to handle other kinds of negation such as morphological negation. It can also be tried on both non-overlapping and overlapping multiclass sentiment categorization problems.

Acknowledgment. The authors acknowledge the University Grants Commission (UGC) of India for supporting this work through a fellowship.
Compliance with Ethical Norms
Conflict of Interest. The authors declare that this manuscript has no conflict of interest with any other published resource and has not been published earlier, in part or in full. None of the data has been fabricated or modified to support the conclusions.
References
1. UCI machine learning repository: sentiment labelled sentences data set. https://archive.ics.uci.edu/ml/datasets/Sentiment+Labelled+Sentences. Accessed 15 Jan 2022
2. Weka 3 - data mining with open source machine learning software in Java. https://www.cs.waikato.ac.nz/ml/weka/. Accessed 15 Jan 2022
3. Baclic, O., et al.: Natural language processing (NLP) a subfield of artificial intelligence. CCDR 46(6), 1–10 (2020)
4. Benamara, F., Cesarano, C., Picariello, A., Recupero, D.R., Subrahmanian, V.S.: Sentiment analysis: adjectives and adverbs are better than adjectives alone. ICWSM 7, 203–206 (2007)
5. Boiy, E., Moens, M.F.: A machine learning approach to sentiment analysis in multilingual web texts. Inf. Retrieval 12(5), 526–558 (2009)
6. Cambria, E., Poria, S., Gelbukh, A.: Sentiment analysis is a big suitcase. In: IEEE Intelligent Systems, pp. 74–80 (2017)
7. Chen, L.S., Liu, C.H., Chiu, H.J.: A neural network based approach for sentiment classification in the blogosphere. J. Informet. 5(2), 313–322 (2011)
8. Contractor, D., Faruquie, T.A.: Understanding election candidate approval ratings using social media data. In: Proceedings of the 22nd International Conference on World Wide Web, pp. 189–190 (2013)
9. Davies, A., Ghahramani, Z.: Language-independent Bayesian sentiment mining of Twitter. In: The 5th SNA-KDD Workshop, vol. 11. Citeseer (2011)
10. Deng, Z.H., Luo, K.H., Yu, H.L.: A study of supervised term weighting scheme for sentiment analysis. Expert Syst. Appl. 41(7), 3506–3513 (2014)
11. DiMaggio, P., Hargittai, E., Neuman, W.R., Robinson, J.P.: Social implications of the internet. Annu. Rev. Sociol. 27, 307–336 (2001)
12. Givón, T.: English Grammar: A Function-Based Introduction, vol. 2. John Benjamins Publishing (1993)
13. Grefenstette, G., Qu, Y., Shanahan, J.G., Evans, D.A.: Coupling niche browsers and affect analysis for an opinion mining application. In: Proceedings of Recherche d'Information Assistée par Ordinateur (RIAO) (2004)
14. Heerschop, B., van Iterson, P., Hogenboom, A., Frasincar, F., Kaymak, U.: Analyzing sentiment in a large set of web data while accounting for negation. In: Mugellini, E., Szczepaniak, P.S., Pettenati, M.C., Sokhn, M. (eds.) Advances in Intelligent Web Mastering – 3. Advances in Intelligent and Soft Computing, vol. 86, pp. 195–205. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-18029-3_20
15. Hogenboom, A., Van Iterson, P., Heerschop, B., Frasincar, F., Kaymak, U.: Determining negation scope and strength in sentiment analysis. In: 2011 IEEE International Conference on Systems, Man, and Cybernetics, pp. 2589–2594. IEEE (2011)
16. Hu, M., Liu, B.: Mining and summarizing customer reviews. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 168–177 (2004)
17. Jia, L., Yu, C., Meng, W.: The effect of negation on sentiment analysis and retrieval effectiveness. In: Proceedings of the 18th ACM Conference on Information and Knowledge Management, pp. 1827–1830 (2009)
18. Kotzias, D., Denil, M., De Freitas, N., Smyth, P.: From group to individual labels using deep features. In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 597–606 (2015)
19. Langheinrich, M., Nakamura, A., Abe, N., Kamba, T., Koseki, Y.: Unintrusive customization techniques for web advertising. Comput. Netw. 31(11–16), 1259–1272 (1999)
20. Martín-Valdivia, M.T., Martínez-Cámara, E., Perea-Ortega, J.M., Ureña-López, L.A.: Sentiment polarity detection in Spanish reviews combining supervised and unsupervised approaches. Expert Syst. Appl. 40(10), 3934–3942 (2013)
21. Medhat, W., Hassan, A., Korashy, H.: Sentiment analysis algorithms and applications: a survey. Ain Shams Eng. J. 5(4), 1093–1113 (2014)
22. Narayanan, R., Liu, B., Choudhary, A.: Sentiment analysis of conditional sentences. In: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pp. 180–189 (2009)
23. Pang, B., Lee, L.: Opinion mining and sentiment analysis. Comput. Linguist. 35(2), 311–312 (2009)
24. Pang, B., Lee, L., Vaithyanathan, S.: Thumbs up? Sentiment classification using machine learning techniques. arXiv preprint cs/0205070 (2002)
25. Popescu, A.M., Etzioni, O.: Extracting product features and opinions from reviews. In: Kao, A., Poteet, S.R. (eds.) Natural Language Processing and Text Mining, pp. 9–28. Springer, London (2007). https://doi.org/10.1007/978-1-84628-754-1_2
26. Prabowo, R., Thelwall, M.: Sentiment analysis: a combined approach. J. Informet. 3(2), 143–157 (2009)
27. Saleh, M.R., Martín-Valdivia, M.T., Montejo-Ráez, A., Ureña-López, L.: Experiments with SVM to classify opinions in different domains. Expert Syst. Appl. 38(12), 14799–14804 (2011)
28. Turney, P.D.: Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. arXiv preprint cs/0212032 (2002)
29. Wang, C., Zhang, P.: The evolution of social commerce: the people, management, technology, and information dimensions. Commun. Assoc. Inf. Syst. 31(1), 5 (2012)
30. Wiegand, M., Balahur, A., Roth, B., Klakow, D., Montoyo, A.: A survey on the role of negation in sentiment analysis. In: Proceedings of the Workshop on Negation and Speculation in Natural Language Processing, pp. 60–68 (2010)
31. Wilson, T., Wiebe, J., Hoffmann, P.: Recognizing contextual polarity: an exploration of features for phrase-level sentiment analysis. Comput. Linguist. 35(3), 399–433 (2009)
Rule Based Classification Using Particle Swarm Optimization for Heart Disease Prediction

Udita Basu, Shraya Majumdar, Shreyasee Dutta, Soumyajit Mullick, Sagnik Ganguly, and Priyanka Das(B)

Department of Computer Application, Institute of Engineering and Management, Kolkata, Kolkata 700091, West Bengal, India
[email protected]
Abstract. Predicting heart disease has become a challenging task in the field of medical science. New technologies claiming accurate prediction of heart disease appear every day; some computational approaches work really well while others do not. This paper demonstrates that effective analysis can be done using a single-objective particle swarm optimization technique based on the Cleveland Heart Disease dataset from the UCI Machine Learning Repository. The proposed work has also been compared with some existing classification techniques: for the comparison task, many popular classification models have been used for predicting the rate of heart disease by analysing the heart disease data. The proposed work has been evaluated based on classification Accuracy, Precision, Recall, F-measure and the ROC curve.

Keywords: Classification · Heart disease prediction · Particle swarm optimization

1 Introduction

"Heart disease" refers to any problem affecting the heart. This covers a wide range of heart-related issues, many of which are life-threatening; if necessary precautions are not taken, it can become a major cause of death. According to the World Health Organization (WHO), cardiovascular illnesses cause 17.9 million deaths per year. High cholesterol, obesity, high triglyceride levels, hypertension and other harmful conditions all raise our chances of developing heart disease, and a heart disease prognosis made at the right time can save lives. For estimating the rate of heart disease, scientists have developed a number of machine learning and data mining algorithms, and several studies on effective prediction algorithms based on heart disease datasets have been published in the literature. One of them developed a prediction method for diagnosing cardiovascular illnesses, as stated in [1]; Constricted Particle Swarm Optimization, a modified version of swarm optimization with a constriction factor, was employed in that study. Noncommunicable diseases

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. A. K. Das et al. (Eds.): CIPR 2022, LNNS 480, pp. 268–277, 2022. https://doi.org/10.1007/978-981-19-3089-8_26
such as diabetes and heart disease are primary causes of death in children and adolescents; proper identification, given the availability of a prediction system or other application programme, can help lessen this. The study detailed in [2] resulted in application software that doctors and other medical professionals can use to anticipate the onset of noncommunicable diseases. The application is based on patient records collected from the Bahrain Defence Force Hospital, and it assisted physicians in making quick and accurate decisions about a patient's health. Another study [3] developed a system for heart disease prediction using Naïve Bayes (NB), a Bayesian-optimized Support Vector Machine, K-Nearest Neighbors and Salp Swarm Optimized Neural Networks, taking some of the primary risk variables into account. A comparison study [4] of ten distinct feature selection strategies and six classification algorithms based on the Cleveland Heart Disease dataset is also available; there, the backward feature selection strategy combined with a decision tree classifier yielded the best results. In [5], ensemble learning techniques were applied to predict heart disease, with the Bagging technique combined with a decision tree yielding the greatest results. The accuracy of weak classification algorithms has been improved by combining many classifiers [6]: with ensemble classification, that study claimed a boost of up to 7% in accuracy for weak classifiers, a feature selection implementation improved performance further, and the results revealed a considerable increase in prediction accuracy. Using a feature selection strategy, the research effort presented in [7] provided a dimensionality reduction method and identified features of heart illness; the data for that study came from the Machine Learning Repository at the University of California, Irvine.
For the classification of the Heart Disease dataset, a comparison study [8] of different classifiers was undertaken to correctly classify and predict heart disease cases with the fewest features; its dataset includes 76 features, including the class label, for 1025 patients from Cleveland, Hungary, Switzerland and Long Beach. Another heart disease categorization utilising machine learning methods is reported in [9], and in [10] and [11] evolutionary algorithms were used to predict cardiac disease. The current study creates a particle swarm optimization technique, based on dominance relationships, for classifying heart disease data. It is a single-objective optimization technique that aids in the generation of classification rules from a dataset; a new unlabeled heart disease dataset can then be tagged using the generated rules. In other words, the risk of heart disease can be forecast based on these classification rules. Rule correctness has been used as the fitness function for this optimization strategy. The dominance-relationship based PSO technique, with rule correctness as the objective function, aids information sharing among particles in the search space. To demonstrate the usefulness of particle swarm optimization for heart disease prediction, the proposed model is compared with existing classification methods such as the Decision Tree (DT), Random Forest (RF), K-Nearest Neighbour (KNN), Neural Network (NN), Support Vector Machine (SVM) and Logistic Regression (LR) models.
The remainder of the paper is organized as follows: Sect. 2 discusses the proposed methodology, Sect. 3 presents the experimental results, and Sect. 4 concludes the study with prospective future work.
2 Proposed Methodology
To select a relevant subset of features, we have adopted the approach outlined in [12], which employs the notion of a spanning tree: the most significant node in the spanning tree is eliminated iteratively until the modified network becomes empty, and the chosen nodes constitute an essential feature subset that can be utilised to predict future heart disease. Let DS = (S, C, D) denote the reduced heart disease dataset, with S the set of samples, C = {C1, C2, ..., Cn} the set of n condition features picked by the applied feature selection algorithm [12], and D the disease-type decision feature. Missing values in the dataset are imputed using the method outlined in [13] and discretized using the Python-based Orange module [14]. Using Python's Keras package, the classification model is trained independently for each illness type in a parallel processing environment. The initial population consists of a set of potential solutions chosen at random from the search space generated using the concepts presented in [15]. For a clear visualization of the proposed work, a workflow diagram is given in Fig. 1.
Fig. 1. Flowchart of the intended work
2.1 Initialization Steps of the Method
Every solution in the population is a particle symbolising a rule that moves through the search space in quest of optimal values of the target function. In the search space, each particle maintains its position, its velocity, and its best position within the population; the swarm's best position is also maintained. As a result, the PSO algorithm comprises three main steps: determining each particle's fitness value, updating the individual and global bests, and updating each particle's velocity and position. These steps are repeated until a stopping condition is reached. At time t, m_i(t) denotes the velocity and u_i(t) the position of the i-th particle; when time advances to t + 1, the modified velocity is m_i(t + 1) and the modified position is u_i(t + 1). Equation (1) gives the velocity of the particle at (t + 1), where h is the inertia coefficient, z1, z2 (0 ≤ z1, z2 ≤ 2) are acceleration coefficients, and l1, l2 (0 ≤ l1, l2 ≤ 1) are random velocity coefficients drawn afresh at every velocity update. u_i^best(t) represents the particle's best individual position, and G(t) is the swarm's best position at time t.

    m_i(t + 1) = h · m_i(t) + z1 · l1 · [u_i^best(t) − u_i(t)] + z2 · l2 · [G(t) − u_i(t)]    (1)

The particle's position at (t + 1) is updated using Eq. (2), where u_i(t + 1) and m_i(t + 1) are the particle's new position and velocity, respectively, and u_i(t) is the particle's previous position.

    u_i(t + 1) = u_i(t) + m_i(t + 1)    (2)

In this study, the i-th particle b_i from population B is an n-dimensional vector whose initial position is denoted u_i(0). The initial position also determines the particle's initial best position u_i^best(0) and contributes to the swarm's initial best position G(0). Apart from that, the particle's initial velocity is set to m_i(0) = (1, 1, ..., 1), the all-ones vector. After each iteration the particle's velocity changes: at the (t + 1)-th iteration, the velocity of particle b_i is computed using Eq. (1) and the new position using Eq. (2).

2.2 Fitness Function
The fitness value of each particle is determined by the objective function, namely rule correctness. Rule accuracy differs from classification accuracy in that it is computed only over the training instances whose feature values match the rule's antecedent, rather than over the entire training set; it is determined by the frequency of the rule in the correct set as well as in the match set. Suppose the antecedent of rule R matches T instances of the training data, of which only Q instances also match the consequent of the rule. The accuracy is then calculated using Eq. (3).

    Accuracy(A) = Q / T    (3)

The approach uses an initialization procedure to determine each particle b_i's position, best position and initial velocity, denoted u_i(0), u_i^best(0) and m_i(0) respectively. Furthermore, the initial global best position G(0) of the swarm is determined from the position of the best initial particle. The initialization method takes a search space S as input, initialises the particles, and returns the global best position G(0). After the Dominance-relationship based Particle Swarm Optimization (DPSO) converges, the set of rules is recovered from the final population.
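One velocity/position update (Eqs. (1)-(2)) and the rule-accuracy fitness (Eq. (3)) can be sketched as below; function names, default constants and the predicate interface are illustrative choices, not the authors' code.

```python
import random

def pso_step(u, m, u_best, g, h=0.7, z1=1.5, z2=1.5, rng=random):
    """Return the updated (position, velocity) of one n-dimensional particle."""
    l1, l2 = rng.random(), rng.random()      # fresh random coefficients in [0, 1)
    m_new = [h * mi + z1 * l1 * (bi - ui) + z2 * l2 * (gi - ui)
             for mi, ui, bi, gi in zip(m, u, u_best, g)]      # Eq. (1)
    u_new = [ui + mi for ui, mi in zip(u, m_new)]             # Eq. (2)
    return u_new, m_new

def rule_accuracy(matches_antecedent, matches_consequent, data):
    """Eq. (3): Q/T over the instances matched by the rule's antecedent."""
    matched = [x for x in data if matches_antecedent(x)]
    if not matched:
        return 0.0
    correct = [x for x in matched if matches_consequent(x)]
    return len(correct) / len(matched)
```

Note that the fitness is undefined when no instance matches the antecedent; the sketch returns 0.0 in that case as one reasonable convention.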
U. Basu et al.

2.3 Dominance-Relationship Based Particle Swarm Optimization Based Classification
"Orange", a Python-based open-source data mining software, was used to create the model for the heart disease data. We propose a DPSO classifier for analysing the Cleveland Heart Disease dataset in this paper. In this study, an effective classifier was developed to analyse the heart disease dataset and predict the likelihood of developing heart disease. We used a single-objective swarm optimization approach to design a competent classifier for identifying heart disease. To begin creating the classifier, an n-dimensional search space was built. The initial population of particles is defined as the initial set of candidate solutions. The method moves the candidate solutions in the search space closer to the global ideal point. For each generation, it chooses solutions which are non-dominated with respect to each other and have sound fitness values. Once the process has converged, the most comprehensive set of classification rules is selected from the final population. The fitness function is defined in terms of accuracy. The accuracy of a rule is not the same as the accuracy of a classification: a rule's accuracy is measured over the examples whose feature values match the rule's antecedent, rather than over the entire training set. Rule accuracy is defined as the ratio of the number of times a rule appears in the correct set to the number of times it appears in the match set. In a single-objective PSO technique, a particle changes its position if the fitness at the new position is superior to that at its current position. The current positions of the particles, together with their individual best positions and the swarm's best position, determine the population for the next iteration. Because it enhances information flow between particles, we also take the parent-offspring dominance relationship [15] into account. We begin by calculating the fitness value, which is also known as the accuracy value.
In dominance-relationship based PSO, only the K nearest neighbours (KNN) of each particle are chosen to share their information with that particle. Let Ni1, Ni2, ..., NiK; for all i = 1, 2, ..., q be the K nearest neighbour particles of each particle bi. Thus, bi shares its information with each particle Nij, j = 1, 2, ..., K. Consider a particle Bi with a neighbour particle Bj, and let Si(t) and Sj(t) be their respective current positions, with Si(t + 1) and Sj(t + 1) their new positions. The fitness function, i.e., the rule accuracy Ai, is computed for all four of these instances. Si(t) and Si(t + 1) are non-dominated with respect to each other, as are Sj(t) and Sj(t + 1). As a consequence of the comparison, the instances Si(t) and Si(t + 1) were deemed superior and were kept for the next generation; the new population thus consists of one parent and one offspring instance. In this work we used Bi as one parent and Nij, j = 1, 2, ..., K, as the other parent, along with the two superior instances. From the collection of (K + 1) particles, K particles were obtained after one iteration. Once a specified number of iterations has been completed, the particles in the population deliver their best individual positions.
Rule Based Classification Using Particle Swarm Optimization
Fig. 2. Relationships of dominance among four particles
Si(t) and Si(t + 1) are non-dominated with respect to one another, as seen in Fig. 2. Furthermore, Sj(t) and Sj(t + 1) are non-dominated with respect to one another. Both Si(t) and Si(t + 1) dominate Sj(t + 1), whereas Sj(t) is dominated by Si(t). When all four samples are compared, Si(t) and Si(t + 1) are the two best, and they are kept for future generations. For the new population generation, a parent and a child are chosen. The proposed DPSO model is built using Algorithm 1.
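As a minimal sketch of the comparison in Fig. 2 (the two-objective fitness tuples below are hypothetical, chosen only to reproduce the dominance pattern described; the paper's actual dominance relation follows [15]):

```python
def dominates(a, b):
    """a dominates b if a is no worse in every objective and strictly
    better in at least one (objectives to be maximised)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def two_best(instances):
    """Keep the two instances dominated by the fewest others, as done for
    the four instances {Si(t), Si(t+1), Sj(t), Sj(t+1)}."""
    dominated_by = [sum(dominates(o, inst) for o in instances) for inst in instances]
    order = sorted(range(len(instances)), key=lambda k: dominated_by[k])
    return [instances[k] for k in order[:2]]

si_t, si_t1 = (0.9, 0.5), (0.5, 0.9)   # mutually non-dominated
sj_t, sj_t1 = (0.8, 0.4), (0.4, 0.4)   # sj_t dominated by si_t; sj_t1 by both
```

Running `two_best` on the four instances keeps Si(t) and Si(t + 1), matching the outcome described in the text.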
3 Experimental Results
This experiment was evaluated in two different ways. We first evaluated the proposed classifier on its own, and then compared it to existing classifiers. The existing classifiers used in our experimental study include KNN, LR, NN, RF, DT, SVM and recently published works such as [7,8], and [9]. We used 10-fold cross validation for the comparison. Using the confusion matrix presented in Table 1, the prediction model in the proposed study was assessed with four metrics [16,17]: Classification Accuracy, Precision, Recall, and F1 score. When class labels are known, these metrics serve as external cluster validation indices. Table 1 shows the true positive (u), false negative (v), false positive (w), and true negative (x) scores. The Precision metric is obtained by dividing the number of correctly identified objects by the total number of objects predicted for a class, and is written as Pr = u / (u + w). Recall is calculated by dividing the number of objects correctly identified by the total number of objects in the class: Re = u / (u + v). Classification Accuracy is derived by dividing the total number of correctly identified objects by all the objects present in the dataset: Ac = (u + x) / (u + v + w + x). The F1 score is the harmonic mean of precision and recall: F1 = 2(Pr * Re) / (Pr + Re). We have also compared our proposed work with three other existing classifiers, as discussed in Classification models for heart disease prediction using feature selection and PCA (FSPCA) [7], Prediction of heart disease and classifiers' sensitivity analysis (HDCSA) [8], and Heart disease classification using machine learning techniques (HDCML) [9]. The results are shown in Table 2.
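The four metrics can be sketched directly from the confusion-matrix counts (a sketch; the example counts in the test are placeholders, not values from the paper):

```python
def evaluation_metrics(u, v, w, x):
    """Compute Precision, Recall, Accuracy and F1 from the confusion-matrix
    counts: u = TP, v = FN, w = FP, x = TN (Table 1 notation)."""
    pr = u / (u + w)                    # Precision
    re = u / (u + v)                    # Recall
    ac = (u + x) / (u + v + w + x)      # Classification Accuracy
    f1 = 2 * pr * re / (pr + re)        # F1: harmonic mean of Pr and Re
    return pr, re, ac, f1
```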
Algorithm 1: DPSO (R11, S, B)
Input: R11 = samples with class 'D1 = L1', S = the search space, B = the initial population
Output: B = final population that represents classification rules for disease type 'D1 = L1'

begin
    Initialize the particles in the population;
    while t < a predefined threshold do
        for i = 1 to m do
            Compute new position ui(t + 1) of particle bi using Eq. (1) and Eq. (2);
            Compute fitness functions f1i(t) and f1i(t + 1) for ui(t) and ui(t + 1) using Eq. (3);
        end
        G(t + 1) = G(t);
        for i = 1 to m do
            Let Neigh(bi) = {Ni1, Ni2, ..., NiK};  /* Neigh(bi) is not empty */
            Binew = φ;  /* new particles generated from bi for the new population */
            T = 0;
            for each bj ∈ Neigh(bi) do
                T = T + 1;
                Let Bij = {ui(t), ui(t + 1), uj(t), uj(t + 1)};  /* t-th and (t + 1)-th instances of bi and bj */
                Compute the dominance relationship among the particles in Bij based on their fitness values;
                BT_i,1 = set of the two better instances in Bij;
                Binew = Binew ∪ BT_i,1;
                Let bi = Binew[1];  /* the first particle in Binew */
                Neigh(bi) = Binew − {bi};  /* remaining particles in Binew */
            end
            Let bi contain two new instances u1i and u2i;
            if u1i dominates u2i w.r.t. the fitness function then
                ui(t + 1) = u1i;
            else if u2i dominates u1i w.r.t. the fitness function then
                ui(t + 1) = u2i;
            else
                ui(t + 1) = u1i or u2i, selected randomly;
            end
            bi = ui(t + 1);  /* new position of particle bi based on the dominance relation */
            if ui(t + 1) dominates ui(t) then
                uibest(t + 1) = ui(t + 1);
            else
                uibest(t + 1) = ui(t);
            end
            if uibest(t + 1) dominates G(t + 1) then
                G(t + 1) = uibest(t + 1);
            end
        end
        t = t + 1;
    end
    Return B;  /* each particle in the population B is a rule */
end
Table 1. Confusion matrix

                 Predicted
Actual           Positive   Negative
Positive         u          v
Negative         w          x
The current work's performance was also assessed using the Receiver Operating Characteristic (ROC) curve. An important feature of this curve is the Area Under the Curve (AUC) [18]. The AUC is a classification criterion, whereas the ROC is a probability curve. The ROC curve plots the True Positive Rate (on the Y-axis) against the False Positive Rate (on the X-axis). The terms used in the ROC curve are given in Eqs. (4) to (6). The higher the area under the curve, the better the classification performance: when AUC = 1, the classifier clearly separates all the Positive and Negative classes, and when AUC = 0, the classifier predicts all Negatives as Positives, and vice versa. The ROC curves for all of the classifiers are shown in Fig. 3.

True Positive Rate (Sensitivity) = u / (u + v)                   (4)

Specificity = x / (x + w)                                        (5)

False Positive Rate = 1 − Specificity = w / (x + w)              (6)
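Eqs. (4) to (6) can be sketched as follows (a sketch; the counts in the test are placeholders):

```python
def roc_rates(u, v, w, x):
    """True positive rate (sensitivity), specificity and false positive
    rate from the confusion-matrix counts u = TP, v = FN, w = FP, x = TN."""
    tpr = u / (u + v)          # Eq. (4)
    specificity = x / (x + w)  # Eq. (5)
    fpr = 1 - specificity      # Eq. (6), equivalently w / (x + w)
    return tpr, specificity, fpr
```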
Fig. 3. ROC curves for all the comparative classifiers
According to the results, the proposed work beats all the existing classifiers in terms of all assessment metrics. The ROC curves for the comparative classifiers have been shown. The Neural Network classifier also does a good job of predicting cardiac illness on the Cleveland dataset.
Table 2. Evaluation results based on different classifiers

Technique            AUC    Ac     F1     Pr     Re
DPSO                 0.970  0.960  0.960  0.960  0.960
KNN                  0.844  0.766  0.765  0.765  0.766
Logistic Regression  0.935  0.875  0.874  0.876  0.875
Neural Network       0.952  0.950  0.950  0.951  0.950
Random Forest        0.927  0.920  0.920  0.920  0.920
Decision Tree        0.900  0.910  0.910  0.910  0.910
SVM                  0.947  0.908  0.907  0.909  0.908
FSPCA                0.845  0.851  0.852  0.852  0.852
HDCSA                0.823  0.834  0.833  0.832  0.834
HDCML                0.861  0.874  0.874  0.876  0.875

4 Conclusion and Future Plans
The major goal of this work is to create a dominance-relationship based optimization technique for predicting cardiac illnesses. The proposed model has a classification accuracy of 97%. Developing such an efficient classifier is critical as well as valuable for real-world situations such as detecting or categorising heart conditions using existing data patterns. It aids doctors and medical practitioners in taking preventative measures for heart disease patients. Comparative investigation using some of the existing classification algorithms has also yielded interesting findings. The dataset size can be expanded, and deep learning combined with other optimization approaches can produce more promising outcomes. More approaches to connect heart-disease-trained data mining algorithms with specific multimedia for the convenience of patients and clinicians may be discovered in the future. More disease datasets could be employed in the future for classification algorithms, and data mining techniques such as clustering could be used to analyse the heart disease dataset. We can also improve our proposed work to handle imbalanced datasets, but that remains future work.
References

1. Suvarna, C., Sali, A., Salmani, S.: Efficient heart disease prediction system using optimization technique. In: 2017 International Conference on Computing Methodologies and Communication (ICCMC), pp. 374–379 (2017)
2. Aldallal, A., Al-Moosa, A.A.A.: Using data mining techniques to predict diabetes and heart diseases. In: 2018 4th International Conference on Frontiers of Signal Processing (ICFSP), pp. 150–154 (2018)
3. Patro, S.P., Nayak, G.S., Padhy, N.: Heart disease prediction by using novel optimization algorithm: a supervised learning prospective. Inform. Med. Unlock. 26, 100696 (2021)
4. Dissanayake, K., Gapar, M., Johar, M.: Comparative study on heart disease prediction using feature selection techniques on classification algorithms. Appl. Comput. Intell. Soft Comput. 2021, 1–17 (2021)
5. Gao, X.Y., Ali, A.A., Hassan, H.S., Anwar, E.M.: Improving the accuracy for analyzing heart diseases prediction based on the ensemble method. Complexity 2021, 1–10 (2021)
6. Latha, C.B.C., Jeeva, S.C.: Improving the accuracy of prediction of heart disease risk based on ensemble classification techniques. Inform. Med. Unlock. 16, 100203 (2019)
7. Gárate-Escamila, A.K., El Hassani, A.H., Andrès, E.: Classification models for heart disease prediction using feature selection and PCA. Inform. Med. Unlock. 19, 100330 (2020)
8. Almustafa, K.M.: Prediction of heart disease and classifiers' sensitivity analysis. BMC Bioinform. 21(278), 1–18 (2020)
9. Radhika, R., George, S.T.: Heart disease classification using machine learning techniques. J. Phys.: Conf. Ser. 1937(1), 012047 (2021)
10. Uyar, K., Ilhan, A.: Diagnosis of heart disease using genetic algorithm based trained recurrent fuzzy neural networks. Proc. Comput. Sci. 120, 588–593 (2017)
11. Aleem, A., Prateek, G., Kumar, N.: Improving heart disease prediction using feature selection through genetic algorithm. In: Woungang, I., Dhurandher, S.K., Pattanaik, K.K., Verma, A., Verma, P. (eds.) ANTIC 2021. Communications in Computer and Information Science, vol. 1534, pp. 765–776. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-96040-7_57
12. Das, P., Das, A.K., Nayak, J.: Feature selection generating directed rough-spanning tree for crime pattern analysis. Neural Comput. Appl. 32, 1–17 (2018)
13. Pati, S.K., Das, A.K.: Missing value estimation for microarray data through cluster analysis. Knowl. Inf. Syst. 52(3), 709–750 (2017)
14. Demšar, J., et al.: Orange: data mining toolbox in Python. J. Mach. Learn. Res. 14, 2349–2353 (2013)
15. Das, P., Das, A.K., Nayak, J., Pelusi, D., Ding, W.: Incremental classifier in crime prediction using bi-objective particle swarm optimization. Inf. Sci. 562, 279–303 (2021)
16. Das, P., Das, A.K.: Rough set based incremental crime report labelling in dynamic environment. Appl. Soft Comput. 85, 105811 (2019)
17. Bradley, A.P.: The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recogn. 30(7), 1145–1159 (1997)
18. Das, P., Das, A.K., Nayak, J.: Feature selection generating directed rough-spanning tree for crime pattern analysis. Neural Comput. Appl. 32, 7623–7639 (2018)
A Deep Learning Based Approach to Measure Confidence for Virtual Interviews

Ravi Kumar Rungta, Parth Jaiswal, and B. K. Tripathy

School of Information Technology and Engineering, VIT University, Vellore, Tamil Nadu 632014, India
{ravikumar.rungta2019,parth.jaiswal2019}@vitstudent.ac.in, [email protected]
Abstract. Confidence means trust and a strong belief in something or someone. During an interview, demonstrating self-confidence, i.e., showing trust and faith in your own abilities, can be the deciding factor for your success over the other candidates. It is as important as showing your skills, since without self-confidence one cannot convince the interviewer to trust one's skills, one's ability to learn, and one's ability to get the work done. Demonstrating a positive body image and a strong, confident personality can make a lasting impression on an interviewer, and in the future on other colleagues as well. In the age of coronavirus, interviews have migrated to virtual platforms, so only a candidate's face is available to gauge his or her self-confidence. Deep learning can help obtain the best results on image classification for confidence detection. With the help of the different types of facial expressions present in a dataset, we can classify faces and estimate the measure of self-confidence. This work aims to develop a confident/unconfident image classifier which will help interviewees practise confident expressions and improve their skills. The deep learning CNN model is trained with the help of two optimizers, namely Adam and SGD.

Keywords: Confidence · Deep learning · Classification · Convolutional neural network · Optimizers · Adam · SGD
1 Introduction

In this age of coronavirus, most of the day-to-day processes of the IT sector have moved to online mode. Education, jobs, management, etc. were moved online and will continue to be online as long as coronavirus persists. In such a scenario, students face new challenges in finding jobs, as their mode of instruction and examination has moved online. With the placement season going on, we have observed that the system of online interviews puts students out of their comfort zone, as only their facial expressions, and not their whole body, have become the measure of their confidence and personality. Our aim in this work is to address this problem by developing a solution which will help students practise confident facial expressions and improve their chances of success in online virtual interviews.

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
A. K. Das et al. (Eds.): CIPR 2022, LNNS 480, pp. 278–291, 2022. https://doi.org/10.1007/978-981-19-3089-8_27
2 Literature Review

The human brain learns and describes concepts by organizing them into hierarchies and using multiple layers of abstraction. In the process, information is transformed and represented in various ways. The information is obtained through the five sensory organs, i.e., vision, smell, touch, hearing, and taste. This way of working of the human brain is what inspires Artificial Neural Networks (ANN) [17]. Inspired by the architectural depth of the brain, neural network researchers had long wanted to train deep multi-layer neural networks, but the actual breakthrough came in the year 2006; before that, experimental results were fruitful only for shallow networks with a few hidden layers [6]. Deep learning has extended the vision of AI [1, 3] and has further applications in areas such as audio signal processing [4], healthcare [8] and image processing [2]. The concept of Deep Belief Networks (DBNs) was introduced by Hinton et al. [19], with a learning algorithm that greedily trains one layer at a time, using unsupervised learning for each layer in the form of a Restricted Boltzmann Machine (RBM) [18]. Convolutional neural networks (CNNs) are a variation of supervised DNNs [11] with several applications. Our work is based upon the work done in [15]. That paper uses various post-processing and visualization techniques to evaluate the performance of the CNN used to recognize facial expressions, and concludes that deep CNNs are able to learn facial characteristics and the emotions on the face. In this section we also present the various algorithms and methodologies used in [15]. Simonyan et al. [13] investigated the effect of CNN depth on accuracy in the large-scale image recognition setting; the representation depth was found to be beneficial for classification accuracy. Chatfield et al. [7] evaluated various CNN techniques and explored different deep architectures.
They then compared them on a common ground, identifying and disclosing important implementation details. An ideal way of detecting and recognizing human faces using OpenCV and Python, which is part of deep learning, is investigated in [16]. An approach which enables prediction for more complicated systems is discussed in [14]; however, a classification problem occurs when the classes are not represented equally (a ratio of 4:1). The article [10] deals with the Facial Expression Recognition (FER) problem from a single face image. Alizadeh et al. [5] proposed a CNN-based approach which consists of several differently structured subnets; the whole network is built by assembling the subnets together, each subnet being a compact CNN model. Raja Sekaran et al. [12] proposed transfer learning of the pre-trained AlexNet architecture for facial expression recognition: the AlexNet, previously trained on the ImageNet dataset, is fully fine-tuned using emotion datasets, and the proposed model is then trained and tested on the extended Cohn-Kanade (CK+) dataset and the FER dataset, two widely used facial expression datasets. Long et al. [9] proposed two different approaches to facial emotion recognition, one based on image visual features and the other based on EEG signals recorded while the subject watches facial emotion pictures. The Deep Convolutional Neural Network (DCNN) model is adopted to enable the machine to learn visual features from facial emotion pictures automatically.
Fig. 1. A neural network with many convolutional layers [20]
3 Problem with Feedforward Neural Networks

A Convolutional Neural Network (CNN) mainly contains an input layer, an output layer and many hidden layers. The hidden layers of a CNN are a combination of different layers, such as pooling, dropout and activation layers that activate neurons (Fig. 1). Suppose we are working with a dataset in which the size of an image is 32 × 32 × 1 (a black-and-white image); the total number of neurons in the input layer would be 32 × 32 = 1024, which is manageable for computation. But current images are much larger: for a 200 × 200 × 1 image we would need 40,000 input-layer neurons, which is clearly too large a number for efficient computation. To overcome this problem, the CNN model extracts features of the image, which lowers the dimensionality so that computation becomes feasible while the characteristics of the image are not lost.
4 Understanding Convolutional Neural Networks

In this section, we discuss the neural network system and the important modules of a CNN: stride and padding (4.1), the convolutional layer (4.2), the pooling layer (4.3), the fully connected layer (4.4), and optimizers (4.5).

4.1 Stride and Padding

In the feature-learning part of a CNN, we extract features from the image by moving a kernel over the whole area of the image; the number of steps the kernel takes while moving is known as the stride. Sometimes the kernel does not fit the image exactly; in such a case we have two options: either drop an element row/column, or add an extra row/column with the help of padding. In Fig. 2, we can see that the dimension of the feature map (3*3) is less than that of the original image (5*5), and in each step we get a value (pink) corresponding to that filter (green).
Fig. 2. Stride and padding
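The relationship between input size, filter size, stride and padding illustrated in Fig. 2 can be sketched with the standard output-size formula (a sketch; the function name is ours):

```python
def conv_output_size(n, f, stride=1, padding=0):
    """Spatial size of the output when an f x f filter slides over an
    n x n input: floor((n + 2*padding - f) / stride) + 1."""
    return (n + 2 * padding - f) // stride + 1

print(conv_output_size(5, 3))                        # 3: the 3x3 feature map of Fig. 2
print(conv_output_size(5, 2, stride=2))              # 2: index 4 of the image is missed
print(conv_output_size(5, 2, stride=2, padding=1))   # 3: a zero border covers it
```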
Now suppose that instead of a stride of 1 we use a stride of 2 and a filter size of 2; then the filter will not fit the image, and we need the concept of padding: either we add 0s around the image, or we drop that part of the image. In that case, the green box moves 2 steps each time (across and down) with a (2*2) size. As the size of the image is 5*5, the last part, i.e., index 4, will not be covered if padding is not applied; if we add an extra border of all zeros, we can include index 4 as well.

4.2 Convolutional Layer
(it will be changed if it’s doesn’t work during training). ReLU activation function is called is rectified linear unit, in which negative features are converted to zero and positive value remains same with any change. Suppose if the
Fig. 3. Demonstration of convolution at a particular location
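The element-wise computation from the first step above can be sketched as follows (a minimal sketch; the 3 × 3 patch values are reconstructed from the worked example, and the +1/0/−1 column filter is an assumption):

```python
def conv2d_valid(image, kernel, stride=1):
    """'Valid' 2-D cross-correlation: slide the kernel over the image
    with the given stride, summing element-wise products at each step."""
    kh, kw = len(kernel), len(kernel[0])
    oh = (len(image) - kh) // stride + 1
    ow = (len(image[0]) - kw) // stride + 1
    return [[sum(image[r * stride + i][c * stride + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(ow)]
            for r in range(oh)]

# Hypothetical first 3x3 patch; reproduces 7*1 + 4*1 + 3*1 + ... = 6
patch  = [[7, 2, 3],
          [4, 5, 3],
          [3, 3, 2]]
kernel = [[1, 0, -1],
          [1, 0, -1],
          [1, 0, -1]]
```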
Table 1. Various activation functions

Activation function   Expression
Sigmoid               1 / (1 + e^(−x))
Tanh                  (e^x − e^(−x)) / (e^x + e^(−x))
ReLU                  max(0, x)
Leaky ReLU            x, x > 0; αx, x ≤ 0
Maxout                max(w1^T·x + b1, w2^T·x + b2)
ELU                   x, x ≥ 0; α(e^x − 1), x < 0
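The scalar activations in Table 1 can be sketched as plain functions (Maxout is omitted since it takes two weight vectors; the α defaults are the commonly used values, assumed here):

```python
import math

def sigmoid(x): return 1.0 / (1.0 + math.exp(-x))
def tanh(x): return math.tanh(x)
def relu(x): return max(0.0, x)
def leaky_relu(x, alpha=0.01): return x if x > 0 else alpha * x
def elu(x, alpha=1.0): return x if x >= 0 else alpha * (math.exp(x) - 1.0)

print(relu(-10.0))  # 0.0  (negative features are zeroed)
print(relu(10.0))   # 10.0 (positive values pass through unchanged)
```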
As we know, the activation function is applied to neurons so that they can activate and pass their output to the next layer; it also helps to add a non-linear pattern to the network.

4.3 Pooling Layer

In the pooling layer, the dimension of the images is decreased to reduce the required computational power; this is also known as down-sampling. We can reduce the size of the image by taking the maximum value or the average value: when we take the maximum from the neighbourhood we call it max pooling, and when we take the average we call it average pooling. In Fig. 4, we can see that the size of the filter is 2*2 and the value of the stride is 2, so after the first step the value is 20 (max pooling) or 13 (average pooling); the other slots are filled in the same way. The i-th layer of a CNN consists of the convolutional layer, the activation layer, and the pooling layer. As the complexity of images increases, we can increase the number of these layers to capture the low-level details, but this comes at the cost of more computational power. The above process successfully enables the model to understand and distinguish the various features of an image. After this process, we flatten the final output and 'feed' it to a regular neural network for classification purposes.
Fig. 4. The pooling layer
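The 2*2, stride-2 pooling described above can be sketched as follows (the window values are hypothetical but chosen so the first window yields 20 under max pooling and 13 under average pooling, as in Fig. 4):

```python
def pool2d(x, size=2, stride=2, mode="max"):
    """Down-sample a 2-D grid: take the max (max pooling) or the mean
    (average pooling) of each size x size window, moving by `stride`."""
    oh = (len(x) - size) // stride + 1
    ow = (len(x[0]) - size) // stride + 1
    out = []
    for r in range(oh):
        row = []
        for c in range(ow):
            window = [x[r * stride + i][c * stride + j]
                      for i in range(size) for j in range(size)]
            row.append(max(window) if mode == "max" else sum(window) / len(window))
        out.append(row)
    return out

# Hypothetical 2x2 window consistent with the Fig. 4 description
grid = [[20, 12],
        [ 8, 12]]
```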
4.4 Fully Connected Layer

After the extraction of features from the images, the next step is to flatten them. Flattening is a process in which the image is converted to a 1-D array; after conversion it is connected to the fully connected layer (Fig. 5), in which all the neurons are connected to each other. The activation function used here is generally either softmax or sigmoid: if we need to predict binary output classes we use sigmoid, otherwise we go for softmax. The sigmoid activation function is used when we have binary classes, and softmax when we have multiple classes. Sigmoid works as follows: if the value is greater than 0.5, the input is predicted as class 1, otherwise as class 0. Softmax, in contrast, returns a probability for each class, and the predicted class is the index of the maximum value in the output vector {y1, y2, y3} (Fig. 5).
Fig. 5. Neural network after feature engineering
The SoftMax or logistic layer is the last layer of a CNN and resides at the end of the FC layer: logistic is used for binary classification and SoftMax for multi-class classification. We generally use the SoftMax or sigmoid function before the output (final) layer to obtain the result in one-hot encoded form (Table 2), which shows a one-hot representation of yes/no.

Table 2. One-hot encoding

Yes   No   Output
0     1    No
1     0    Yes
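A minimal sketch of the two output-layer choices (plain Python; the scores and function names are ours, for illustration only):

```python
import math

def sigmoid_predict(z, threshold=0.5):
    """Binary decision: class 1 if sigmoid(z) > threshold, else class 0."""
    return 1 if 1.0 / (1.0 + math.exp(-z)) > threshold else 0

def softmax(zs):
    """Turn raw scores into probabilities; subtracting the max keeps
    exp() numerically stable."""
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.1]             # hypothetical outputs y1, y2, y3
probs = softmax(scores)
predicted = probs.index(max(probs))  # index of the maximum value
```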
5 Experimental Setup We have used two separate systems for carrying out the experiment. In system 1, we use the Google Colab (Free Tier) and in system 2, we use Intel Core i5-10400F with 16GB Memory and Python 3.9.2 with Jupyter Notebook. 5.1 Dataset Description The dataset has more than 7000 grayscale images classified into confident and unconfident expressions [21]. 5.2 Workflow Diagram We have divided workflow diagram into 3 parts model training (Fig. 6-a), Prediction (Fig. 6-b), face detection using OpenCV (Fig. 6-c). 5.3 Component Modules The following are the component modules used.
Fig. 6. (a) Workflow of model training, (b) prediction, (c) workflow of OpenCV face detection and confidence calculation
5.3.1 Image Data Generation Module

The module initializes the image generator, used to generate image variations in the data pre-processing module.

5.3.2 Data Cleaning Module

The module is used to remove irrelevant data from our datasets and to split them into training and testing data sets in a 75%:25% ratio.

5.3.3 Data Pre-processing Module

The module is used to process the input image and maintain consistency between the input and test data. The images are resized to the required dimensions, and an image data generator is used to generate images with various variations.

5.3.4 Training Module

In the training module, we train the model with the help of the CNN model.

5.3.5 Validation and Evaluation Module

The module performs the evaluation of the trained models against the test data and produces the required evaluation metrics, which are useful in validating the model, and saves the final model for future work. We are going to use Accuracy, Precision,
Recall, AUC, and F1 score as our evaluation metrics, select the model for the app, and classify the results.

Final Result Generator Module. In this module, we generate the final score of the user, i.e., the average score of their confidence during the entire interview session.
5.4 Algorithmic Steps

• During the interview, the video is split and images are extracted every 5 s (Fig. 6c). These images are run through the deep learning model, which classifies them as confident or unconfident (Fig. 6b). We have trained a convolutional neural network on an image dataset we found on Kaggle; the CNN model contains many convolutional layers and activation functions. We take the average of the confidence scores of all the images as the final confidence score.
• Lastly, we show the user their confidence score over the entire interview process, which may help them in the future.

5.4.1 Understanding Datasets

Reading images from the user and resizing them (Fig. 7).
Fig. 7. Screenshot of the python code to read images from the user and resize them
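Fig. 7 shows the authors' code only as a screenshot; as a stdlib-only illustration of the resize step, a nearest-neighbour sketch (our own hypothetical helper, not the code in the figure):

```python
def resize_nearest(img, new_h, new_w):
    """Nearest-neighbour resize of a grayscale image given as a list of
    rows, e.g. to the fixed input size the CNN expects."""
    h, w = len(img), len(img[0])
    return [[img[r * h // new_h][c * w // new_w] for c in range(new_w)]
            for r in range(new_h)]
```

In practice a library call such as OpenCV's resize would be used instead of hand-rolled code.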
5.4.2 Training Results

The following Python code displays the results for the training model.
A Deep Learning Based Approach to Measure Confidence
287
import numpy as np
import pandas as pd
import seaborn
import matplotlib.pyplot as plt
from sklearn.metrics import classification_report

# Predict on the training set and reduce one-hot vectors to class indices
pred_value_train_ing = model.predict(x_train)
pred_value_train_ing = np.argmax(pred_value_train_ing, axis=1)
y = np.argmax(y_train, axis=1)

# Per-class precision/recall/F1 as a dict, plotted as a heatmap (Fig. 8)
clf_report = classification_report(y, pred_value_train_ing, output_dict=True)
seaborn.heatmap(pd.DataFrame(clf_report))
plt.show()
Fig. 8. Heatmap of classification report
5.4.3 Testing Results

The following Python code displays the results for the testing model.
y = np.argmax(y_test, axis=1)
y_index_0 = np.where(y == 0)
y_index_1 = np.where(y == 1)
len(y_index_0[0]), len(y_index_1[0])
# (2436, 3306)

Here we can see that we have imbalanced data in the testing phase, due to which the accuracy of the model is lower, i.e., 78–79% (Fig. 9), compared to training (Fig. 8). This can be improved by tuning hyperparameters, by balancing the class distribution, or by building ensemble models that combine two or three CNN models together.
5.5 GUI with OpenCV

Upon running the GUI, we get the following screen (Fig. 10), which takes input from the camera and gives the confidence measure output:
Fig. 9. Heatmap of classification report
Output for a smiling (confident facial expression)    Output for a worried (unconfident facial expression)
Fig. 10. Confidence measurement output in the GUI
6 Results and Analysis

Following is the tabulation and comparison of the metrics obtained from both models. Graphs for cross entropy loss and classification accuracy are shown in Fig. 11(a and b); the orange line represents the testing data and the blue line the training data. Finally, after the confidence measurement with the help of the OpenCV module, we get the plot of confidence and the average confidence during the interview.
Table 3. Tabulation and comparison of the metrics obtained from the models

                 Precision       Accuracy        Recall
                 Adam    SGD     Adam    SGD     Adam    SGD
Confident        1       0.68    1       0.68    0.99    0.28
Non-Confident    1       0.63    1       0.63    1       0.91
Fig. 11. (a) Adam and (b) SGD: graphs for cross entropy loss and classification accuracy

Table 4. Computational statistics

                                        SGD       Adam
Training time in CPU (Desktop)          5560 s    2751 s
Training time in GPU (Google Colab)     242 s     231 s
Accuracy                                0.63      0.98
Table 5. Advantages and disadvantages of both optimisers

        Advantages                           Disadvantages
SGD     Easy computation;                    Memory requirement is more;
        easy to implement                    local minima problem
ADAM    Rectifies vanishing learning rate    Computationally costly
        and high variance;
        less training time
Average Confidence during the interview was 57.47910448876916 (Fig. 12).
Fig. 12. Plot of confidence vs time
7 Conclusion

Here we have built two neural networks with different optimizers: the first with Adam and the second with SGD (Table 5). We then made a detailed comparison between these models based on different parameter metrics such as accuracy, recall, precision and F1-score, in which we found (Table 3) that the neural network using Adam as the optimizer performed better than SGD on all parameters; the testing accuracy for Adam is 0.723 whereas the testing accuracy for SGD is 0.68 (Fig. 11). We trained our models on a cloud shared-GPU platform and calculated the training time for each model; again, the model that used Adam as the optimizer took less time to train (Table 4). We also trained the models on local hardware and compared training times, finding that training time on GPU is much less than on CPU hardware. After building the models, we implemented confidence prediction with the help of OpenCV, in which we calculated the user's mean confidence during the entire interview session (Fig. 12).
A Commercial Banking Industry Resilience in the Case of Pandemic: An Impact Analysis Through ANOVA

Sweta Mishra1, Shikta Singh1, and Debabrata Singh2(B)

1 KIIT School of Management, KIIT (Deemed to be University), Bhubaneswar-24, Odisha, India
[email protected]
2 Department of CA, ITER, Siksha 'O' Anusandhan (Deemed to be University), Bhubaneswar-30, Odisha, India
[email protected]
Abstract. The Covid-19 pandemic has shaken the nation and the business environment across all types of business and all sectors. However, the growth of the economy and the development of a country depend on the banking industry. The present study was conducted to find out the impact of Covid-19 on public sector and private sector banks in the Bhubaneswar region. Data was collected from 289 bank employees from both sectors through a survey and analyzed to find the impact of the pandemic. It was seen that banks in both sectors maintained social distancing, used appropriate sanitization facilities and also made suitable use of technology, as this is the need of the hour. The study also throws light on factors like the attention of managers to the work, following government rules, communication of information, taking suggestions from employees, as well as flexibility and extra benefits at the workplace. This study contributes to the development of human resources, as they play a pivotal role in the working of the banking sector. Some suggestions have been made for implementing such practices in other industries for the progress and prosperity of the economy as a whole.

Keywords: Covid-19 · Banking industry · Public sector · Private sector · ANOVA
1 Introduction

The outbreak of the corona virus, or Covid-19 pandemic, has affected economies across the world, including financial markets in all magnitudes [1]. The banking sector in particular is the most crucial part of the economy, as it is the financial hub and the supporting hand for all other industries. The pandemic has impacted the performance of the Indian banking industry in terms of profitability, advances, non-performing assets, etc., and has changed the profitability of the banking sector in India [2]. The banking sector has encountered various crises, and the adjustments made have had a significant impact on the financial structure. The situation has also hampered the employees working in this sector [3, 4]. The human resources engaged in banks have faced problems in terms

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
A. K. Das et al. (Eds.): CIPR 2022, LNNS 480, pp. 292–303, 2022. https://doi.org/10.1007/978-981-19-3089-8_28
A Commercial Banking Industry Resilience in the Case of Pandemic
293
of maintaining work-life balance. To achieve better profitability, it is crucial to keep the employees satisfied and working. The major focus of managers in banks should be to maintain the human resources in a manner that achieves better productivity and steadily overcomes this pandemic situation with consistent efforts [5]. Despite various pessimistic estimates, the Indian financial sector has proven unpredictably tough in this pandemic phase. This is because the organizations and corporations, which account for more than half of all loans, are much improved and the banks are well capitalized [18]. This bodes well for loan growth as well as bank performance as a repercussion of the Covid-19 pandemic.

The Indian economy has been devastated by two waves of the COVID-19 pandemic. In terms of gross domestic product (GDP), the economy declined by 7.3% in 2020–21. Growth in 2021–22 is estimated to be 9–10%, less than the 12% that was commonly predicted before the outbreak of the second wave. However, according to the Financial Stability Report (FSR) published by the Reserve Bank of India (RBI) in July 2021, the effect of the two waves on the banking sector was less than most people had expected. Given that the banking system entered the pandemic in a condition of stress, this is a notable outcome. For understanding the current scenario related to the banking sector, a gist of the work done by other authors is highlighted in the literature review. This will make the study clearer and help us understand the pandemic situation in a better way.

1.1 Research Objectives

The purpose of this article is to investigate the influence of Covid-19 on public and private sector banks in the Bhubaneswar region, as well as to examine the precautionary measures taken by management to function in such pandemic periods.
1.2 Conceptual Framework of the Study

The conceptual framework gives the important factors that managers in the banking sector should consider to keep employees engaged and motivated during Covid-19. Figure 1 gives the details of the factors identified from the literature as well as from different websites which give information about the current Covid-19 scenario. After finding the important factors, we note down the methods we will follow to find the impact of Covid-19 on the banking industry.
294
S. Mishra et al.
[Fig. 1 diagram: COVID-19 at the centre, surrounded by eight factors: attentiveness of managers; flexibility and extra benefits; social distancing; utilizing technology; sanitation facilities; communication and information sharing; government rules; soliciting suggestions.]
Fig. 1. Important factors affecting Covid-19 situation in banking industry
2 Literature Review

Taking into account the current crisis, the review covers only recent research articles, to analyze the situation from different aspects. The details of the literature review are given in the paragraphs below.

Prajapati, G. and Pandey, S. [6] studied employee experience and how employees managed their lifecycle during the pandemic in investment banks. Semi-structured interviews with the HR managers of the banks brought five themes to the forefront for enhancing employee experience: virtual recruitment, communication, employee wellbeing, work from home, and diversity and inclusion. Keeping and placing human resources in the right roles is also vital. Organizations should leverage new technologies and take steps to implement them at every stage to sail through such times of crisis.

Jyothi, M. and Reddy, B. [7] studied the impact of the pandemic on banks and non-banking finance companies. They collected views from economists, financial institutions, the World Bank and consulting firms, along with information from secondary sources. It was seen that the pandemic created a struggle for banks to maintain their deposits. The government has taken several steps to keep the functioning of banks at a normal pace since the lockdown. The authors suggest that the RBI must take steps to maintain liquidity in the financial system as a whole, that organizations should plan for recoveries so that there is no distraction from serving customers, and that the government should also open up the economy to cope with the situation.
A Commercial Banking Industry Resilience in the Case of Pandemic
295
Thakor, C.P. [8] studied the impact of the Covid-19 situation from an Indian perspective. The author found that banks should focus on and renew four key areas: adopting new technologies, digitalization, customer trust and privacy, and compliance with policies. Banks will operate in a condition where interest rates are low, and the government will play a very crucial role in the operations of the banking system by making better fiscal policies to face this situation and gradually come out of it.

Chaudhari, C. et al. [9] analyzed secondary data collected from various research articles on Covid-19. The pandemic has adversely affected the functioning of banks and gives only a vague picture of when normal working of the banking sector will resume. The RBI and the government have taken steps to overcome the situation, but many changes are still needed for banks to function properly. They suggest that strong leadership can help bring the economy back on track.

Bui, T. T. H. et al. [10] studied the impact of green banking practices during the Covid-19 period. By collecting data from secondary sources like other journals, websites, the RBI bulletin and magazines, they found that green banking played a pivotal role in enhancing banking services, and customers were happy using new technologies from home. The authors suggest using new technologies like computers, internet banking and other banking services that use less paper and thereby protect nature.

The literature review gives a clear picture and builds a base for taking the study further. The next step is framing the objectives of the study and analyzing the data to find a concrete solution.
Fig. 2. Capital ratio comparisons between 2006 and 2019 [16]
Prior to the global financial crisis, common equity tier-1 (CET-1) ratios in Europe, the United Kingdom (UK) and the United States (USA) were 6% to 8%. In this context, the expected landing points under scenarios A1 and A3 of 8.5%–10.0% in the European Union, 11%–13% in the United Kingdom, and 8.0%–10.5% in the United States reveal the resilience which the global banking system has built from 2006 to 2019, as depicted in Fig. 2.

• Entering the Covid-19 crisis, banks collectively held roughly US$5 trillion of capital above their regulatory requirements [20].
• How hard banks' capital is hit depends on the crisis, and is also affected by their willingness to use the buffers and other policy measures.
• In a downturn akin to the savings and loan crisis, banks' overall buffers are expected to fall by US$800 billion, which might enable an additional US$5 trillion in loans (6% of total outstanding loans). However, in a worst-case situation akin to the Great Financial Crisis, the equivalent values would be just US$270 billion and US$1 trillion, respectively (1.3% of total loans) [21].
• As the COVID-19 crisis has confirmed, markets are highly aware of the necessity of a capital cushion in absorbing external shocks. With diminishing revenues and earnings, capital formation will be difficult. According to our analysis, capital creation from retained earnings will fall from a range of 0.5%–1% point of common equity tier-1 (CET-1) yearly to 0.2%–0.5% point, making organic recapitalization considerably slower. Raising private funding will also be very tough [11, 22].

Consequently, the banking industry should consider different types of actions, some tactical and some structural.
As the Covid-19 pandemic advances, the whole banking industry might enter what can be called a caution zone, with a CET-1 ratio of around 8%–10%, in which banks should start to rebuild their cushions and take other safety measures, as shown in Fig. 3. While the banking system as a whole appears resilient, individual banks as well as whole provincial systems may enter a threat zone with a CET-1 level of roughly 5.5% [12, 19].

We look at three scenarios that executives have cited. Scenario A1, the most probable, foresees a slow global recovery by 2023. Scenario A3 is more hopeful about the spread of the corona virus as well as the public-health response, anticipating a recovery by 2021 (such a scenario may still be likely for some parts of Europe, but seems very unlikely for the United States). Scenario B2 assumes greater uncertainty about the effectiveness of the public-health approach. Considering first the two milder cases, A1 and A3 (the same situation will not necessarily be seen in every region), we estimate CET-1 ratios in developed economies to decline by one to five percentage points, depending on the scenario and region.
Fig. 3. Global economy scenario of economic policy response and public health response (https://www.mckinsey.com/industries/financial-services/our-insights/banking-system-resilience-in-the-time-of-covid-19)
2.1 COVID-19 and Fiscal–Monetary Policy Coordination

In the context of the COVID-19 pandemic, the national government's economic stimulus packages are examined, and an attempt is made to discover realistic fiscal and monetary policy coordination. When credit-linked economic stimulus packages have only a limited influence on economic recovery, an accommodating fiscal policy position in the upcoming Union Budget 2022–23 is critical for the economy [14, 22, 23]. Globally, there is increasing disapproval of the practice of separating monetary and fiscal policy when evaluating the macroeconomic effect of deficits on economic growth outcomes [15]. In the middle of the COVID-19 pandemic, if the path to fiscal consolidation runs through public expenditure cuts rather than tax resilience, it may have an adverse impact on the recovery of the whole economy [16, 17]. We argue that when liquidity infusions are limited in stimulating economic recovery, high levels of deficit can be justified by increasing public investment, particularly in health as well as capital infrastructure, because this is a dual crisis: a public health crisis and a macroeconomic crisis. In such crisis hours, it is not only the size of the deficit that matters, but also how the deficit is financed. Any normalization announcement in the upcoming Union Budget 2022–23 to scale back economic stimulus initiatives may have a negative impact on the recovery of economic growth [13].
3 Research Methodology

The study was carried out among employees of State Bank of India (SBI) and Axis Bank in the Bhubaneswar region of Odisha. The study is exploratory in character, and the sample was collected through purposive sampling. The study is primary in nature, as the data was gathered through a survey approach. A structured questionnaire was created using a 5-point Likert scale and delivered to employees working at various branches of the respective banks. A total of 315 questionnaires were distributed and 289 were used for the analysis; 26 questionnaires were incomplete, corresponding to a rejection rate of 8.25%. The remaining 289 were analyzed to find the effect of Covid-19 in banks. Reliability and validity tests were conducted, and the data was collected online from the employees. The details of the analysis are given in the paragraphs below.

3.1 Analysis of Data

For analyzing the data, we first find the reliability of the questionnaire; the details are given in Table 1.

Table 1. Reliability statistics

Cronbach's Alpha    Number of items
0.836               8
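For reference, Cronbach's alpha as reported in Table 1 can be computed directly from raw item responses. The sketch below uses a synthetic response matrix, since the survey data itself is not available; only the formula α = k/(k−1) · (1 − Σ var_i / var_total) is standard:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of totals),
    for a respondents x items matrix of Likert scores."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Synthetic stand-in for the 289 x 8 survey matrix: items share a common
# attitude (base) plus small per-item noise, so they are internally consistent.
rng = np.random.default_rng(0)
base = rng.integers(1, 6, size=(289, 1))
responses = np.clip(base + rng.integers(-1, 2, size=(289, 8)), 1, 5)
print(round(cronbach_alpha(responses), 3))  # well above the 0.7 threshold
```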
Table 1 gives the results of the reliability test for the factors affecting Covid-19 in banks. It shows that the Cronbach's Alpha value is 0.836, which is greater than 0.7, for 8 factors. This means that the questionnaire is suitable for circulation and conducting the survey. Further, to check the adequacy of the sample and the strength of the relationships among the variables, the KMO and Bartlett's Test of Sphericity were conducted, as given in Table 2.

Table 2. KMO and Bartlett's test

Kaiser-Meyer-Olkin measure of sampling adequacy      0.862
Bartlett's test of sphericity    Approx. Chi-square  1160.075
                                 df                  28
                                 Sig.                0.000
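Bartlett's test of sphericity in Table 2, and the sector-wise one-way ANOVA used in the paper's impact analysis, can be sketched as follows. The response data here is synthetic and the group means are hypothetical; only the test formula (χ² = −(n − 1 − (2p + 5)/6) ln|R|, df = p(p − 1)/2) and the scipy call are standard:

```python
import numpy as np
from scipy.stats import chi2, f_oneway

def bartlett_sphericity(data: np.ndarray):
    """Bartlett's test of sphericity: H0 is that the correlation matrix
    of the items is an identity matrix (i.e., the items are uncorrelated)."""
    n, p = data.shape
    det_r = np.linalg.det(np.corrcoef(data, rowvar=False))
    statistic = -(n - 1 - (2 * p + 5) / 6) * np.log(det_r)
    dof = p * (p - 1) / 2
    return statistic, chi2.sf(statistic, dof)

rng = np.random.default_rng(42)
latent = rng.normal(size=(289, 1))                  # one shared attitude factor
responses = latent + 0.6 * rng.normal(size=(289, 8))
stat, p_val = bartlett_sphericity(responses)
print(p_val < 0.05)  # correlated items: sphericity is rejected

# Sector-wise comparison of one factor via one-way ANOVA (scipy.stats.f_oneway):
public = rng.normal(3.9, 0.6, size=150)   # hypothetical SBI factor scores
private = rng.normal(3.4, 0.6, size=139)  # hypothetical Axis Bank factor scores
f_stat, sig = f_oneway(public, private)
print(sig < 0.05)
```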
From the table it can be seen that the KMO value is 0.862 (> 0.6) and Bartlett's Test of Sphericity is significant at 0.000 (p < 0.05), indicating that the sample is adequate and the variables are sufficiently related for further analysis. For this factor, the significance value was found to be > 0.05, so statistically there is no significant difference between the sectors. This again means that sanitization facilities are a priority for both sectors and employees working in SBI and Axis Bank follow them very meticulously. They use sanitizers frequently and keep themselves as well as other employees safe while working in the organization. For the fourth factor, the significance value was found to be 0.004, which is below 0.05.

Definition 1 (is-Information). For an event A with probability of occurrence P(A), the is-Information is defined as

    I_is(A) = (2/π) sin⁻¹(1 − P(A)^c),    (3)

where c > 0 is a constant to achieve generalization. For any value of c, I_is(·) remains bounded between 0 and 1: P(A) = 1 implies I_is(A) = 0 and P(A) = 0 implies I_is(A) = 1. To be validated as an information function, the following lemma must be proved.

Lemma 1. The function f(x) = (2/π) sin⁻¹(1 − x^c), c > 0, is monotonically decreasing on x ∈ [0, 1].

Proof. For the given function,

    f′(x) = −(2/π) · c x^{c−1} / √(1 − (1 − x^c)²),    (4)

⇒ f′(x) < 0 given x ∈ [0, 1] and c > 0. Equation (4) proves the lemma.
is-Entropy: A Novel Uncertainty Measure for Image Segmentation
451
Definition 2 (is-Entropy). Let X be a discrete random variable with probability mass function P = {p_1, p_2, ..., p_n}; then the is-Entropy of X is defined as

    H_is(X) = (2/π) Σ_i p_i sin⁻¹(1 − p_i^c).    (5)

To be a valid entropy function, the following lemma must be proved.

Lemma 2 (Concavity). The function f(x) = (2/π) x sin⁻¹(1 − x^c) is a concave function for any value of c > 0 over x ∈ [0, 1].

Proof.

    f″(x) = −(2/π) · c x^{2c−1} (2 + c − x^c) / (1 − (1 − x^c)²)^{3/2},    (6)

⇒ f″(x) < 0, given x ∈ [0, 1] and c > 0, since 2 + c − x^c > 0.
Equation (6) proves the lemma.

As mentioned before, the log function in Shannon entropy is unbounded and unstable near zero, but the usual practice is to consider log(0) to be defined and H(0) = 0. The following example compares Shannon entropy and the newly defined is-Entropy. Let there be a Bernoulli distributed random variable with probability of occurrence p. The plots of Shannon entropy and is-Entropy against varying p are shown in Fig. 1 for various values of c.

[Fig. 1 plot: H(P) versus p; curves for Shannon entropy and for is-Entropy with c = 0.1, 1, 10 and 50.]
Fig. 1. Comparison of Shannon entropy and is-Entropy, with varying value of free parameter c, for a Bernoulli Random Variable with probability of occurrence as p.
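The curves in Fig. 1 can be reproduced numerically; a small sketch evaluating both entropies for a Bernoulli variable (the function names are ours, not the paper's):

```python
import numpy as np

def shannon(p: float) -> float:
    """Shannon entropy (bits) of a Bernoulli(p) variable."""
    q = np.array([p, 1 - p])
    q = q[q > 0]
    return float(-(q * np.log2(q)).sum())

def is_entropy(p: float, c: float) -> float:
    """is-Entropy of a Bernoulli(p) variable, Eq. (5)."""
    q = np.array([p, 1 - p])
    q = q[q > 0]
    return float((2 / np.pi) * (q * np.arcsin(1 - q ** c)).sum())

# Both curves are symmetric about p = 0.5 and peak there:
for p in (0.1, 0.3, 0.5):
    print(p, round(shannon(p), 3), round(is_entropy(p, 0.1), 3))
```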
452
B. C. Dharmani

3.1 Properties of the is-Entropy
The section derives some of the desired and other special properties of is-Entropy.

Property 1 (Nonnegativity). Let X be a random variable with probability mass function P = {p_1, p_2, ..., p_n}; then the is-Entropy of X is nonnegative, i.e., H_is(X) ≥ 0.

Proof. As sin⁻¹(x) ≥ 0 for x ∈ [0, 1], H_is(X) ≥ 0 (∵ Eq. (5)).

Property 2 (Continuity). Let X be a random variable with probability mass function P = {p_1, p_2, ..., p_n}; then the is-Entropy of X, H_is(X), is a continuous function of P.

Proof. The function sin⁻¹(x) is continuous over x ∈ [0, 1], and the product of two continuous functions is continuous.

Property 3 (Permutational Symmetry). Let X be a random variable with probability mass function P = {p_1, p_2, ..., p_n}; then the is-Entropy of X, H_is(X), remains the same over any permutation of P.

Proof. The commutativity of summation proves this property.

Property 4 (Boundedness). Given any random variable X with probability mass function P = {p_1, p_2, ..., p_n}, the is-Entropy of X, H_is(X), is bounded at both ends: 0 ≤ H_is(X) ≤ 1.

Proof. Given c > 0, c ∈ R, 0 ≤ sin⁻¹(1 − p_i^c) ≤ π/2 and

    H_is(X) = (2/π) Σ_i p_i sin⁻¹(1 − p_i^c) ≤ (2/π) Σ_i p_i (π/2) ≤ 1 (∵ Σ_i p_i = 1).    (7)

Property 1 proves the lower bound and Eq. (7) proves the upper bound.

Property 5 (Concavity). Given any random variable X with probability mass function P = {p_1, p_2, ..., p_n}, the is-Entropy of X, H_is(X), is concave.

Proof. From Eq. (5),

    ∂H_is(X)/∂p_i = (2/π) [ sin⁻¹(1 − p_i^c) − c p_i^c / √(1 − (1 − p_i^c)²) ],

    ⇒ ∂²H_is(X)/∂p_i ∂p_j = 0 for i ≠ j, and ∂²H_is(X)/∂p_i² ≤ 0 for i = j.    (8)

Equation (8) proves that H_is(X) is a strictly concave function of the p_i.

Property 6 (Maximum Entropy). The is-Entropy, H_is(X), is maximized by the uniform distribution, i.e., the uniform distribution achieves the maximum is-entropy.
Proof. Let H_is(X) be optimized to achieve the maximum entropy under the constraint Σ_i p_i = 1. The Lagrangian for the same is

    L = Σ_i p_i sin⁻¹(1 − p_i^c) − λ (Σ_i p_i − 1).    (9)

Taking the gradient of Eq. (9) and equating it to zero for optimization:

    ∂L/∂p_i = sin⁻¹(1 − p_i^c) − c p_i^c / √(1 − (1 − p_i^c)²) − λ = 0,
    ⇒ λ = sin⁻¹(1 − p_i^c) − c p_i^c / √(1 − (1 − p_i^c)²), ∀ i = 1 : n,
    ⇒ h(p_i) = h(p_j), ∀ i ≠ j,    (10)

where the function h(x) = sin⁻¹(1 − x^c) − c x^c / √(1 − (1 − x^c)²) is one-one over [0, 1], and so from Eq. (10), p_i = p_j, ∀ i ≠ j. As Σ_i p_i = 1, p_i = p_j = 1/n: it is the uniform distribution which achieves the maximum entropy.
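Property 6 can also be checked numerically: for random distributions drawn from a Dirichlet, the is-Entropy never exceeds that of the uniform distribution (a small sketch; the choice c = 2 is arbitrary, the property holds for any c > 0):

```python
import numpy as np

def h_is(p, c: float = 2.0) -> float:
    """is-Entropy of a discrete distribution, Eq. (5)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float((2 / np.pi) * np.sum(p * np.arcsin(1 - p ** c)))

uniform = np.ones(4) / 4
rng = np.random.default_rng(7)
checks = [h_is(rng.dirichlet(np.ones(4))) <= h_is(uniform) + 1e-9
          for _ in range(5)]
print(all(checks))  # True: the uniform distribution attains the maximum
```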
4 Simulations

The proposed uncertainty measure has been verified for its application to the image segmentation task. Image segmentation targets assigning a label to each pixel such that the set of pixels with the same label represents some meaningful region, object or boundary in the image. Kapur et al. [4] introduced a method to use Shannon entropy for multi-level thresholding for gray-level image segmentation. To segment the image into k + 1 parts, mathematically the problem is to find optimal k thresholds (t_1, t_2, ..., t_k) ∈ [0, 255], t_1 ≤ t_2 ≤ ... ≤ t_k, in a gray-level image maximizing the sum of entropies of the distributions between consecutive gray levels t_i and t_{i+1}, as in the following Eq. (11):

    Φ = max_{t_1, t_2, ..., t_k} Ψ(t_1, t_2, ..., t_k) = max_{t_1, t_2, ..., t_k} Σ_{i=1}^{k} H_i,    (11)

where H_i is the entropy of the normalized distribution {p_j / P_i : t_{i−1} < j ≤ t_i}, with P_i = Σ_{j=t_{i−1}+1}^{t_i} p_j and t_0 = −1.
The method, now identified as Kapur's method for multi-level image-thresholding-based segmentation, is quite popular and has been used with varying entropy definitions and varying optimization algorithms. The article [15] used Tsallis entropy, [11] used generalized α entropy and [1] used the t-Entropy definition with Kapur's algorithm for image segmentation. Similarly, [7, 13] used Differential Evolution, [15] used an artificial bee colony approach, [5] used a Hybrid Whale Optimization algorithm and [6] used Harmony Search Optimization for the same. Rebika Rai et al. [9] provide a recent survey on nature-inspired optimization algorithms used for multilevel thresholding image segmentation. The current article uses is-Entropy with Differential Evolution (DE) as a heuristic-search-based optimization algorithm.

All the simulations for this article used fixed parameters. The population size was five times the number of thresholds, the crossover probability was kept at 0.2, the scale factor for mutation was randomly selected between 0.2 and 0.8, and the number of iterations was fixed at 20, which served as the stopping criterion. The experiment used the popular 'Cameraman' image in grayscale for segmentation, and the number of segments was varied from 2 to 5.

Figure 2 shows the mutual comparison of the multilevel-thresholding-based segmentation outcomes for Shannon entropy, Tsallis entropy with free parameter q = 2, and the proposed is-Entropy with c = 0.1, for the number of segments two and three, i.e., the number of thresholds one and two. The first row shows the segmented images and the second row shows the image histogram superimposed with the selected thresholds for two segments; the third and fourth rows show similar results for three segments. Figure 3 presents similar outcome comparisons for the number of segments four and five.

[Fig. 2 panels (a)–(l): columns Shannon Entropy, Tsallis Entropy (q = 2), proposed is-Entropy (c = 0.1).]
Fig. 2. Comparison of image segmentation using different entropy definitions: for number of segments = 2: (a)-(b)-(c) segmented images and (d)-(e)-(f) histogram superimposed with the estimated thresholds; for number of segments = 3: (g)-(h)-(i) segmented images and (j)-(k)-(l) histogram superimposed with the estimated thresholds.

[Fig. 3 panels (a)–(l): columns Shannon Entropy, Tsallis Entropy (q = 2), proposed is-Entropy (c = 0.1).]
Fig. 3. Comparison of image segmentation using different entropy definitions: for number of segments = 4: (a)-(b)-(c) segmented images and (d)-(e)-(f) histogram superimposed with the estimated thresholds; for number of segments = 5: (g)-(h)-(i) segmented images and (j)-(k)-(l) histogram superimposed with the estimated thresholds.

Visual inspection shows that the segmented images and the estimated thresholds obtained using is-Entropy are comparable with those using Shannon entropy, and are quite better than those using Tsallis entropy. These observations empirically validate the proposed entropy definition and motivate further exploration of it on various other applications.
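A minimal sketch of the single-threshold Kapur criterion with is-Entropy follows. It replaces the paper's Differential Evolution search with exhaustive search over one threshold, which is feasible for 256 gray levels; the toy bimodal histogram stands in for the 'Cameraman' image:

```python
import numpy as np

def is_entropy(p: np.ndarray, c: float = 0.1) -> float:
    """H_is(P) = (2/pi) * sum_i p_i * asin(1 - p_i**c), Eq. (5)."""
    p = p[p > 0]
    return float((2 / np.pi) * np.sum(p * np.arcsin(1 - p ** c)))

def kapur_threshold(hist: np.ndarray, c: float = 0.1) -> int:
    """Single-threshold Kapur criterion: maximize the summed is-Entropy
    of the two normalized histogram segments (Eq. (11) with k = 1)."""
    p = hist / hist.sum()
    best_t, best_val = 1, -np.inf
    for t in range(1, len(p)):
        p1, p2 = p[:t].sum(), p[t:].sum()
        if p1 <= 0 or p2 <= 0:
            continue
        val = is_entropy(p[:t] / p1, c) + is_entropy(p[t:] / p2, c)
        if val > best_val:
            best_t, best_val = t, val
    return best_t

# Toy bimodal histogram: a dark peak near level 60, a bright one near 180.
bins = np.arange(256)
hist = np.exp(-((bins - 60) ** 2) / 200) + np.exp(-((bins - 180) ** 2) / 200)
print(kapur_threshold(hist))  # estimated threshold for the toy histogram
```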
5 Conclusion and Discussion

The article defines a new information measure based on the inverse sine function in Eq. (3), identified as is-Information; it is proved in Lemma 1 to be monotonically decreasing with increasing probability. The definition induces a novel entropy measure, identified as is-Entropy, in Eq. (5). The is-Entropy definition is proved to satisfy the properties of nonnegativity, continuity, permutational symmetry, boundedness and concavity in Sect. 3.1. It is also proved to attain its maximum for the uniform distribution. The empirical analysis using is-Entropy for multi-level-thresholding-based image segmentation validates the new entropy definition. In future, the work will be extended in the following directions: (1) The is-Entropy definition will be explored for many other properties of interest, such as the expressions and correlations of joint, marginal and conditional entropy. (2) An entropy definition induces a unique distance measure between two probability distributions; the new divergence will be examined mainly for robustness and other properties and will be applied to various ML applications. (3) The is-Entropy will be applied to other ML tasks and compared with other existing entropy definitions. It will be analyzed with respect to the free parameter c, and the related variations in significant entropy characteristics will be explored.
is-Entropy: A Novel Uncertainty Measure for Image Segmentation
457
References

1. Chakraborty, S., Paul, D., Das, S.: t-entropy: a new measure of uncertainty with some applications. In: 2021 IEEE International Symposium on Information Theory (ISIT), pp. 1475–1480. IEEE (2021)
2. Csiszár, I.: Axiomatic characterizations of information measures. Entropy 10(3), 261–273 (2008)
3. Ghosh, A., Basu, A.: A generalized relative (α, β)-entropy: geometric properties and applications to robust statistical inference. Entropy 20, 347 (2018)
4. Kapur, J.N., Sahoo, P.K., Wong, A.K.: A new method for gray-level picture thresholding using the entropy of the histogram. Comput. Vis. Graph. Image Process. 29(3), 273–285 (1985)
5. Lang, C., Jia, H.: Kapur's entropy for color image segmentation based on a hybrid whale optimization algorithm. Entropy 21(3), 318 (2019)
6. Oliva, D., Cuevas, E., Pajares, G., Zaldivar, D., Perez-Cisneros, M.: Multilevel thresholding segmentation based on harmony search optimization. J. Appl. Math. 2013 (2013). https://doi.org/10.1155/2013/575414
7. Pei, Z., Zhao, Y., Liu, Z.: Image segmentation based on differential evolution algorithm. In: 2009 International Conference on Image Analysis and Signal Processing, pp. 48–51. IEEE (2009)
8. Principe, J.C.: Information Theoretic Learning: Renyi's Entropy and Kernel Perspectives, 1st edn. Springer, Heidelberg (2010)
9. Rai, R., Das, A., Dhal, K.G.: Nature-inspired optimization algorithms and their significance in multi-thresholding image segmentation: an inclusive review. Evol. Syst. 1–57 (2022). https://doi.org/10.1007/s12530-022-09425-5
10. Rényi, A.: On measures of entropy and information. In: Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics, pp. 547–561. University of California Press (1961)
11. Sadek, S., Abdel-Khalek, S.: Generalized α-entropy based medical image segmentation. J. Softw. Eng. Appl. 2014 (2013). https://doi.org/10.4236/jsea.2014.71007
12. Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27(3), 379–423 (1948)
13. Storn, R., Price, K.: Differential evolution - a simple and efficient heuristic for global optimization over continuous spaces. J. Global Optim. 11(4), 341–359 (1997)
14. Tsallis, C.: Possible generalization of Boltzmann-Gibbs statistics. J. Stat. Phys. 52(1), 479–487 (1988)
15. Zhang, Y., Wu, L.: Optimal multi-level thresholding based on maximum Tsallis entropy via an artificial bee colony approach. Entropy 13(4), 841–859 (2011)
AGC Based Market Modeling of Deregulated Power System Employing Electric Vehicles and Battery Energy Storage System Debdeep Saha1(B) , Rajesh Panda1 , and Bipul Kumar Talukdar2 1 Department of Electrical Engineering, Indian Institute of Engineering Science and Technology
Shibpur, Shibpur 711103, India [email protected] 2 Department of Electrical Engineering, Assam Engineering College, Guwahati 781017, India
Abstract. This paper presents a market-based approach towards automatic generation control (AGC) in an open market environment. Two control areas, consisting of an integrated governing strategy for electric vehicle (EV) aggregators, battery energy storage systems (BESSs) and traditional conventional resources, are involved in the AGC framework. Each unit is provided with a Generation Rate Constraint and a Governor Dead Band to give an insight into a realistic power system. The test system is simulated in an open market scenario to check its robustness under bilateral transactions with coordinated control of EV and BESS. The large-scale EVs, BESSs and conventional sources from each control area participate in the pool-co market bidding process and maximize their profit by placing revised bids to the Independent System Operator (ISO). The investigation results infer that the coordinated system with dispatchable and non-dispatchable units not only fully integrates the advantages of EV/BESS, but also attains proportionate sharing among the different conventional sources, which in turn increases frequency stability and ensures an effective market clearing mechanism and economic dispatch. Keywords: Automatic generation control · Battery energy storage · Controller · Deregulation · Electric vehicle · Proportional integral derivative · Independent system operator
1 Introduction

Recent studies have been focusing on the fusion of renewable energy sources and dispatchable units in the generation mix, which has compromised the optimality of economic dispatch. Much more energy reserve will be required while investigating the automatic generation control (AGC) of an interconnected system [1]. The above stated factors directed researchers to work on integrating and utilizing dispatchable resources under the AGC framework, and to optimize the AGC system to achieve optimal economic dispatch for an integrated system. A conventional AGC system focuses on utilizing a controlled environment by assigning the proportionate power to all © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. K. Das et al. (Eds.): CIPR 2022, LNNS 480, pp. 458–468, 2022. https://doi.org/10.1007/978-981-19-3089-8_44
AGC Based Market Modeling of Deregulated Power System
459
conventional sources. AGC fixes a participation factor for each participating unit by the use of a method [2]. There will always be new technical challenges in increasing wind energy penetration in the system [3]. Among renewable units, wind energy poses intermittent behavior and, at times, a negative impact on power grid performance, which can act as a threat to the stability, security and reliability of an interconnected power system. Thus, a proper balance of dispatchable and non-dispatchable units is necessary. When employed together with a base load unit, the intermittency is prone to cause imbalance over the life cycle of the interconnection and thus increases operation and maintenance costs. Conventional units are generally treated as non-dispatchable units, and their primary frequency response is evaluated for AGC. Bahmani et al. [4] studied a central system for monitoring frequency and power for the Turkish power grid; it inferred that the generating units are capable of proportionate behavior fulfilling the desired quantitative measures. Thus, battery energy storage systems (BESSs) are a popular unit to be employed for AGC operation, catering to cost saving [5], frequency regulation [6], reliability improvement [7], etc. In particular, BESSs are also a popular choice for microgrids and smart grids for delivering instant power supply to priority loads. In frequency control studies, many alternative resources such as EVs, consumer loads and battery storages have been introduced, but the same has not been investigated for a deregulated environment with coordinated control of conventional resources. A properly coordinated multi-area power system will provide better control of economic benefits by determining the energy transactions between the interconnected areas [8]. Effective balance of power discharge from a BESS is a scenario of utmost importance under heavy load fluctuations.
Again, a charged BESS unit has the capability to discharge at a faster rate than other storage units, which can result in instability of the power grid [9]. A multi-area power system expansion planning model has been proposed considering the generation mix with different reliability indices, such as random outages of generators and transmission lines. Moreover, the market clearing mechanism for each area is executed with the provision of preserving data privacy and minimizing data exchange [10]. The market clearing mechanism is executed to obtain the actual offers and bids to be submitted by GENCOs and LAs for auctioning. Bid generation, evaluation and correction are required for the market participants to maximize their profit [11]. The auctioning process considers bid revision in the day-ahead market and the real-time market to make a balance between different areas in an interconnected power system [4]. There is also a rising concern about utilizing environment-friendly resources to support the power utility. AGC of such resources needs rigorous attention to incorporate dispatchable sources such as EVs and BESS, which reduce CO2 and NOx emissions. No such study has been carried out considering a thermal plant, wind plant, EV and BESS in a coordinated strategy under a deregulated environment, nor has the same been investigated with a market-based approach for maximizing the profit during bidding. The need of the hour is a strategy that manages the dynamics in frequency, tie-power and power generation in an interconnected power network [5]. With the above discussion, the objectives of the present work are framed as follows: (a) To develop a two-area deregulated system with a conventional thermal generator, EV and wind-BESS in each area, with the necessary physical constraints. (b) To investigate the impact of EV and BESS from each area in the deregulated interconnected system.
460
D. Saha et al.
(c) To implement the market clearing mechanism for the two areas with load variations in each area, maximizing the profit of the respective area by bid revision in the auctioning process.
2 Investigation on the Test System

An open market environment is simulated with two areas, considering three generating sources in each control area with bilateral transactions. EV aggregators and BESSs are incorporated in each control area to take care of the system dynamics under the deregulated environment. Each control area consists of a thermal plant, a wind plant, aggregate electric vehicles and a battery energy storage system. The GENCOs and DISCOs of both areas contract as per the DISCO Participation Matrix (DPM) given by Eq. (1):

DPM = \begin{bmatrix} 0.3 & 0.4 & 0.2 & 0.2 \\ 0.2 & 0.2 & 0.1 & 0.4 \\ 0.3 & 0.1 & 0.3 & 0.3 \\ 0.2 & 0.3 & 0.4 & 0.1 \end{bmatrix} \quad (1)
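The DPM of Eq. (1) apportions each DISCO's contracted demand among the GENCOs; since every column sums to one, total contracted generation matches total demand. A small sketch (the demand values are assumed purely for illustration):

```python
# DISCO Participation Matrix of Eq. (1): entry (i, j) is the fraction of
# DISCO j's contracted demand supplied by GENCO i.
DPM = [
    [0.3, 0.4, 0.2, 0.2],
    [0.2, 0.2, 0.1, 0.4],
    [0.3, 0.1, 0.3, 0.3],
    [0.2, 0.3, 0.4, 0.1],
]

# Each column sums to 1, so every DISCO's demand is fully apportioned.
for j in range(4):
    assert abs(sum(DPM[i][j] for i in range(4)) - 1.0) < 1e-12

# Hypothetical contracted demand changes of the four DISCOs (p.u. MW).
dP_load = [0.10, 0.10, 0.10, 0.10]

# Contracted generation change of each GENCO: dPg_i = sum_j DPM[i][j]*dPL_j.
dP_gen = [sum(DPM[i][j] * dP_load[j] for j in range(4)) for i in range(4)]
print([round(p, 4) for p in dP_gen])   # shares of the total 0.4 p.u. demand
```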
Two modes of operation are considered for investigation. In one scenario, EVs are considered in aggregate in each area, along with a composite BESS in each area. In the other scenario, EV and BESS are utilized together as separate entities and operate on the strategy that both units discharge by sharing half each. The Independent System Operator (ISO) checks the area control error (ACE) and then permits different levels of regulation by the different conventional and renewable energy sources. Due to the continuous change in load demand, the control centre allocates power generation accordingly among the thermal, wind, EV and BESS units. The following relationship, given by Eq. (2), is maintained in each control area:

P_{AREA} = P_{Thermal} + P_{WIND} + P_{EV/BESS}, \qquad P_{EV/BESS} = P_{EV} + P_{BESS} \quad (2)

where P_{EV/BESS} represents the total power discharged by the EVs and BESSs. A Generation Rate Constraint is provided in the wind unit and a governor dead band is employed in the thermal unit, which add non-linearity to the test system. A robust proportional integral derivative (PID) controller is investigated as the secondary controller in both areas to reduce the ACE to zero in each area. Integral of Time multiplied by Absolute Error (ITAE) is the performance criterion used for driving the ACE towards zero, given by Eq. (3):

J_{ITAE} = \int \left( |\Delta F_1| + |\Delta F_2| + |\Delta P_{tie12}| \right) t \, dt \quad (3)
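The ITAE criterion of Eq. (3) can be evaluated numerically from sampled simulation responses via the trapezoidal rule; the decaying-oscillation signals below are purely illustrative stand-ins for simulated AGC deviations.

```python
import math

def itae(t, dF1, dF2, dPtie):
    """Integral of Time-multiplied Absolute Error, Eq. (3), evaluated with
    the trapezoidal rule on sampled deviation signals."""
    y = [(abs(f1) + abs(f2) + abs(p)) * ti
         for ti, f1, f2, p in zip(t, dF1, dF2, dPtie)]
    return sum(0.5 * (y[k] + y[k - 1]) * (t[k] - t[k - 1])
               for k in range(1, len(t)))

# Toy decaying oscillations standing in for simulated AGC deviations.
t = [0.01 * k for k in range(2001)]                        # 0..20 s
dF1 = [0.2 * math.exp(-0.5 * ti) * math.sin(3 * ti) for ti in t]
dF2 = [0.15 * math.exp(-0.4 * ti) * math.sin(2 * ti) for ti in t]
dPtie = [0.01 * math.exp(-0.3 * ti) * math.sin(ti) for ti in t]
print(round(itae(t, dF1, dF2, dPtie), 4))
```

A faster-settling, less-oscillatory response yields a smaller J_ITAE, which is why it is used as the tuning objective here.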
The schematic of the system, modeled in Fig. 1, introduces area-wise contribution factors for all production plants, with apf11, apf12, apf21 and apf22 each set to 0.5. The transfer function modeling can be referred from [1] and [8]. The PID controller in both areas is tuned by a powerful meta-heuristic algorithm called Harris Hawk Optimization [10]. The transfer function of the PID controller is given by Eq. (4):

PID(s) = K_P + \frac{K_I}{s} + s K_D \quad (4)
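Eq. (4), together with the derivative-filter coefficient N tuned in Tables 1 and 3, can be sketched as a discrete-time controller. The first-order filter form (derivative term K_D s N/(s + N)) and the test error signal are our assumptions; the gains are taken from Table 1 ("with EV", area 1) purely for illustration.

```python
import math

def pid_response(err, dt, Kp, Ki, Kd, N):
    """Discrete PID of Eq. (4) with a first-order derivative filter
    (coefficient N), applied to a sampled error signal."""
    u, integ, d = [], 0.0, 0.0
    for k, e in enumerate(err):
        integ += Ki * e * dt                   # integral term
        de = (e - err[k - 1]) / dt if k > 0 else 0.0
        d += N * dt * (Kd * de - d)            # filtered derivative state
        u.append(Kp * e + integ + d)
    return u

dt = 0.01
err = [math.exp(-0.01 * k) for k in range(500)]   # hypothetical decaying ACE
u = pid_response(err, dt, Kp=0.6315, Ki=0.5308, Kd=0.5968, N=33.361)
print(round(u[0], 4), round(u[-1], 4))
```

Note the filter update is stable here because N·dt ≈ 0.33 < 1 under backward-Euler discretization.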
Fig. 1. Schematic arrangement of two area control system under deregulated environment
Necessary algorithm parameters are: Size: 25; Generation: 25; Diffusion: 0.65; and Maximum Diffusion Number: 1. The transaction between area 1 and area 2 enables the ISO to exert better control and to provide information to the participants. Also, the objective of the ISO is to maximize the profit of the GENCOs and LAs under loading conditions and constraints. The AC optimal power flow for the market clearing mechanism for area 1 and area 2 is given as:
\max \sum_{a \in A} \sum_{i \in G} \left( \sum_{k=1}^{N} \lambda_k P^{G}_{i,a} - C_i\!\left(P^{G}_{i,a}\right) \right) \quad (5)

subject to:

P_{i,a}(V, \delta) = P^{G}_{i,a} - P^{D}_{i,a} \quad (6)

P_{i,a}(V, \delta) = V_{i,a} \sum_{k=1}^{N} V_{k,a} \left( G_{ik} \cos(\delta_i - \delta_k) + B_{ik} \sin(\delta_i - \delta_k) \right), \quad \forall i \in N \quad (7)

P^{G,\min}_{i,a} \le P^{G}_{i} \le P^{G,\max}_{i,a}, \quad V^{G,\min}_{i,a} \le V^{G}_{i} \le V^{G,\max}_{i,a}, \quad \delta^{G,\min}_{i,a} \le \delta^{G}_{i} \le \delta^{G,\max}_{i,a} \quad (8)

|S_{Fk}(V, \theta)|^2 \le \left(S_k^{\max}\right)^2, \quad |S_{Tk}(V, \theta)|^2 \le \left(S_k^{\max}\right)^2 \quad (9)
where Eq. (5) denotes the profit of the market participants, and Eqs. (6)–(9) indicate, respectively, the real power balance equation, the real power injected at each bus, the limits on generation, voltage magnitude and phase angle, and the line flow limits at the 'sending' and 'receiving' buses. The auction for the market participants is cleared in the pool-co market by Eq. (10), given by:
\sum_{j} \sum_{m} \left[ p_{jm} \left( P_j - d_{j,m-1} \right) + p_{jm} \left( d_{jm} - d_{j,m-1} \right) \right] \quad (10)
where p_{jm} is the offer price of GENCO j in block m and d_{jm} is the power output of GENCO j at each block interval.
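One reading of the block settlement in Eq. (10) — pay-as-bid over a stepwise offer curve, with fully dispatched blocks settled at their block price and the marginal block settled pro rata — can be sketched as follows. The breakpoints and prices below are hypothetical, not taken from the paper.

```python
def offer_cost(P, breakpoints, prices):
    """Pay-as-bid cost of dispatching P MW against a stepwise offer curve:
    full blocks settle at p_m*(d_m - d_{m-1}); the marginal block settles
    at p_m*(P - d_{m-1}), mirroring the two terms of Eq. (10)."""
    cost, prev = 0.0, 0.0
    for d, p in zip(breakpoints, prices):
        if P >= d:
            cost += p * (d - prev)       # block fully dispatched
        else:
            cost += p * (P - prev)       # partially dispatched marginal block
            break
        prev = d
    return cost

# Hypothetical 3-block offer: 0-50 MW @ 70, 50-90 @ 90, 90-120 @ 102 $/MW.
print(offer_cost(75, [50, 90, 120], [70, 90, 102]))   # 50*70 + 25*90 = 5750.0
```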
3 Results and Discussion

The following two case studies are considered for discussing the results.

Case I: Introducing Electric Vehicle in the Deregulated Power System Employing Market-Based Approach

EV aggregators and BESSs are expected to participate in AGC for system regulation on both the demand side and the generation side simultaneously. To investigate this phenomenon, the conventional thermal and wind plants in Area 1 and Area 2 are assisted by EVs and BESSs in the deregulated AGC system. The optimum gains and parameters in the presence and in the absence of EV are tabulated in Table 1. The corresponding system responses are obtained and (only two) are depicted in Fig. 2. Figure 2(a) shows three responses, namely frequency change (area 1) vs. time, frequency change (area 2) vs. time, and tie-power change between area 1 and area 2 vs. time. Figure 2(b) depicts the generation profile of all the generating units in both control areas in the presence of EVs. From Fig. 2(a), the maximum overshoot, maximum undershoot and settling time are noted and shown in Table 2. It can be understood from Fig. 2(a) and Fig. 2(b) that the PID controller succeeds in frequency and tie-power regulation in both control areas under bilateral transactions. Here, the ITAE values in the presence and in the absence of EV aggregators are observed as 0.1387 and 0.1965 respectively.

Table 1. Optimum controller gains and system parameters with 10% SLP in both areas, in presence and in absence of EV aggregators

                           KP1*    KI1*    KD1*    N1*     KP2*    KI2*    KD2*    N2*
With electric vehicle      0.6315  0.5308  0.5968  33.361  0.2106  0.8817  0.4081  70.533
Without electric vehicle   0.1253  0.6654  0.9328  78.345  0.6597  0.5465  0.7963  88.321
Further, for the next study, it is considered that there are 5 composite EV aggregators and 3 composite BESSs in area 1 and area 2. Each EV aggregator has the capability to govern 100 EVs, and the charging-discharging profile factor of each EV is 2.0 kW/Hz [5]. Maximum SOC = 90%; minimum SOC = 10%; rated power = 2 MW; charging-discharging profile of BESS = 0.4 MW/Hz; battery capacity of each EV = 24 kWh; maximum vehicle-to-grid power = 10 kW; initial SOC = 0.3. It is well known from past studies that an EV is idle for about 90% of the time, of which 90–95% is spent in parking zones. Thus, an EV participates in the deregulated AGC environment for 90% of its idle time. Now, EV and BESS, being dispatchable sources, react to a load demand of 10% in both areas and regulate the frequency, power generation and tie-power. The optimum gains and parameters in the presence and in the absence of EV are tabulated in Table 3. The corresponding responses are plotted in Fig. 3(a) and Fig. 3(b). The performance index (ITAE) is noted as 0.1354. It is evident from the figure that the system is undergoing a stable frequency and
Fig. 2. Comparing the system dynamic responses vs. time with 10% SLP to illustrate the impact of utilizing aggregate EV: (a) F1 vs. time, F2 vs. time, Ptie12 vs. time; (b) Pg vs. time with EV aggregators; (c) Pg vs. time without EV aggregators

Table 2. Crest overshoot, crest undershoot and settling time of the test system responses of Fig. 2

                 F1                    F2                    Ptie12
Parameters       Without EV  With EV   Without EV  With EV   Without EV  With EV
POS (+MP)        +0.2        –         0.20        –         0.008       0.001
PUS (−MP)        −0.2        −0.1      −0.22       −0.4      −0.018      −0.008
ST (TS)          8           5         10          4         15          15
Table 3. Optimum values of PID controller gains and parameters with EV and BESS sharing equal contribution when 10% SLP is applied in both control areas

              KP1*    KI1*    KD1*    N1*     KP2*    KI2*    KD2*    N2*
With EV       0.7754  0.5280  0.9314  30.491  0.7931  0.5354  0.9646  19.458
Without EV    0.5643  0.6675  0.9875  37.342  0.2313  0.5563  0.8976  78.291
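The EV and BESS parameters stated for this study imply a simple aggregate regulation capability. The sketch below is an illustrative reading (the per-fleet aggregation into MW/Hz and the interpretation of the counts as fleet totals are our assumptions, not from the paper):

```python
# Aggregate regulation gain implied by the stated EV/BESS parameters.
n_aggregators = 5            # EV aggregators in the study (assumed fleet total)
evs_per_aggregator = 100     # each aggregator governs 100 EVs
ev_gain_kw_per_hz = 2.0      # charging-discharging profile factor per EV
n_bess = 3
bess_gain_mw_per_hz = 0.4    # charging-discharging profile of each BESS
idle_fraction = 0.9          # fraction of time an EV is idle and available

ev_fleet = n_aggregators * evs_per_aggregator * ev_gain_kw_per_hz / 1000.0
bess_total = n_bess * bess_gain_mw_per_hz
print(ev_fleet)                             # 1.0 MW/Hz from the EV fleet
print(round(bess_total, 2))                 # 1.2 MW/Hz from the BESSs
print(round(idle_fraction * ev_fleet, 2))   # 0.9 MW/Hz effective EV share
```

This back-of-the-envelope gain is what lets the EV/BESS pair absorb a sizeable share of the 10% SLP alongside the thermal and wind units.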
Fig. 3. Comparing system dynamic responses vs. time for 10% SLP with aggregate EV and BESS as discharging sources sharing half with each other
tie-power deviation under 10% SLP in both areas, considering the coordinated strategy of EV and BESS.

Case II: Implementation of Profit-Based Approach in a Modified IEEE 14-Bus System

The validation of the proposed methodology has been studied with the modified IEEE 14-bus system, with a total of 6 generation companies (GENCOs), consisting of conventional thermal generators, wind-BESS and EV in each area, and 7 load aggregators (LAs) participating in the pool-co market. The modified IEEE 14-bus system is divided into two areas to have effective control over the transacting power and the marginal prices of the respective areas. The proposed approach is solved by AC Optimal Power Flow (ACOPF), linearizing the non-linear equations as a mathematical program with equilibrium constraints (MPEC).

Table 4. Offers and bids submitted to the ISO by GENCOs and LAs

GENCOs
  Quantity submitted (MW)   180   120    90    90    90   110     –
  Prices ($/MW)             120   102    90    70    70    72     –
LAs
  Quantity submitted (MW)    45   6.5    26   3.5     6  13.5  14.5
  Prices ($/MW)             120   115   100    85    95   130   120
The 3 GENCOs from Area 1 and Area 2 respectively participate in the pool-co market with their bids submitted to the Independent System Operator (ISO). In the pool-co market, the GENCOs submit their bids in increasing order of price and the LAs submit their bids in decreasing order to the ISO. The ISO executes the market clearing mechanism on an hourly basis. The effect of incremented load in area 1 and area 2 on the GENCOs' profit is determined. In order to validate the effectiveness of the proposed approach, 2 cases are considered, each with a 10% increase in load in the respective area. The offers and bids submitted by the GENCOs to the ISO can be found in Table 4. In this case, with the load increased by 10% in area-1, the effect of the incremented load on the profit of area-1 and area-2 has been obtained as shown in Fig. 4. The scenarios are thus obtained under case-1 with a 10% increase in load in area-1 and a 10% increase in bid prices. From Fig. 4, it is observed that scenario number 1 corresponds to the load in area-1 incremented by 10% with no change in the bid prices, while in scenario number 2 there is a 10% load increment together with a 10% increase in offer price. It can be seen that in both scenarios the profit for area-1 is comparatively higher than that of area-2, and this is due to the transaction in area-1. In scenarios number 3, 4 and 5 there is a load increment of 10% in area-1 with a simultaneous increase in the bid prices of GENCOs and LAs by 10%, 20% and 30% for the respective scenarios. Figure 4 indicates that there are 2.19%, 5.49% and 8.86% increases in the profit of area-2 compared to area-1, and this is due to providing the provision to the LAs for bid revision.
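The pool-co ordering described above (supply offers ascending, demand bids descending in price) can be sketched against the Table 4 data. Tie-breaking among equal-priced bids and the settlement price rule are simplifications of ours; the sketch only walks the two curves to find the cleared quantity.

```python
# GENCO offers and LA bids from Table 4 as (quantity MW, price $/MW).
gencos = [(180, 120), (120, 102), (90, 90), (90, 70), (90, 70), (110, 72)]
las = [(45, 120), (6.5, 115), (26, 100), (3.5, 85), (6, 95), (13.5, 130), (14.5, 120)]

# Pool-co ordering: supply ascending, demand descending in price.
supply = sorted(gencos, key=lambda x: x[1])
demand = sorted(las, key=lambda x: x[1], reverse=True)

# Walk both curves until the next demand bid is priced below the next offer.
cleared, s_i, d_i = 0.0, 0, 0
s_left, d_left = supply[0][0], demand[0][0]
while s_i < len(supply) and d_i < len(demand) and demand[d_i][1] >= supply[s_i][1]:
    q = min(s_left, d_left)              # quantity matched at this step
    cleared += q
    s_left -= q
    d_left -= q
    if s_left == 0:
        s_i += 1
        if s_i < len(supply):
            s_left = supply[s_i][0]
    if d_left == 0:
        d_i += 1
        if d_i < len(demand):
            d_left = demand[d_i][0]
print(cleared)   # with Table 4 data every bid clears: 115.0 MW
```

Here every LA bid is priced above the cheapest offers, so the full 115 MW of demand is cleared; revising bids upward (as in scenarios 3-5) keeps the LAs on the cleared side of the curve.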
Fig. 4. Profit (in $) of area-1 and area-2 with 10% load variation in area-1
Fig. 5. Profit (in $) of area-1 and area-2 with 10% load variation in area-2
Thus, it can be concluded that the area profit can be improved by providing a provision for bid price revision to the GENCOs and LAs. Now, with the load increased by 10% in area-2, the effect of the incremented load on the profit of area-1 and area-2 has been obtained as shown in Fig. 5. The case 2 scenarios are the same set of scenarios used in the case 1 study. The 10% load increment in area 2 causes a decrease in profit in area 2 as compared to area 1. The profit for area 2 is found to decrease by 3.6%, 3.6% and 0.73% for scenario numbers 1, 2 and 3 respectively. In order to maximize the profit for area 2, a bid revision policy has been adopted by increasing the bid prices by 10% for the GENCOs and LAs. Scenario numbers 4 and 5 with the revised bids provide a profit increase of 1.54% and 4.6% in area 2 as compared to area 1.
4 Conclusion

A two-area realistic power system consisting of a thermal plant, wind plant, EV and BESS unit in each area is taken into consideration for study. The composite system under consideration is investigated under an open market scenario with bilateral transactions, and the market-based approach is also applied to a modified IEEE 14-bus system. Two case studies are undertaken with 10% SLP (25 MW in Area 1 and 50 MW in Area 2) in both control areas. The coordination of large-scale EVs, BESSs and the wind-thermal system is involved under the AGC criteria. Conjunction of EV and BESS is carried out for frequency regulation with the conventional AGC thermal and wind plants. The conjunction is investigated in two cases: one with EV and BESS each sharing half of the generation, and the other with BESS and EV working independently in each control area. In both case studies, it is inferred that EV and BESS successfully suppress frequency and tie-power dynamics along with the other conventional units. The transaction between Area 1 and Area 2 is maximized by considering a step change of 10% load in each area. It can be seen that under constant bids with load variation the profit of Area 2 decreases significantly, which can be recovered by simultaneously providing revised bids.

Acknowledgement. We acknowledge infrastructural support from the Department of Electrical Engineering, Indian Institute of Engineering Science and Technology Shibpur, for executing the research work in the laboratory.
References

1. Saha, D.: AGC of a multi-area CCGT-thermal power system using stochastic search optimized integral minus proportional derivative controller under restructured environment. IET Gener. Transm. Distrib. 11. https://doi.org/10.1049/iet-gtd.2016.1737
2. Zhong, J., et al.: Coordinated control for large-scale EV charging facilities and energy storage devices participating in frequency regulation. Appl. Energy 123, 253–262 (2014)
3. Broeer, T., Fuller, J., Tuffner, F., Chassin, D., Djilali, N.: Modeling framework and validation of a smart grid and demand response system for wind power integration. Appl. Energy 113, 199–207 (2014)
4. Bahmani, M.H., Abolfazli, M., Afsharnia, S., Ghazizadeh, M.S.: Introducing a new concept to utilize plug-in electric vehicles in frequency regulation service. In: 2nd International Conference on Control, Instrumentation and Automation, Shiraz, Iran (2011). https://doi.org/10.1109/ICCIAutom.2011.6356639
5. Sekyung, H., Soohee, H., Sezaki, K.: Estimation of achievable power capacity from plug-in electric vehicles for V2G frequency regulation: case studies for market participation. IEEE Trans. Smart Grid 2(4), 632–641 (2011)
6. Liang, L., Zhong, J., Jiao, Z.: Frequency regulation for a power system with wind power and battery energy storage. In: IEEE Power System Technology (POWERCON) Conference. Pullman Hotel, Auckland (2012). https://doi.org/10.1109/PowerCon.2012.6401357
7. Ibrahim, H., Ilinca, A., Perron, J.: Energy storage systems - characteristics and comparisons. Renew. Sustain. Energy Rev. 12(5), 1221–1250 (2008)
8. Tripathy, D., Choudhury, N.B.D., Sahu, B.K.: A novel cascaded fuzzy PD-PI controller for load frequency study of solar-thermal/wind generator-based interconnected power system using grasshopper optimization algorithm. Int. J. Electr. Eng. Educ. https://doi.org/10.1177/0020720920930365
468
D. Saha et al.
9. Bagheri, A., Jadid, S.: A robust distributed market-clearing model for multi-area power systems. Electr. Power Energy Syst. 124(106275), 1–12 (2021)
10. Zhang, X., Tan, T., Zhou, B., Yu, T., Yang, B., Huang, X.: Adaptive distributed auction-based algorithm for optimal mileage-based AGC dispatch with high participation of renewable energy. Electr. Power Energy Syst. 124(106371), 1–12 (2021)
11. Panda, R., Tiwari, P.K.: An economic risk based optimal bidding strategy for various market players considering optimal wind placements in day-ahead and real-time competitive power market. Int. J. Syst. Assur. Eng. Manag. 13, 347–362 (2022)
Acute Lymphocytic Leukemia Classification Using Color and Geometry Based Features Sourav Chandra Mandal, Oishila Bandyopadhyay(B) , and Sanjoy Pratihar Department of Computer Science and Engineering, Indian Institute of Information Technology Kalyani, Kalyani 741235, West Bengal, India {sourav phd 2018july,oishila,sanjoy}@iiitkalyani.ac.in
Abstract. Acute Lymphocytic Leukemia (ALL) is a malignant hematological disease. It is also known as acute lymphoblastic leukemia. Being the most common type of childhood cancer, it requires prompt treatment to increase the chances of recovery. In the diagnosis procedure, ALL has variant forms or types, referred to as L1, L2, and L3. Hence, it is possible to apply effective treatment if leukemic cells are identified correctly, i.e., if their proper types are known. Clinical observations say that these subtypes have distinct geometric and color features. This work is mainly focused on classifying ALL-type cells. We have used a novel combination of geometric and color-based features for efficient classification. Improved image enhancement and noise removal procedures are also employed as preprocessing. The proposed method has been tested on the ALL-IDB data sets. The results corroborate the method's usefulness in identifying ALL subtypes. Keywords: Acute lymphocytic leukemia · Nucleoli · Support vector machine

1 Introduction
Leukemia is a cancer of the blood cells and bone marrow. It manifests as an abundance of abnormal white blood cells in the body. Every year, globally, 0.35 million individuals die due to leukemia [16]. Leukemic symptoms [14] differ depending on the type of leukemia. Leukemic cells spread rapidly and affect the body's immune system. Early detection of leukemia with its specific type can help doctors start the treatment and save the patient's life. As per the hematological parameters [15], leukemic cells are partitioned into two classes: (a) acute type and (b) chronic type. Acute leukemia spreads at a faster pace compared to chronic leukemia. Acute Lymphoblastic Leukemia (ALL) and Acute Myelogenous Leukemia (AML) are the two major segments of acute leukemia [4,10]. The differences between a healthy cell and leukemia blast cells (ALL and AML) are shown in Fig. 1. On the basis of the morphology and cytochemical staining of blasts, the French-American-British (FAB) [1] organization categorizes ALL © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. K. Das et al. (Eds.): CIPR 2022, LNNS 480, pp. 469–478, 2022. https://doi.org/10.1007/978-981-19-3089-8_45
470
S. C. Mandal et al.
Fig. 1. (a) Healthy cells, (b–c) Acute Leukemic Cells: (b) ALL type & (c) AML type.
Fig. 2. (a) Healthy cell, (b) Type: L1, (c) Type: L2, (d) Type: L3.
into three subtypes, namely, L1 (30%), L2 (60%), and L3 (10%). These subtypes are differentiated based on the Peripheral Blood Smear (PBS) test [5]. The different types of ALL cells are shown in Fig. 2. Clinical observations of the subtypes are summarised below.
L1: The blasts are normally analogous and small. The nuclei are symmetric and circular in shape. Nucleoli are inconspicuous. Cytoplasm areas are smaller and without vacuoles.
L2: The blasts are comparatively larger for this subtype. The nuclei are usually irregular and often divided, but in most cases the nucleus is present. The cytoplasm may be variable, but often contains vacuoles.
L3: In this subtype, the blasts are moderately large and homogeneous. Regular and/or oval-shaped nuclei are present in the cell. Cytoplasm volumes are of moderate size and may contain prominently visible vacuoles.
This work focuses on classifying the FAB subtypes of ALL, i.e., L1, L2, and L3. This classification, with its prognostic factors, will help doctors plan fast treatment. The need for early diagnosis with correct subtypes motivates the design of an automated ALL classification model. The proposed algorithm applies different noise elimination techniques in the pre-processing phase and removes unwanted objects from the leukemia-affected blood smear image. After preprocessing, relevant features are extracted, and subsequently classification of ALL subtypes follows with the help of a multi-class SVM classifier. The paper is organized as stated below. Previous works related to various approaches are discussed in Sect. 2. In Sect. 3, a new methodology is proposed to classify the ALL cells. Subsequently, data sets and experimental processes are discussed in the experiment section (Sect. 4), along with a comparative study with other existing methods. Lastly, in Sect. 5, the conclusion is drawn and the future scope of research is discussed.
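The shape cues above (symmetric circular nuclei in L1 versus irregular nuclei in L2) are typically quantified with geometric descriptors such as circularity, which is also among the features this paper extracts. A minimal sketch using the standard 4πA/P² definition (the sample shapes are illustrative):

```python
import math

def circularity(area, perimeter):
    """Circularity 4*pi*A / P^2: 1.0 for a perfect circle, lower for
    irregular shapes - a common geometric feature for nucleus shape."""
    return 4 * math.pi * area / perimeter ** 2

r = 10.0
circle = circularity(math.pi * r * r, 2 * math.pi * r)   # -> 1.0
square = circularity(20.0 * 20.0, 4 * 20.0)              # -> pi/4, about 0.785
print(round(circle, 3), round(square, 3))
```

In practice the area and perimeter would come from the segmented nucleus contour, so a round L1 nucleus scores near 1.0 while an irregular L2 nucleus scores lower.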
Classification of ALL
471

2 Related Works
In recent times, a significant amount of research work has been carried out to automate the diagnostic process of acute leukemia using computer-aided systems. Researchers have proposed several methods to detect and classify leukemic cells from blood smear image samples. Rawat et al. [23] developed a system for differentiating healthy and acute leukemic cells using shape and texture features as input to an SVM classifier. This system achieves an accuracy of 89.8%. Reta et al. [24] combined different classifiers to achieve an accuracy of 90% in subtype diagnosis. Patel and Mishra [21] extracted different color-, geometry- and texture-related features; an SVM classifier is then used for the final classification, with an overall accuracy of 93.57%. Dumyan and Gupta [7] applied an Artificial Neural Network (ANN) classifier which works with shape, texture, statistical, and moment-invariant features to identify ALL cells. This approach achieved an overall accuracy of 97.9%. Negm et al. [20] used ANN and decision tree classifiers to discriminate leukemia cells, using geometry, color, and relative tissue features; an accuracy of 99.51% has been reported. Rawat et al. [22] used shape, texture and color based features with an SVM classifier for the recognition of the subtypes; the maximum recognition accuracy was reported as 97.1% for ALL subtypes. Najaat et al. [9] targeted extracting features from both the whole cell and the nucleus. Using a Genetic Algorithm (GA) and the SVM kernel parameters, feature subsets are selected and the model is designed. The system reported 99.19% accuracy for the discrimination of healthy and non-healthy cells, whereas the accuracy for ALL subtype classification is 96.84%. Our proposed method mainly focuses on geometry-based and color-based feature extraction and classification of the ALL subtypes. The proposed system takes an image with ALL cells as input.
The preprocessing stage eliminates noise and segments the target nucleus and cytoplasm. Finally, the relevant features are extracted and used for classification of ALL into its different subtypes. The proposed method shows higher accuracy in comparison with other existing methods.
3 Proposed Methodology
The proposed method has four major components: preprocessing, segmentation, feature extraction, and classification. Input objects are taken from microscopic blood smear images. The phases of the proposed model are shown in Fig. 3.
3.1 Segmentation
The performance of any segmentation technique is often determined by factors such as smear staining, image acquisition conditions, and errors in blood smear preparation. Input images may contain noise and blurred regions. In the preprocessing phase, a median filter is applied to enhance the quality of the input image [11]. After this preprocessing task, nucleus segmentation is performed. Designing an
S. C. Mandal et al.
[Block diagram: input image → preprocessing (filtering and image enhancement) → segmentation (nucleus and cytoplasm) → feature extraction (features 1. circularity, 2. concavity, …, 15. nucleoli) → SVM classification into L1, L2, and L3 types]
Fig. 3. Phases of the proposed method.
effective method for shape modeling [3] and segmentation of the nucleus [18] has always been a challenge. Here, color image processing is applied to extract the nucleus from the WBC. The nuclei are represented in RGB color space (as shown in Fig. 4). The darker regions of the cells correspond to nuclei, which are mapped to different color intensity values.
Fig. 4. (a) Original WBC, (b) red channel, (c) green channel, (d) blue channel, (e) segmented nucleus.
3.1.1 Nucleus Segmentation: Image with Multiple Nuclei
The ALL-IDB dataset has blood smear images with WBCs. In several of these images some nuclei are partially occluded by the image frame while others are completely in frame. Removing these partially occluded nuclei from the image is a challenging task [2]. With the help of a color segmentation method using different red, green, and blue channel bands, all the nuclei present in the image are segmented. Algorithm 1 extracts the nucleus boundary. The algorithm checks whether a nucleus contour touches the image boundary or not; if it does, that contour is removed from the original image. If the target nucleus contour also touches the image boundary, the system explicitly appends the connected component with the maximum contour boundary. The process is shown in Fig. 5(a)–(d).
3.1.2 Cytoplasm Segmentation: Image with Multiple Nuclei
In order to extract different features for classification, the cytoplasm together with the nucleus is segmented from the input image [25]. As a single color segmentation is not sufficient to find the total cytoplasm area, a double color segmentation method is applied to generate two sets of RGB values from the image, which are then merged accordingly. After preprocessing, unwanted cropped cell areas are eliminated. Step-wise segmentation, merging, smoothing, filtering, and object elimination are shown in Fig. 5(e)–(h).
Classification of ALL
Algorithm 1: Target-Nucleus-Boundary-Extraction
Input: Original Image; Output: Segmented Nucleus
1:  X ← Connected-Components(Image)
2:  V[X] ← Volume(CC[X])
3:  CC[k] ← Max(V[X])
4:  [row, col] ← Size(Image)
5:  // check for all CC[X]
6:  while p, q ← 1 to row, col do
7:      if Image[p][q] == 1 then
8:          if p == 1 OR q == 1 OR p == row OR q == col then
9:              remove CC[X]
10:         end
11:     end
12: end
13: if List(CC[X]) == empty then
14:     append CC[k]
15: end
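Algorithm 1's logic can also be sketched in code. The pure-Python version below (the function and variable names are ours, not the authors' implementation) labels the 4-connected components of a binary mask, drops components that touch the image border, and keeps the largest remaining one; if every component touches the border, it falls back to the largest overall, mirroring the explicit append of CC[k]:

```python
from collections import deque

def target_nucleus(mask):
    """Label 4-connected components of a binary mask (list of 0/1 rows),
    drop components touching the image border, and return the pixel list
    of the largest remaining component. If every component touches the
    border, fall back to the largest overall (Algorithm 1's CC[k])."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                queue, pixels, touches_border = deque([(r, c)]), [], False
                seen[r][c] = True
                while queue:  # breadth-first flood fill of one component
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    if y in (0, rows - 1) or x in (0, cols - 1):
                        touches_border = True
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols \
                                and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                components.append((pixels, touches_border))
    interior = [p for p, b in components if not b]
    pool = interior if interior else [p for p, _ in components]
    return max(pool, key=len) if pool else []
```

In practice the binary mask would come from the color segmentation step described above; here it is any nested list of 0/1 values.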
Fig. 5. (a) Input image, (b) extracted nuclei with noise, (c) boundary of the nuclei, (d) target object nucleus, (e) Original Image, (f) cytoplasm with nuclei, (g) smoothed image, (h) target cytoplasm
3.2 Feature Extraction
Significant properties are extracted from the target objects for classification using machine learning, where the process reduces a large set of data into a feature set of minimum dimension. Extraction of relevant features from the peripheral blood smear is the key step for diagnosis of acute leukemia [17] using this model. Morphological features [13] such as size, cytoplasm density, nuclear chromatin type, and number of nucleoli can help in primary screening between healthy and leukemia cells. To achieve this goal, different types of features have been extracted from the sub-images and fifteen (15) different features are selected
Table 1. Geometric measures and their definitions.

Measure: Definition
Nucleoli: Small bodies that appear inside the nucleus.
Area: The number of pixels in the nucleus or cytoplasm.
Perimeter: The number of pixels in the contour boundary of the nucleus or WBC cell.
Major axis length: The length (pixel count) of the longest horizontal line segment fully intersecting the object.
Minor axis length: The length (pixel count) of the longest vertical line segment fully intersecting the object.
Concave points: The points where concavity changes and which have maximum distance from the concave-area chord [19].
Eccentricity: Signifies how un-circular the given curve is; it can be represented as sqrt(Major Axis² − Minor Axis²) / Major Axis.
Convexity: The ratio of the convex-hull perimeter to the original perimeter: Convexity = Perimeter_Convex / Perimeter.
Compactness: The quality of having the pixels fitted into a small space, defined as (4 × π × Area) / Perimeter².
Rectangularity: The measure of being shaped like a rectangle, defined as Area / (Major Axis × Minor Axis).
Circularity: The measurement of the roundness of the object; the major-to-minor axis ratio is taken as roundness.
N-C area ratio: Ratio of the nucleus area to the cell area.
N-C perimeter ratio: Ratio of the nucleus perimeter to the cell perimeter.
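Several of the geometric measures above reduce to simple arithmetic on the area, perimeter, and axis lengths. A small sketch (function names are ours; the eccentricity form assumes the usual sqrt(Major² − Minor²)/Major definition):

```python
import math

def circularity(major_axis, minor_axis):
    # roundness taken as the major-to-minor axis ratio (per Table 1)
    return major_axis / minor_axis

def eccentricity(major_axis, minor_axis):
    # how un-circular the shape is; 0 for a circle, -> 1 as it elongates
    return math.sqrt(major_axis ** 2 - minor_axis ** 2) / major_axis

def compactness(area, perimeter):
    # (4 * pi * Area) / Perimeter^2; equals 1 for a perfect circle
    return 4 * math.pi * area / perimeter ** 2

def rectangularity(area, major_axis, minor_axis):
    # Area / (Major Axis * Minor Axis)
    return area / (major_axis * minor_axis)

def nc_area_ratio(nucleus_area, cell_area):
    # nucleus-to-cell area ratio
    return nucleus_area / cell_area
```

For a circle of radius 2 (area 4π, perimeter 4π) the compactness evaluates to 1 and the eccentricity to 0, which is a quick sanity check on the formulas.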
for classification. The geometry- and color-based features considered by us are given in the following subsections.
3.2.1 Geometry Based Features
Geometric features of the objects are represented by a set of geometric elements. The geometric features used to classify ALL subtypes are listed in Table 1. These features are related to the WBC nucleus, the WBC cell, and the cytoplasm. Geometric definitions of the listed measures are also given in Table 1.
3.2.2 Color Based Features
Color features are basic characteristics of the content of images [6]. The color-based features considered for the WBC nucleus are the mean, standard deviation, energy, entropy, skewness, and kurtosis (for all three channels R, G, and B). Hence, in total, we have 18 color-based features.
3.3 Classification
Classification entails identifying the various types of ALL cells separately. The framework of the classifier is constituted from the features extracted from the cytoplasm and segmented nucleus. The performance of the classifier is tuned through the selection of the features.
3.3.1 Support Vector Machine (SVM)
SVM is one of the important machine learning algorithms for two-class classification problems; it can also be used for multi-class classification [8]. A unique feature of SVM is that it concurrently minimizes the classification error and maximizes the geometric margin. The equation of the hyperplane with the largest margin that divides the n data points (in p-dimensional real vector space) into the two classes can be represented as w · x + b = 0. The canonical hyperplane always satisfies the constraints (Eq. 1) for the entire data set. The hyperplane with the largest margin is selected as the plane of separation between the two classes. yk[wT · xk + b] ≥ 1; k = 1, 2, . . .
(1)
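The margin constraint in Eq. (1) can be made concrete with a toy trainer. The sketch below minimizes a soft-margin objective by stochastic subgradient descent on the hinge loss; it is an illustrative stand-in, not the library SVM implementation actually used in the paper:

```python
import random

def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=200, seed=0):
    """Train w, b for a primal soft-margin linear SVM by stochastic
    subgradient descent; labels y must be +1 / -1. Samples violating
    y * (w . x + b) >= 1 (Eq. 1) pull the hyperplane toward them, while
    the margin term shrinks w."""
    rng = random.Random(seed)
    dim = len(X[0])
    w, b = [0.0] * dim, 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            if margin < 1:  # constraint of Eq. (1) violated: hinge active
                w = [wj - lr * (wj / C - y[i] * xj)
                     for wj, xj in zip(w, X[i])]
                b += lr * y[i]
            else:           # only the margin-maximizing shrinkage applies
                w = [wj - lr * wj / C for wj in w]
    return w, b
```

On linearly separable data the learned (w, b) classifies every training point with positive margin, which is the behavior the constraint set of Eq. (1) encodes.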
SVM uses a special mapping function θ to map the training feature vectors xk into a higher-dimensional space, where a hyperplane with the maximal margin is calculated. The function K(xk, xj) ≡ θ(xk) · θ(xj) is called the kernel function. SVM has different types of kernel functions; in our implementation, a linear kernel has been used with the parameter C = 1.
3.3.2 Naive Bayes Classifiers
The Naive Bayes classifier is a supervised classification algorithm based on Bayes' theorem of probability, used to assign class labels to unknown data [26]. Bayes' theorem gives the probability of an event based on the occurrence of another event that has already occurred. Using Bayes' theorem, we determine P(B|A) from P(B), P(A), and P(A|B), where A is a dependent feature vector of size n and B is the class variable. The modified expression is given in Eq. 2.

P(B|A1, . . . , An) = [P(A1|B) P(A2|B) · · · P(An|B) P(B)] / [P(A1) P(A2) · · · P(An)]    (2)

Here, the probability of the given inputs for all values of the class variable B is computed as B = argmax_B P(B) ∏_{i=1}^{n} P(Ai|B), and the class with the highest probability is selected to build the classification model. Finally, P(B) and P(Ai|B) are computed, where P(B) represents the class prior probability and P(Ai|B) the conditional probability. The distribution assumed for P(Ai|B) is the key distinguishing marker among different Naive Bayes classifiers; the Gaussian distribution is used for the Naive Bayes classifier in this experiment.
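Equation (2) with Gaussian likelihoods can be implemented directly. This is an illustrative from-scratch sketch (the paper does not specify its implementation); log-probabilities are used for numerical stability, which leaves the argmax unchanged:

```python
import math

class GaussianNaiveBayes:
    """Minimal Gaussian Naive Bayes following Eq. (2): each class B gets a
    prior P(B) and an independent Gaussian likelihood P(Ai | B) per feature."""

    def fit(self, X, y):
        self.stats = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            prior = len(rows) / len(X)
            means = [sum(col) / len(rows) for col in zip(*rows)]
            # small floor on the variance avoids division by zero
            varis = [max(sum((v - m) ** 2 for v in col) / len(rows), 1e-9)
                     for col, m in zip(zip(*rows), means)]
            self.stats[label] = (prior, means, varis)
        return self

    def predict(self, x):
        def log_gauss(v, m, s2):
            return -0.5 * (math.log(2 * math.pi * s2) + (v - m) ** 2 / s2)
        # argmax over B of log P(B) + sum_i log P(Ai | B), per Eq. (2)
        return max(self.stats,
                   key=lambda b: math.log(self.stats[b][0]) +
                   sum(log_gauss(v, m, s2) for v, m, s2 in
                       zip(x, self.stats[b][1], self.stats[b][2])))
```

The evidence term P(A1)···P(An) in the denominator of Eq. (2) is the same for every class, so it can be dropped from the argmax.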
4 Dataset and Experiments
In this work, images are taken from the ALL-IDB dataset [12], an acute lymphoblastic leukemia image database. The ALL-IDB1 dataset consists of about 39,000 blood elements, with the lymphocytes labeled by expert oncologists. These microscopic images have been captured at magnifications ranging between 300 and 500. ALL-IDB2 is the cropped version of the ALL-IDB1 images, where normal and blast cells are cropped as the regions of interest.
Table 2. A comparative study of the proposed method's performance with various other methods on classifying ALL subtypes.

Method                 | Features                  | Classifier | Dataset        | #Images | Acc. (%)
Rawat, J. et al. [23]  | Texture & shape           | SVM        | ALL-IDB2       | 130     | 89.80
Patel, N. et al. [21]  | Color, geometry & texture | SVM        | ALL-IDB2       | 27      | 93.57
Najaat A. et al. [9]   | Color & shape             | SVM        | Local Hospital | 642     | 96.84
Dumyan, S. et al. [22] | Shape & texture           | ANN        | –              | 36      | 97.10
Proposed method        | Color & geometry          | SVM        | ALL-IDB2       | 130     | 98.40
Hence, in terms of gray-level properties, the ALL-IDB1 and ALL-IDB2 images are the same and differ only in dimensions. Images collected from healthy individuals are marked with X = 0, and images collected from ALL patients are marked with X = 1. Further, ALL blast cells are marked as L1, L2, or L3. ALL-IDB2 contains 260 WBCs, out of which 130 are blast cells and 130 are non-blast cells.
4.1 Results
To evaluate the classification performance, two different evaluation metrics are used: accuracy (Acc.) and F1-score (F1). Their definitions are given in Eq. 3, where TP, FP, TN, and FN are the four quadrants of the confusion matrix.

Acc = (TP + TN) / (TP + TN + FP + FN),   F1 = 2 × (Precision × Sensitivity) / (Precision + Sensitivity)    (3)
The discrimination accuracy obtained for the FAB subtypes of ALL is 98.4% and the F1-score is 0.99 using the SVM classifier. Comparative results with respect to different methods are shown in Table 2. We found that SVM performs better classification in our model in comparison with the Naive Bayes classifier; comparative results are shown in Fig. 6.
Fig. 6. Comparison of performances: SVM vs Naive Bayes classifier.
5 Conclusion
It is a well-known fact that early diagnosis increases the recovery rate in acute lymphocytic leukemia. The results obtained by the developed algorithm show that it is feasible to speed up the diagnosis. This paper has proposed a fully automated method to classify acute leukemia into its subtypes. Our contribution lies in the image enhancement and filtering process along with the extraction of a few additional features. Experiments show that the color-based and geometry-based features combined with the SVM classifier obtain commendable accuracy.
References
1. Abdul-Hamid, G.: Classification of acute leukemia. In: Acute Leukemia. IntechOpen (2011). https://doi.org/10.5772/19848
2. Ahasan, R., Ratul, A.U., Bakibillah, A.S.M.: White blood cells nucleus segmentation from microscopic images of strained peripheral blood film during leukemia and normal condition. In: 5th International Conference on Informatics, Electronics and Vision (ICIEV), pp. 361–366 (2016). https://doi.org/10.1109/ICIEV.2016.7760026
3. Andrade, A.R., Vogado, L.H., de M.S. Veras, R., Silva, R.R., Araujo, F.H., Medeiros, F.N.: Recent computational methods for white blood cell nuclei segmentation: a comparative study. Comput. Methods Program. Biomed. 173, 1–14 (2019). https://doi.org/10.1016/j.cmpb.2019.03.001
4. Bain, B.J.: A Beginner's Guide to Blood Cells, 3rd edn. Wiley (2017). https://www.perlego.com/book/990619/a-beginners-guide-to-blood-cells-pdf. Accessed 15 Feb 2022
5. Bennett, M., et al.: Proposals for the classification of the acute leukaemias. French-American-British (FAB) co-operative group. Br. J. Haematol. 33(4), 451–458 (1976). https://doi.org/10.1111/j.1365-2141.1976.tb03563.x
6. Chary, R.V.R.: Feature extraction methods for color image similarity. Adv. Comput.: Int. J. 3(2), 147–157 (2012)
7. Dumyan, S., Gupta, A.: An enhanced technique for lymphoblastic cancer detection using artificial neural network. Int. J. Adv. Res. Comput. Sci. Electron. Eng. (IJARCSEE) 6, 38–42 (2017)
8. Hsieh, S.H., Wang, Z., Cheng, P.H., Lee, I.S., Hsieh, S.L., Lai, F.: Leukemia cancer classification based on support vector machine. In: 8th IEEE International Conference on Industrial Informatics (INDIN), pp. 819–824 (2010)
9. Ibrahimp, N., Haiderp, M.: Acute leukemia classification based on image processing and machine learning techniques. Int. J. Innovat. Sci. Eng. Technol. 6, 19–31 (2019)
10. Kazemi, F., Najafabadi, T.A., Araabi, B.N.: Automatic recognition of acute myelogenous leukemia in blood microscopic images using k-means clustering and support vector machine. J. Med. Signals Sens. 6, 183–193 (2016)
11. Kumar, N., Nachamai, M.: Noise removal and filtering techniques used in medical images. Indian J. Comput. Sci. Eng. 3, 146–153 (2012)
12. Labati, R.D., Piuri, V., Scotti, F.: ALL-IDB: the acute lymphoblastic leukemia image database for image processing. In: 18th IEEE International Conference on Image Processing (ICIP), pp. 2045–2048 (2011)
13. Ladines-Castro, W., Barragán-Ibañez, G., Luna-Pérez, M., Santoyo-Sánchez, A., Collazo-Jaloma, J., Mendoza-García, E., et al.: Morphology of leukaemias. Rev. Méd. Hospit. Gener. México 79, 107–113 (2016)
14. Levy, L., Nasereddin, A., Rav-Acha, M., Kedmi, M., Rund, D., Gatt, M.: Prolonged fever, hepatosplenomegaly, and pancytopenia in a 46-year-old woman. PLoS Med. (2009). https://doi.org/10.1371/journal.pmed.1000053
15. Lim, G.C.C.: Overview of cancer in Malaysia. Jpn. J. Clin. Oncol. 32 (2002)
16. Lin, X., et al.: Global, regional, and national burdens of leukemia from 1990 to 2017: a systematic analysis of the Global Burden of Disease 2017 study. Aging (Albany NY) 13(7), 10468–10489 (2021). https://doi.org/10.18632/aging.202809
17. Madhloom, H.T., Kareem, S.A., Ariffin, H.: A robust feature extraction and selection method for the recognition of lymphocytes versus acute lymphoblastic leukemia. In: International Conference on Advanced Computer Science Applications and Technologies (ACSAT), pp. 330–335 (2012)
18. Makem, M., Tiedeu, A.: An efficient algorithm for detection of white blood cell nuclei using adaptive three stage PCA-based fusion. Inform. Med. Unlock. 20, 100416 (2020). https://doi.org/10.1016/j.imu.2020.100416
19. Mandal, S.C., Bandyopadhyay, O., Pratihar, S.: Detection of concave points in closed object boundaries aiming at separation of overlapped objects. In: Singh, S.K., Roy, P., Raman, B., Nagabhushan, P. (eds.) CVIP 2020. CCIS, vol. 1378, pp. 514–525. Springer, Singapore (2021). https://doi.org/10.1007/978-981-16-1103-2_43
20. Negm, A.S., Hassan, O.A., Kandil, A.H.: A decision support system for acute leukaemia classification based on digital microscopic images. Alex. Eng. J. 57(4), 2319–2332 (2018)
21. Patel, N., Mishra, A.: Automated leukaemia detection using microscopic images. Proc. Comput. Sci. 58, 635–642 (2015)
22. Rawat, J., Singh, A., Bhadauria, H., Virmani, J., Devgun, J.S.: Computer assisted classification framework for prediction of acute lymphoblastic and acute myeloblastic leukemia. Biocybern. Biomed. Eng. 37, 637–654 (2017)
23. Rawat, J., Singh, A., Bhadauria, H., Virmani, J.: Computer aided diagnostic system for detection of leukemia using microscopic images. Proc. Comput. Sci. 70, 748–756 (2015). https://doi.org/10.1016/j.procs.2015.10.113
24. Reta, C., et al.: Segmentation and classification of bone marrow cells images using contextual information for medical diagnosis of acute leukemias. PLoS ONE 10, e0130805 (2015). https://doi.org/10.1371/journal.pone.0130805
25. Sarrafzadeh, O., Dehnavi, A.: Nucleus and cytoplasm segmentation in microscopic images using k-means clustering and region growing. Adv. Biomed. Res. 4, 174 (2015). https://doi.org/10.4103/2277-9175.163998
26. Selvaraj, S., Kanakaraj, B.: Naïve Bayesian classifier for acute lymphocytic leukemia detection. J. Eng. Appl. Sci. 10, 6888–6892 (2015)
Thermal Strain Resolution Improvement in Brillouin OTDR Based DTS System Using LWT-MPSO Technique Ramji Tangudu(B) and P. Sirish Kumar Department of ECE, Aditya Institute of Technology and Management, Tekkali 532201, India [email protected]
Abstract. In this work, a lifting wavelet transform-modified particle swarm optimization (LWT-MPSO) scheme is applied to improve the thermal strain resolution and sensing range of a Brillouin optical time-domain reflectometry (Brillouin OTDR) based distributed temperature sensing (DTS) system. In the proposed DTS system, the thermal strain resolution is obtained using the Brillouin frequency shift (BFS) and Brillouin power change (BPC) parameters. To improve the thermal strain resolution of the proposed DTS system, the errors in BFS and BPC must be minimized. The proposed LWT-MPSO scheme is used to minimize the BFS and BPC errors, providing higher temperature measurement accuracy over a sensing fiber length of 10 km. The average thermal strain resolution obtained in this work is ~17.66 με. The results are obtained using MATLAB 15.0 simulation. Keywords: Brillouin optical time domain reflectometry (Brillouin OTDR) · Brillouin frequency shift (BFS) · Brillouin power change (BPC) · Thermal strain resolution · Lifting wavelet transform (LWT) · Modified particle swarm optimization (MPSO)
1 Introduction
Currently, fiber optic cables are highly useful for both communication and sensing applications. In a distributed fiber optic sensing (DFOS) system, Rayleigh, Brillouin, and Raman backscattered signals help to detect vibration, strain, and temperature at any position along the sensing fiber [1, 2]. The distributed temperature sensing (DTS) system falls under the DFOS system. The DTS system has high-potential applications such as power cable monitoring, fire detection, oil & gas pipeline leakage detection, medical areas, etc. [3]. The significant features of the DTS system are low complexity, low weight, high speed, high security, immunity to electromagnetic interference, and survivability under harsh environmental conditions [3, 4]. When there is a deviation in temperature along the sensing fiber, thermal strain is generated along the fiber. Brillouin frequency shift (BFS) and Brillouin power change (BPC) are produced [5] due to temperature and thermal strain deviations on the sensing fiber. These parameter variations are beneficial for extracting the sensing fiber's temperature
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 A. K. Das et al. (Eds.): CIPR 2022, LNNS 480, pp. 479–486, 2022. https://doi.org/10.1007/978-981-19-3089-8_46
R. Tangudu and P. S. Kumar
values. Typically, the values of BFS and BPC are ~11 GHz and 0.36%, respectively, for a 1.55 μm pumping wavelength at room temperature [5]. Per the literature survey, the Brillouin scattering-based DTS system has two significant categories, namely the spontaneous Brillouin DTS and stimulated Brillouin DTS systems. In the stimulated Brillouin scattering-based DTS system, double-ended access is required and high non-linearity effects are generated [6–8]. The spontaneous Brillouin scattering-based DTS system mainly presents two types: Brillouin optical time-domain reflectometry (Brillouin OTDR) and Brillouin optical correlation domain reflectometry (Brillouin OCDR). The Brillouin OCDR based DTS system is only used for meters-long sensing fiber [9–11]. The performance of the Brillouin OTDR based DTS system is developed and analyzed in this article. The paper reports the analysis of the BFS and BPC errors of the proposed system. In order to get a higher degree of temperature resolution, a better thermal strain resolution must be obtained. To this end, we have applied the lifting wavelet transform (LWT) with a modified particle swarm optimization (MPSO) signal processing technique. The lifting technique [12] does not involve difficult mathematical computations; it has only three operations, namely split, predict, and update [12]. Hence, this technique produces an output at high speed. MPSO is an evolutionary algorithm used to obtain the optimum threshold [12]; the resultant threshold can give an optimum solution. The MPSO algorithm works with a velocity parameter only. This evolutionary algorithm has an adaptive inertia weight parameter for avoiding local optima, has very few control parameters, and gives better convergence for both linear and nonlinear signals. The PSO algorithm does not have an adaptive inertia weight parameter.
Hence, MPSO gives better results than the PSO algorithm.
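The split, predict, and update steps of the lifting scheme can be illustrated with a one-level Haar-style lifting step. This is a generic textbook example, not the exact lifting filters used in the paper:

```python
def lwt_haar(signal):
    """One level of the lifting wavelet transform with Haar-style steps:
    split the signal into even/odd samples, predict each odd sample from
    its even neighbor (difference = detail), then update the even samples
    (approximation). Assumes an even-length input."""
    even, odd = signal[0::2], signal[1::2]          # split
    detail = [o - e for e, o in zip(even, odd)]     # predict
    approx = [e + d / 2 for e, d in zip(even, detail)]  # update
    return approx, detail

def ilwt_haar(approx, detail):
    """Inverse transform: undo update, undo predict, interleave."""
    even = [a - d / 2 for a, d in zip(approx, detail)]
    odd = [e + d for e, d in zip(even, detail)]
    out = []
    for e, o in zip(even, odd):
        out += [e, o]
    return out
```

Because each lifting step is trivially invertible, the forward/inverse pair reconstructs the input exactly, which is the property that makes lifting cheap and lossless; a denoising scheme like the one described here would threshold the detail coefficients before inverting.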
2 Theory
Figure 1 illustrates the Brillouin OTDR based DTS system setup. In this setup, the laser diode acts as the optical source and generates a pulsed optical signal using a pulse generator. These optical pulses propagate through the fiber, which is connected to a 3-port circulator. The light pulses enter the 1st port of the circulator and pass through the sensing fiber or fiber under test (FUT). As the light signal passes through the sensing fiber, a backscattered optical signal is produced along the entire sensing fiber. Here, the backscattered signal is the spontaneous Brillouin backscattered signal (SpBS), generated by the interaction between the launched optical pulses and acoustic phonons in the FUT. The launched optical signal's frequency differs from the SpBS's frequency; this frequency difference is known as the BFS. When the pumping wavelength is 1.55 μm, the BFS for silica single-mode fiber (SMF) is ~11 GHz at room temperature. This BFS depends upon the temperature variation. In Fig. 1, three heating units are placed at different locations on the sensing fiber. The Brillouin backscattered signals in the FUT propagate towards the 3-port circulator. The optical signal from the third port of the circulator goes to the avalanche photodetector (APD) device. Here, an optical signal can be converted into
[Setup diagram: laser diode (driven by a pulse generator) → port 1 of a 3-port circulator → port 2 → 10 km fiber under test (FUT) with heating units HU 1 (2.00 km to 2.75 km), HU 2 (4.50 km to 5.50 km), and HU 3 (8.50 km to 8.75 km) and a cleaved end; backscatter from port 3 → avalanche photodiode → digital storage oscilloscope → personal computer]
Fig. 1. Illustrates the Brillouin OTDR based DTS system.
an electrical signal using an APD. This electrical signal can be viewed on the digital storage oscilloscope (DSO). Finally, the electrical signal from the DSO is processed on the personal computer. When temperature is applied to the FUT, a BFS ϑB(T) is generated. It can be expressed as [5, 13]:

ϑB(T) = ϑB(Tr) + KTϑ (T − Tr)    (1)
Due to the effect of temperature on the fiber, thermal strain is produced along the fiber. Due to this thermal strain, a BFS ϑB(ε) is produced. It can be expressed as [5, 13]:

ϑB(ε) = ϑB(0) + Kεϑ (ε − 0)    (2)
Due to temperature and thermal strain together, the total BFS ϑB(T, ε) is expressed as [5, 13]:

ϑB(T, ε) = ϑB(Tr) + KTϑ (T − Tr) + Kεϑ (ε − 0)    (3)
Here, KTϑ and Kεϑ denote the temperature and thermal strain coefficients of the Brillouin frequency shift in a single-mode sensing fiber; the sensing fiber is considered to be silica. T and Tr denote non-room and room temperatures, and ε denotes a non-zero strain. The BPC PB(T) generated by the temperature applied to the FUT can be expressed as [13]:

PB(T) = PB(Tr) + KTP (T − Tr)    (4)
The BPC PB(ε) produced by the thermal strain on the FUT can be written as [13]:

PB(ε) = PB(0) + KεP (ε − 0)    (5)

Due to temperature and thermal strain together, the total Brillouin power change PB(T, ε) is expressed as [13]:

PB(T, ε) = PB(Tr) + KTP (T − Tr) + KεP (ε − 0)    (6)
Here, KTP and KεP denote the temperature and thermal strain coefficients of the Brillouin power change in the sensing fiber; as before, T and Tr denote non-room and room temperatures, and ε denotes a non-zero strain. The thermal strain resolution δε of any Brillouin based DFOS system can be estimated as [13]:

δε = (KTP δϑ + KTϑ δP) / (KTϑ KεP − Kεϑ KTP)    (7)

Here, δϑ and δP denote the root mean square (RMS) errors of the BFS and BPC over the entire sensing fiber. Table 1 shows the values of the KTϑ, Kεϑ, KTP, and KεP parameters for SMF.

Table 1. Silica SMF: temperature and thermal strain coefficients

Coefficient | Value
KTϑ | 1.07 ± 0.06 MHz/°C
Kεϑ | 0.048 ± 0.004 MHz/με
KTP | 0.36 ± 0.03 %/°C
KεP | −0.0009 ± 0.00001 %/με
The relationship between temperature and thermal strain is expressed by Eq. (8) [11, 12]:

ε = 26.83314 TK − 7862.111    (8)
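Using the central coefficient values from Table 1, Eqs. (7) and (8) can be evaluated numerically. A sketch (ours; the absolute value is taken in Eq. (7) because the denominator is negative for silica SMF with these coefficients):

```python
# Central coefficient values for silica SMF from Table 1
K_T_nu, K_eps_nu = 1.07, 0.048      # BFS coefficients: MHz/degC, MHz/microstrain
K_T_P, K_eps_P = 0.36, -0.0009      # BPC coefficients: %/degC, %/microstrain

def strain_resolution(d_nu, d_P):
    """Eq. (7): thermal strain resolution (microstrain) from the RMS
    errors of BFS (MHz) and BPC (%)."""
    return abs((K_T_P * d_nu + K_T_nu * d_P) /
               (K_T_nu * K_eps_P - K_eps_nu * K_T_P))

def strain_from_temperature(T_kelvin):
    """Eq. (8): thermal strain (microstrain) versus temperature (K)."""
    return 26.83314 * T_kelvin - 7862.111
```

With the paper's average RMS errors (~0.64 MHz for BFS, ~0.1% for BPC) this evaluates to roughly 18 με, the same order as the reported ~17.66 με average; and Eq. (8) gives near-zero strain at about 293 K, consistent with a room-temperature reference.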
Here, ε denotes the thermal strain in με, and TK denotes the temperature in K. In the developed approach, the LWT technique helps to identify both the approximation and detail wavelet coefficients of the BFS-error and BPC-error signals. In this technique, a modified differential threshold function is effective in reducing the noisy lifting wavelet coefficients of the BFS-error and BPC-error signals. To achieve a greater reduction of the noise peaks of the BFS-error and BPC-error signals, it is necessary to optimize the modified differential thresholding function. We have employed the MPSO evolutionary algorithm to obtain the optimum modified differential threshold (OMDT) function. The MPSO method's objective function is the gradient of the mean square error (MSE) of the BFS-error and BPC-error signals.
Our previously contributed work describes the mathematical background of the proposed scheme [14]. The proposed evolutionary algorithm is employed at each level of the LWT mechanism in the proposed technique; here, the total number of levels is 3. Similarly, the proposed method is applied to the BPCerror[n] signal to minimize a significant amount of its error.
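The MPSO idea, PSO with an adaptive (here linearly decreasing) inertia weight, can be sketched for a one-dimensional objective. The paper's exact MPSO update rules are in its ref. [14], so this is only illustrative:

```python
import random

def mpso(objective, bounds, n_particles=20, iters=100, seed=0):
    """Particle swarm optimization with a linearly decreasing (adaptive)
    inertia weight, in the spirit of MPSO-style variants. Minimizes a
    scalar objective over a 1-D interval."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [rng.uniform(lo, hi) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    pbest, pbest_val = pos[:], [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g], pbest_val[g]
    for t in range(iters):
        w = 0.9 - 0.5 * t / iters  # inertia weight decays 0.9 -> 0.4
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            vel[i] = (w * vel[i] + 2.0 * r1 * (pbest[i] - pos[i])
                      + 2.0 * r2 * (gbest - pos[i]))
            pos[i] = min(hi, max(lo, pos[i] + vel[i]))  # keep in bounds
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i], val
    return gbest, gbest_val
```

In the paper's setting, the objective would be the MSE gradient of the BFS/BPC error signals as a function of the threshold; here any scalar function works, e.g. minimizing (x − 3)² converges near x = 3.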
3 Results and Discussions
The parameters assumed for the proposed DTS system in simulation are: an optical power P0 of 10 mW, a sensing fiber core refractive index n1 of 1.48, a launching wavelength λ0 of 1.55 μm, and a pulse width τ0 of 15 ns. From this pulse width, a spatial resolution of 1.5 m is obtained, and the resultant number of sensing points for 10 km of FUT or sensing fiber length is 6667. From Fig. 1, we can observe three different heating units (HU) on the FUT; these HUs are used to generate the temperature, and their positions on the FUT can be seen in Fig. 1. The corresponding temperatures are 63.62 °C, 65.62 °C, and 55.88 °C, respectively, and the remaining part of the FUT is at 47 °C. In Fig. 2, the BFS-RMS over the entire sensing fiber length is shown, with fitting curves for the BFS-RMS without any technique and with the proposed technique. The three fitted curves (linear, quadratic, and cubic) are estimated using MATLAB's interpolation method. Among these three fitting curves, the cubic fitting curve gives the best results.
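The quoted spatial resolution follows from the standard OTDR relation Δz = c·τ/(2n). A quick illustrative check (function names are ours):

```python
def spatial_resolution(pulse_width_s, n_core=1.48, c=3.0e8):
    """Two-way OTDR spatial resolution: delta_z = c * tau / (2 * n)."""
    return c * pulse_width_s / (2.0 * n_core)

def sensing_points(fiber_length_m, delta_z_m):
    """Number of independent sensing points along the fiber."""
    return round(fiber_length_m / delta_z_m)
```

With τ0 = 15 ns and n1 = 1.48 this gives Δz ≈ 1.52 m, consistent with the quoted 1.5 m, and 10 km / 1.5 m gives the 6667 sensing points stated above.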
[Plot: BFS-RMS (MHz) versus sensing length (m) over 0–10,000 m, without any technique and with the LWT-MPSO technique, each with normal, linear-, quadratic-, and cubic-fitting curves]
Fig. 2. Root mean square of Brillouin frequency shift for entire sensing fiber length.
From Fig. 3, we can observe that the three fitting curves for the BPC-RMS, without any technique and with the proposed technique, are obtained using the interpolation method
using MATLAB. These RMS values are affected by both temperature and thermal strain. Figures 2 and 3 follow from Eqs. (1) to (6). Through the BFS, the value of thermal strain is estimated over the entire range of the sensing fiber, which can be observed in Fig. 4. In this figure, the y-axis represents the actual and estimated thermal strain, along with the thermal strain resolution, for a given sensing fiber length.
[Plot: BPC-RMS (%) versus sensing length (m) over 0–10,000 m, without any technique and with the LWT-MPSO technique, each with normal, linear-, quadratic-, and cubic-fitting curves]
Fig. 3. Root mean square of Brillouin power change for entire sensing fiber length.
[Plot: thermal strain (με), roughly −100 to 600, versus sensing length (m) over 0–10,000 m, without any technique and with the LWT-MPSO technique]
Fig. 4. Extraction of thermal strain for entire sensing fiber length.
The thermal strain resolution values with the proposed technique and without any technique, shown in Fig. 5, are extracted using the BFS-RMS and BPC-RMS values of Fig. 2 and Fig. 3. In Fig. 5, we have plotted the cubic fitting curve for the above two cases; compared to the linear and quadratic fits, the cubic fitting curve offers better results. From Fig. 5, the average thermal strain resolution without any technique is ~50.66 με, and with the proposed technique it is ~17.66 με, indicating a ~33 με improvement in thermal strain resolution. Figure 5 follows from Eq. (7).
[Plot: thermal strain resolution (με), roughly 0–70, versus sensing length (m) over 0–10,000 m, without any technique (normal and cubic fitting) and with the LWT-MPSO technique (normal and cubic fitting)]
Fig. 5. Thermal strain resolution in an entire sensing fiber length.
4 Conclusion
In this work, we have shown the thermal strain resolution enhancement of the proposed DTS system. The proposed scheme was applied to the DTS system, obtaining an average BFS RMS value of ~0.64 MHz and an average BPC RMS value of ~0.1%. From these average RMS values, we have obtained an average thermal strain resolution of ~17.66 με over a 10 km sensing fiber length. Hence, the achieved thermal strain resolution can provide a better temperature resolution. The proposed DTS system is simulated using MATLAB 15.0. The proposed DTS system is highly applicable in fire detection, power cable monitoring, oil & gas pipeline leakage detection, medical areas, etc.
486

R. Tangudu and P. S. Kumar
Wrapper Based Feature Selection Approach Using Black Widow Optimization Algorithm for Data Classification Himanshu Dutta, Mahendra Kumar Gourisaria, and Himansu Das(B) School of Computer Engineering, KIIT Deemed to be University, Bhubaneswar 751024, Odisha, India [email protected]
Abstract. Feature selection is a crucial preprocessing step for data analysis and further predictive modeling tasks. Various optimization algorithms have been used in the past to achieve better results in terms of evaluation metrics like classification accuracy by selecting an optimal subset of features from a given dataset. In this paper, we propose a wrapper-based method that builds upon the idea of the Black Widow Optimization (BWO) technique, along with certain changes to the originally proposed algorithm to improve its performance. The results of the proposed method have been compared with the optimization algorithms typically used for the task of feature selection, such as Feature Selection based on Genetic Algorithm (FSGA) and Feature Selection based on Particle Swarm Optimization (FSPSO), over different well-known classification algorithms. The results show significant improvement in terms of obtaining better classification accuracy and selecting a smaller and more optimal subset of features as well. Further, it is inferred from the results that FSBWO converges to the global best score faster than other competing algorithms, hence making it a favorable choice for large datasets where the processing time can be of crucial importance. Keywords: Black Widow Optimization · Classification · Optimization · Feature selection · Wrapper method
1 Introduction
The world is moving towards better ways to leverage the underlying intelligence present in data. Day in, day out, billions and trillions of bytes of data are being produced, but merely extracting that data is not the end of it. Following the hierarchy of extracting information from data and knowledge from information, it is crucial to understand and select the data with high context and relevance to the task at hand. The biggest challenge that models face when employed for such a task is that they either overfit to the data and have very minimal variance or

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. A. K. Das et al. (Eds.): CIPR 2022, LNNS 480, pp. 487–496, 2022. https://doi.org/10.1007/978-981-19-3089-8_47
the irrelevance of the given data to the target variable. The number of features present in the data affects both these factors. If the given data is a mixture of relevant and irrelevant or redundant features, the result is increased computation cost and a heavy penalty on performance. In the other scenario, even when the features in high-dimensional data are relevant to predictive modeling, there is a good chance that the model would run into the problem of overfitting, leading to poor generalization and, in turn, bad performance on unseen examples. Hence it is extremely important to find an optimal subset of features and reduce the dimensionality of the data; this reduces the computation cost as well as yields better performance. The task of feature selection has popularly been posed as an optimization problem, and different classes of optimization algorithms, from nature-inspired to swarm-intelligence-based, have been used to solve it. Oluleye et al. [1] proposed a Genetic Algorithm based approach to feature selection, and Parham Moradi [2] proposed a feature selection method based on Particle Swarm Optimization along with a strategy to enhance local search near global optima. Similarly, various other feature selection methods have been proposed based on different optimization algorithms (Dash and Liu [3]; Jain and Zongker [4]; Chandrashekar and Sahin [5]). The problem faced with these and most other methods based on similar optimization algorithms is the huge number of parameters that need to be tuned in order to realize good performance. Since a lot of these algorithms are nature-inspired, there is another aspect to this: the proportionality of the number of iterations and the attainment of the optimal value, i.e., the more the iterations, the better the value attained by the feature selection algorithm.
Algorithms like Simulated Annealing (van Laarhoven and Aarts [6]), which favor exploration over exploitation initially, tend to take longer to converge. This factor would not matter if the optimization problem were cheap in terms of computation, but when the problem at hand involves different predictive models for which optimal subsets of features are to be selected, the computation cost increases significantly, and one would naturally choose an optimization algorithm that can achieve the optimal value in fewer iterations. Das et al. [7–9] propose the Artificial Electric Field Algorithm, the Teaching-Learning-Based Optimization Algorithm, and the Jaya algorithm for the task of feature selection. The approach proposed in this paper, Feature Selection based on Black Widow Optimization (Hayyolalam and Kazem [10]), is a wrapper-based method that aims to solve the above-mentioned problems: it aims to attain optima in fewer iterations than other standard approaches. This paper presents a thorough comparative analysis against similar wrapper-based standard feature selection algorithms, namely FS-Genetic Algorithm (FSGA), FS-Differential Evolution (FSDE), FS-Particle Swarm Optimization (FSPSO), and FS-Simulated Annealing (FSSA), over widely used classification algorithms, k-Nearest Neighbour (KNN), Naive Bayes (NB), Logistic Regression (LR), and Decision Tree (DT), on eight benchmark datasets with diverse dimensionality, resulting in a more accurate and thorough experimental investigation of all the mentioned approaches.
2 Feature Selection-Black Widow Optimization
The problem of feature selection can be stated as the selection of an optimal subset of features from given data, so as to improve the predictive model's performance according to a chosen evaluation metric. As highlighted in the introduction, selecting the optimal subset is an NP-hard problem of exponential complexity, with the added expense of evaluating the subsets on the predictive model; posing it as an optimization problem decreases the overall complexity. The common idea that each such algorithm supports is the exploration and exploitation of a search space in all the dimensions of the problem, finding a value for each dimension that attains the best possible fitness score. Hence we formulate the problem of feature selection as follows:

(1) Each feature of the data is considered to be a dimension of the problem.
(2) The bound for each dimension is from 0 to 1, where 0 signifies that the particular feature is least significant for predictive modeling and 1 represents the highest possible significance.
(3) The nature of the problem is considered to be continuous and non-linear.
(4) The goal is to find the optimal value for each dimension, i.e., to estimate the significance of each feature such that the evaluation metric takes the best possible value.

We pose this as a minimization problem rather than a maximization one, and hence rather than maximizing the classification accuracy, we minimize the loss as formulated in Eqs. (1) and (2):

f(x) = min(loss(d))    (1)

x = |D|, d ⊆ D    (2)
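A minimal sketch of the wrapper fitness f(x) = loss(d) under this thresholded encoding; the nearest-centroid classifier, the function names, and the 0.5 threshold are illustrative stand-ins (the paper wraps KNN, DT, LR, and NB classifiers):

```python
import numpy as np

def nearest_centroid_error(X_tr, y_tr, X_te, y_te):
    """Classification error of a tiny nearest-centroid classifier,
    standing in for the KNN/DT/LR/NB wrappers used in the paper."""
    classes = np.unique(y_tr)
    centroids = np.stack([X_tr[y_tr == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(X_te[:, None, :] - centroids[None, :, :], axis=2)
    preds = classes[np.argmin(dists, axis=1)]
    return float(np.mean(preds != y_te))

def fitness(individual, X_tr, y_tr, X_te, y_te, threshold=0.5):
    """f(x) = loss(d): error on the feature subset d encoded by `individual`.

    Each gene lies in [0, 1]; a feature enters the subset only when its gene
    exceeds the threshold, matching the formulation of Eqs. (1) and (2).
    """
    mask = np.asarray(individual) > threshold
    if not mask.any():
        return 1.0  # empty subset: worst possible loss
    return nearest_centroid_error(X_tr[:, mask], y_tr, X_te[:, mask], y_te)
```

An individual with a high gene on an informative feature and low genes elsewhere should score a lower loss than one that selects nothing or only noise.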
Here D is the set of all the features, and f(x) is the fitness value of the subset d. Notice that this formulation takes all the features into consideration together and keeps track of the interdependence of the features on each other. In this way, the complexity of the problem remains constant and does not change with the problem dimension. The dataset is divided into training and testing sets, where the training data is used for selecting the optimal subset of features, and the testing data is used to evaluate the instances with only the subset of features selected by the optimization algorithm. Our contribution in this paper is the design of a wrapper-based feature selection method based on the Black Widow Optimization algorithm, along with certain modifications to the procreation step of the original algorithm. The original BWO algorithm repeats the following steps over the iterations: Selection, Procreation, Cannibalism, Mutation, Evaluation, and Updation, after an initialization of the population. This leads to a better and fitter population each time the process is repeated, so after a number of iterations only the best possible individuals remain alive. The mating of two individuals in the population takes place according to the following formulas: y1 = α × x1 + (1 − α) × x2 and y2 = α × x2 + (1 − α) × x1. Here, α is a vector of random values of the same dimension as the problem, i.e., Nvar. The original algorithm proposes to select individuals for mating at random, and it
Fig. 1. Flow of FSBWO algorithm
proposes to repeat the mating process Nvar/2 times, where Nvar is the dimension of the problem. This leads to two problems. Firstly, always selecting the best from the population would depend only on the cannibalism step; secondly, and of much greater concern, the population size must always be greater than or equal to the problem dimension, which may in turn lead to increased computation cost. Hence we proposed two changes in the procreation step, as follows: (1) use of k-way tournament selection in choosing the pair of individuals to mate; (2) mating all the pairs in the population, i.e., iterating Nvar/2 times, so as to get a total of Npop offspring, where Npop is the total population size. This way we ensure the procreation of the most suitable individuals with each other, along with the generation of suitable offspring for the cannibalism step. The details of the proposed FSBWO model are explained in the following text. An initial population of random individuals of the specified population size is generated, where each individual has the same dimension as the number of features in the data; hence the dimension of the population is Npop × Nvar, where Npop is the number of individuals in the population and Nvar is the number of features in the data. The dataset is divided into two subsets, namely the training and testing set, and the training set is used to evaluate the performance of the selected subset of features in each iteration. A threshold value is set such that n_i = 1 if n_i > threshold, else n_i = 0, where 0 < i ≤ Nvar. This means that a certain feature is considered to be in the subset if and only if the value for that feature is greater than the threshold value. The fitness value of an individual is calculated as the classification error obtained by passing the features that hold a value greater than the threshold to the classification algorithm, namely KNN, DT, LR, or NB. Further, the offspring undergo the cannibalism step, which reduces the population size by retaining only the top Npop individuals based on their fitness score. The proposed method is shown in Algorithm 1 and the flow of the algorithm is shown in Fig. 1.

Algorithm 1: FSBWO
    population ← initialize a random population of Npop × Nvar dimension
    procreation_count ← Npop × procreation_probability
    training_set, test_set ← SPLIT-DATA(data, split_size)
    for i ← 1 to Npop do
        fitness[i] ← FITNESS-FUNC(population[i], training_set)
    end
    repeat
        new_population ← EMPTY SET
        new_fitness ← EMPTY SET
        for i ← 1 to procreation_count do
            male, female ← KWAY-TOURNAMENT-SELECTION(population, fitness)
            children ← generate Npop/2 children
            add children to new_population
            add female to new_population
        end
        for i ← 1 to new_population_size do
            new_fitness[i] ← FITNESS-FUNC(new_population[i], training_set)
        end
        population ← SELECTION-FUNC(new_population, new_fitness, Npop)
        for i ← 1 to population_size do
            population[i] ← MUTATION-FUNC(population[i])
        end
    until termination-condition
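A sketch of the modified procreation step, combining k-way tournament selection with the convex-combination mating rule; the function names, the numpy representation, and the default k = 3 are illustrative assumptions, not the authors' code:

```python
import numpy as np

def kway_tournament(population, fitness, k, rng):
    """Pick the fittest (lowest-loss) individual among k random contestants."""
    idx = rng.choice(len(population), size=k, replace=False)
    return population[idx[np.argmin(fitness[idx])]]

def procreate(x1, x2, rng):
    """BWO mating rule: y1 = a*x1 + (1-a)*x2, y2 = a*x2 + (1-a)*x1,
    with a fresh random alpha vector of length Nvar per mating."""
    alpha = rng.random(x1.shape)
    return alpha * x1 + (1 - alpha) * x2, alpha * x2 + (1 - alpha) * x1

def procreation_step(population, fitness, rng, k=3):
    """Mate tournament-selected pairs to produce Npop offspring in total."""
    n_pop, _ = population.shape
    offspring = []
    for _ in range(n_pop // 2):  # each mating yields two children
        male = kway_tournament(population, fitness, k, rng)
        female = kway_tournament(population, fitness, k, rng)
        offspring.extend(procreate(male, female, rng))
    return np.array(offspring)
```

Because each offspring component is a convex combination of the two parents' components, the offspring always stay inside the bounds already spanned by the population.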
3 Experiment and Result Analysis
The evaluation of the proposed method, FS-BWO, is done in a comparative manner, where we evaluate the merit of the method based on two factors. The first factor is crucial to any predictive modeling task, i.e., a single point evaluation metric, which helps understand how well the model has trained. For the sake of comparison on common terms, we chose classification accuracy as the metric of
evaluation. The reason behind this choice was the comprehensible nature of the metric, both for us and for the task at hand. Further, it supports the formulation of the problem well: since we pose the task as a minimization problem, we can think of improving the classification accuracy as minimizing the classification inaccuracy, i.e., f(x) = min(loss(d)) = min(1 − acc(d)). The second factor of consideration in this comparison is the cardinality of the subset derived by the proposed method, i.e., the number of features that it finds to be of significance. Since the larger the number of features, the greater the computation time, we aim to derive a subset of features that is optimal for the task, for a given predictive model, while having minimal cardinality.

Table 1. Experiment parameters
Parameter              DE    GA    PSO    SA     BWO
Population size        20    20    20     20     20
Number of iterations   100   100   100    100    100
Scaling factor         0.44  –     –      –      –
Crossover probability  0.8   0.6   –      –      0.6
Mutation ratio         –     0.3   –      –      0.5
Particle coeff. (c1)   –     –     0.45   –      –
Swarm coeff. (c2)      –     –     0.55   –      –
Inertial weight (w)    –     –     0.5    –      –
Initial temperature    –     –     –      0.8    –
Final temperature      –     –     –      0.001  –
Cannibalism rate       –     –     –      –      0.85
For the experiment, we collected eight standard datasets with a varying number of features, ranging from 6 to over 500. The datasets show huge variation in all respects. These standard datasets include both, balanced and imbalanced data, binary classification tasks with the number of examples ranging from hundreds to thousands, along with both real-world data and synthetic data. Datasets like Titanic, which have been part of competitions, show the importance of preprocessing steps in the whole machine learning lifecycle, which would be affirmed by the change in classification accuracy by application of feature selection methods. The reason to collect datasets with such variations was to test the proposed feature selection method in a variety of situations. The proposed FS algorithm is compared with other widely recognized and used algorithms, FS-Genetic Algorithm (FSGA), FS-Differential Evolution (FSDE), FS-Particle Swarm Optimization (FSPSO), and FS-Simulated Annealing (FSSA), and the metric used for this comparison is the classification accuracy achieved by these algorithms. Since all the algorithms are stochastic in nature, each of the algorithms is subjected to
a total of ten runs, and the average of those runs is used for the comparison, rather than comparing single runs. Each algorithm has its own set of hyperparameters, but most have two in common, namely, epochs/generations, and the population size (addressed with different names in respective algorithms). The hyperparameters have been mentioned in Table 1. For the small datasets and medium datasets, it was observed that under all the circumstances, on average FSBWO attained the highest accuracy score. Along with that, it was noted that for all of the classification algorithms, when we compared the number of features, FSBWO always was able to select the least number of features from the entire set. This clearly implies that for datasets with a feature-set of this scale, FSBWO outperforms all the classically used FS algorithms.
Fig. 2. Loss curves of DT, KNN, LR, and NB classifiers on SPECTF dataset
Lastly, for the large datasets, the observations made for the smaller datasets still held: FSBWO was still able to achieve the highest accuracy score in almost all the conducted experiments, and it still selected the least number of features in most scenarios. A similar tradeoff to that of the medium datasets was observed, but it further reduced to 0.4%, again a largely insignificant value. Along with that observation, another tradeoff appeared for the larger datasets: although the proposed algorithm achieved the best accuracy score of all, in a few cases a tradeoff in the cardinality of the subset was observed, i.e.,
Table 2. Classification accuracy (%) of all FS algorithms by four classifiers

Dataset      FS method    DT    NB    KNN   LR    No. of features
Titanic      Unoptimized  79.2  78    80.4  80.0  6
             FSBWO        89.5  88.4  89.9  89.7  3
             FSDE         87.7  85.3  88.3  87.9  4
             FSGA         86.4  86.5  88.3  87.3  5
             FSPSO        88.7  86.8  89.2  88.7  4
             FSSA         86.7  85.9  87.9  87.4  3
Pima         Unoptimized  74.2  75.4  73.5  77.1  8
             FSBWO        84.9  85.6  83.1  87.1  3
             FSDE         82.9  83.6  81.2  84.5  5
             FSGA         83.2  83.5  81.8  84.5  5
             FSPSO        83.5  84.2  82.7  86.3  5
             FSSA         82.2  83.7  80.8  84.8  4
Ionosphere   Unoptimized  88.3  88.7  84.5  89.0  34
             FSBWO        98.5  98.5  95.4  98.2  9
             FSDE         97.2  96.8  93.7  96.3  11
             FSGA         97.3  96.5  94.5  96.3  14
             FSPSO        98.0  97.2  95.1  97.6  13
             FSSA         97.3  95.6  93.5  95.9  12
QSAR         Unoptimized  81.7  68.0  84.6  86.8  41
             FSBWO        90.0  78.8  91.9  93.3  11
             FSDE         88.7  77.7  91.1  92.0  16
             FSGA         88.6  77.2  90.3  91.9  17
             FSPSO        88.7  78.0  91.2  93.0  20
             FSSA         88.1  77.2  90.7  92.2  15
SPECTF       Unoptimized  81.1  71.3  73.3  81.6  44
             FSBWO        95.9  87.6  89.9  94.0  13
             FSDE         92.9  84.9  86.6  91.1  20
             FSGA         93.6  84.1  87.3  91.4  15
             FSPSO        95.1  86.0  87.3  92.6  17
             FSSA         93.6  84.3  87.3  91.6  19
Sonar        Unoptimized  71.5  68.8  85.0  77.0  60
             FSBWO        91.9  92.1  98.3  95.2  17
             FSDE         88.6  86.4  97.1  91.0  24
             FSGA         87.6  86.0  96.0  91.0  22
             FSPSO        89.8  88.3  97.9  93.1  21
             FSSA         87.9  87.9  95.5  91.4  22
Ozone Level  Unoptimized  92.8  68.9  93.8  94    72
             FSBWO        95.2  75.9  96.0  96.3  19
             FSDE         94.6  74.4  95.5  95.9  21
             FSGA         94.8  94.9  95.6  95.8  24
             FSPSO        94.9  75.2  95.7  96.2  23
             FSSA         94.7  75.0  95.6  95.8  21
Musk         Unoptimized  94.5  83.8  96.8  94.9  166
             FSBWO        96.6  87.1  98.2  96.6  44
             FSDE         96.2  86.4  98.0  96.2  69
             FSGA         96.2  86.4  98.0  96.2  62
             FSPSO        96.4  86.9  98.1  96.4  87
             FSSA         96.2  86.3  97.9  96.1  61
the number of features selected by the proposed algorithm was higher than for the competing algorithms. The value of this tradeoff on average came to around 3%. The classification accuracies achieved by the different approaches are presented in Table 2, where the number of features represents the average number of features over the different experiments. This comparison becomes even more prevalent and observable in the larger datasets, as seen from the statistics in Table 2, and it is significant as well, since the larger the number of features, the greater the observed compute time. Further, the loss curves of different datasets over different classification algorithms are shown in Fig. 2.
4 Conclusion
This paper proposed a new wrapper-based feature selection method based on the Black Widow Optimization algorithm, namely FSBWO, the aim of which is to achieve the task of feature selection in the minimum possible computation time and cost while still improving the evaluation metric by a significant amount. Further, the changes proposed to the selection step of the original work were shown to considerably improve the effectiveness of the algorithm. The proposed method was tested with different classification algorithms in popular use, and the inferences and observations made from these tests were presented. Through rigorous experimentation, we conclude that the proposed method provides a significant improvement over traditional feature selection methods of a similar nature. The limitation of the method was highlighted to be the tradeoff between classification accuracy and the cardinality of the selected subset as the dimensionality of the data grew and we moved from the small datasets to the large ones. Even so, the tradeoff was observed to be significantly small in terms of the overall performance achieved.
References

1. Babatunde, O.H., et al.: A genetic algorithm-based feature selection. Br. J. Math. Comput. Sci. (2014)
2. Moradi, P., Gholampour, M.: A hybrid particle swarm optimization for feature subset selection by integrating a novel local search strategy. Appl. Soft Comput. 43, 117–130 (2016)
3. Dash, M., Liu, H.: Feature selection for classification. Intell. Data Anal. 1, 131–156 (1997)
4. Jain, A.K., Zongker, D.E.: Feature selection: evaluation, application, and small sample performance. IEEE Trans. Pattern Anal. Mach. Intell. 19, 153–158 (1997)
5. Chandrashekar, G., Sahin, F.: A survey on feature selection methods. Comput. Electr. Eng. 40, 16–28 (2014)
6. van Laarhoven, P.J.M., Aarts, E.H.L.: Simulated Annealing: Theory and Applications. Mathematics and Its Applications. Springer, Dordrecht (1987). https://doi.org/10.1007/978-94-015-7744-1
7. Das, H., Naik, B., Behera, H.S.: Optimal selection of features using artificial electric field algorithm for classification. Arab. J. Sci. Eng. 46(9), 8355–8369 (2021). https://doi.org/10.1007/s13369-021-05486-x
8. Das, H., et al.: Optimal selection of features using teaching-learning-based optimization algorithm for classification (2020)
9. Das, H., et al.: A Jaya algorithm based wrapper method for optimal feature selection in supervised classification. J. King Saud Univ. - Comput. Inf. Sci. (2020)
10. Hayyolalam, V., Kazem, A.A.P.: Black widow optimization algorithm: a novel meta-heuristic approach for solving engineering optimization problems. Eng. Appl. Artif. Intell. 87, 103249 (2020)
Multi-objective Optimization for Complex Trajectory Tracking of 6-DOF Robotic Arm Manipulators Bivash Chakraborty1(B) , Rajarshi Mukhopadhyay2 , and Paramita Chattopadhyay2 1 School of Mechatronics and Robotics, Indian Institute of Engineering Science and
Technology, Shibpur, Howrah 711103, India [email protected] 2 Department of Electrical Engineering, Indian Institute of Engineering Science and Technology, Shibpur, Howrah 711103, India [email protected]
Abstract. Tuning Proportional-Integral-Derivative Computed Torque controllers (PID-CTC) for trajectory tracking control of robotic manipulators is a multi-objective optimization problem. Besides selecting a suitable optimization algorithm among various choices, forming an appropriate objective function is also essential. This work investigates, in simulation, the impact of different single- and multi-objective optimization strategies for complex trajectory tracking control of a 6-DOF robotic manipulator. By attempting to optimize the PID-CTC for tracking two different complex trajectories, this work initiates a multi-trajectory tracking optimization approach. Keywords: Robot manipulator control · PID-CTC · Multi-objective optimization · Genetic algorithm
1 Introduction

Robotic arm manipulators autonomously performing a task have long been hailed as a game-changer for factory automation in industrial manufacturing since their introduction in the 1960s. However, in recent decades, their uses have expanded to include teleoperation [1], health care [2], space exploration [3], underwater exploration [4], agriculture [5], military applications, hazardous material handling [6], and more. Having provided a solid backbone for the "smart factory" motto of the ongoing Industry 4.0, the latest research in robotic manipulators has further ramped up into resource and energy efficiency to enhance sustainability, human-robot collaboration, and climate-related aspects of the upcoming Industry 5.0. The main design challenges for an energy-efficient robot manipulator are: (i) efficient trajectory planning and (ii) an optimal controller for trajectory tracking. The primary goal of the trajectory tracking control problem is to provide appropriate control actions to all joint actuators in order to force the manipulator's end-effector to follow a time-bounded predefined path inside the robot workspace with high precision

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. A. K. Das et al. (Eds.): CIPR 2022, LNNS 480, pp. 497–510, 2022. https://doi.org/10.1007/978-981-19-3089-8_48
and desired dynamic properties (velocity/speed, acceleration, jerk, etc.) while adhering to various safety and other constraints. Because of their inherently MIMO, highly nonlinear, and strongly coupled dynamics, serial-link, multi-DOF articulated (all-revolute-joint) manipulators pose a difficult challenge to the controller design problem. In addition, the design is further complicated by structural/parametric uncertainties (variations in inertia and other parameters, payload, etc.), non-structural/non-parametric uncertainties (joint nonlinear damping and friction models, sensor noises, etc.), and different non-ideal characteristics of actuators (hysteresis, saturation, dead-zone). The inverse dynamics-based PD/PID-Computed Torque Control (CTC) [7] technique, in which two interconnected control loops work in sync, is the most common and widely used method for controlling robotic manipulators for complex trajectory tracking. The inner feedforward control loop, based on the inverse dynamics of the manipulator model, is a particular variation of the linearization technique for nonlinear controller design, and the outer feedback control loop is simply a linear PD/PID controller providing the auxiliary control action essential for accurate tracking. This control scheme's performance is determined by two factors: an accurate inverse dynamics model and proper tuning of the PD/PID gains. Tuning PID gains is by nature a multi-objective optimization problem with constraints, because of the multitude of performance specifications. Traditional optimization approaches have great difficulty in satisfying the sets of specifications and constraints required by real control engineering problems. These specifications range from time-domain requirements (maximum overshoot, settling time, steady-state error, and rise time) to frequency-domain requirements (noise rejection or multiplicative uncertainty).
Saturations and the maximum changes allowed for a control signal may also be considered as constraints. PID gain tuning for CTC of robot manipulators has a large body of research literature, ranging from straightforward heuristic-based techniques through classical optimization techniques to the most recent bio-inspired metaheuristic optimization algorithms [8–10]. Although selecting the best optimization algorithm is critical for the convergence time to the global best solution, defining a single objective function, or a proper set of objective functions, that correctly interprets all targets, specifications, and constraints is equally important. Much effort has gone into defining objective functions for tuning PID gains for CTC that focus on different error metrics, but for complex trajectories this often leads to erratic control actions that are unsafe for the actuators and drain much power, which is undesired. Another vital feature of optimization-based PID-CTC tuning is that the solutions are trajectory-dependent: even if they are optimized for one trajectory, complex trajectory-based tasks may not guarantee the same performance for another trajectory of the same task. The impact of different objective function formation strategies for single- and multi-objective optimization frameworks for tuning PID-CTC using single- and multi-objective Genetic Algorithms (GA) in simulation is investigated in this paper. A realistic 6-DOF robot model is used for the study, with accurate tracking of complex trajectories and minimal joint actuator torques as targets. Using the best approach determined during this research phase, an attempt is made to tune the PID-CTC gains for two complex trajectories simultaneously. The rest of the paper is organized as follows. A brief mathematical description of the PID-CTC framework is given in Sect. 2. The essence of the
GA is summarised in Sect. 3. Finally, Sect. 4 summarises the findings and analysis of the current study. Conclusions derived from the study are presented at the end, along with recommendations for future research.
2 PID-Computed Torque Control

As illustrated in Fig. 1, the PID-CTC controller design problem can be divided into two parts: an inner nonlinear feedforward loop and an outer auxiliary PID feedback loop. This method employs a nonlinear dynamic model of the system to cancel the manipulator's nonlinearities, allowing an external PID controller with fixed gains to achieve zero tracking error and global asymptotic stability.

The equation of motion of the robotic arm manipulator, using the Euler-Lagrange formulation, can be written as Eq. 1 [11]:

τ = M(θ)θ̈ + C(θ, θ̇)θ̇ + G(θ) + F(θ̇)    (1)

where M(θ) ∈ R^{n×n} is the inertia matrix, C(θ, θ̇) ∈ R^{n×n} is the Coriolis/centripetal matrix, G(θ) ∈ R^n is the gravity vector, F(θ̇) ∈ R^n is the frictional torque and τ ∈ R^n is the torque applied to each joint. Frictional torque is an unstructured dynamic in the system; characterised as a mixture of Coulomb and viscous friction, it can be expressed as Eq. 2:

F(θ̇) = F_ν θ̇ + F_c sgn(θ̇)    (2)

where F_ν is the viscous friction coefficient and F_c is the dynamic (Coulomb) friction coefficient. The properties of the dynamic equation are as follows:

• The inertia matrix M(θ) is a symmetric, positive definite matrix.
• The Coriolis matrix C(θ, θ̇) can be computed in such a way that Ṁ(θ) − 2C(θ, θ̇) is a skew-symmetric matrix.
• The gravity vector is bounded such that ||G(θ)|| ≤ g_b, where g_b is a scalar function; for revolute joints it is independent of the joint vector θ.

Equation 1 can be re-written by substituting H(θ, θ̇) = C(θ, θ̇)θ̇ + G(θ), giving Eq. 3:

τ = M(θ)θ̈ + H(θ, θ̇) + F(θ̇)    (3)

The three-term PID controller is chosen as the auxiliary control signal u(t) for PID-CTC. The proposed control signal is expressed as Eq. 4:

u(t) = −k_I ∫e dt − k_P e − k_D ė    (4)

where e = θ_d − θ is the joint tracking error. Therefore, the overall dynamic equation becomes Eq. 5:

τ = M(θ)(θ̈_d + k_I ∫e dt + k_P e + k_D ė) + H(θ, θ̇) + F(θ̇)    (5)

These K_P, K_I, K_D values are tuned with different optimization objectives.
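The computed-torque law of Eq. 5 can be illustrated with a short numerical sketch. The following is a minimal single-joint simulation, assuming a toy scalar model (the inertia, gravity, friction values and PID gains are illustrative placeholders, not the JACO parameters), showing that with exact model cancellation the outer PID loop drives the tracking error to near zero:

```python
import numpy as np

# Toy scalar model tau = M*th_dd + H(th) + F(th_dot); illustrative values only.
M = 1.5                                        # inertia (kg*m^2)
Fv, Fc = 0.2, 0.1                              # viscous and Coulomb friction
g_term = lambda th: 9.81 * 0.5 * np.cos(th)    # gravity torque of a toy link

kP, kI, kD = 100.0, 150.0, 20.0                # illustrative PID gains
dt, T = 0.001, 5.0
t = np.arange(0.0, T, dt)
th_des = 0.5 * np.sin(t)                       # desired trajectory
th_d_des = 0.5 * np.cos(t)
th_dd_des = -0.5 * np.sin(t)

th, th_dot, e_int = 0.3, 0.0, 0.0              # start with an initial error
for k in range(len(t)):
    e = th_des[k] - th                         # e = theta_d - theta
    e_dot = th_d_des[k] - th_dot
    e_int += e * dt
    # Eq. 5: tau = M*(th_dd_d + kI*int(e) + kP*e + kD*e_dot) + H + F
    H = g_term(th)
    F = Fv * th_dot + Fc * np.sign(th_dot)
    tau = M * (th_dd_des[k] + kI * e_int + kP * e + kD * e_dot) + H + F
    # plant integration (model matches the plant, so nonlinearities cancel)
    th_ddot = (tau - g_term(th) - Fv * th_dot - Fc * np.sign(th_dot)) / M
    th_dot += th_ddot * dt
    th += th_dot * dt

final_error = abs(th_des[-1] - th)
print(f"final tracking error: {final_error:.2e} rad")
```

With exact cancellation the closed-loop error obeys ë + k_D ė + k_P e + k_I ∫e = 0, so the initial 0.3 rad error decays rapidly.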
B. Chakraborty et al.
Fig. 1. Block schematic of PID-CTC
3 Genetic Algorithm

In the late 1960s, John Holland [12] proposed the genetic algorithm (GA). It is based on the "survival of the fittest" principle, drawn from the natural phenomenon of biological evolution. At the beginning of a GA process, potential solutions are encoded into chromosomes, starting from a random population. The performance of each chromosome, expressed as a fitness value, is determined by evaluating it against the objective functions. Some chromosomes are then chosen based on their fitness values to form a pool upon which genetic operations such as crossover and mutation are performed to generate new solutions. Through crossover, the chromosomes in the pool swap genes to form new offspring; through mutation, a random process, some genes are changed. These two procedures continue to explore and exploit the solution space so that the GA generates improved solutions from the available data and randomly perturbed solutions. The newly created chromosomes are then assessed and assigned fitness values. Fitter offspring replace some of the older chromosomes, resulting in a new population. This genetic cycle is repeated until certain conditions are met or the maximum number of generations has been reached.

Depending on the nature of the problem, these applications can be classified as single-objective or multi-objective optimization. Single-objective optimization has only one objective function, whereas multi-objective optimization has several. A single global solution is expected for single-objective optimization, but a set of global (Pareto-optimal) solutions should be expected for multi-objective optimization. The multi-objective genetic algorithm (MOGA) was proposed by Fonseca and Fleming [13–15]. It has three distinct features: a new ranking scheme, a new fitness assignment, and a new niche count.
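The genetic cycle described above can be sketched as follows. This is a minimal single-objective GA on an illustrative test function; the fitness function, operators and hyperparameters are stand-ins, not the paper's MATLAB setup:

```python
import random

random.seed(0)

# Illustrative fitness: minimise the sphere function sum(x_i^2).
def sphere(x):
    return sum(v * v for v in x)

POP, DIM, GENS = 40, 3, 60
LO, HI = -5.0, 5.0

def random_chromosome():
    return [random.uniform(LO, HI) for _ in range(DIM)]

def tournament(pop, fits, k=3):
    # pick the fittest (lowest-cost) of k randomly sampled chromosomes
    idx = min(random.sample(range(len(pop)), k), key=lambda i: fits[i])
    return pop[idx]

def crossover(a, b):
    # uniform crossover: offspring takes each gene from either parent
    return [ai if random.random() < 0.5 else bi for ai, bi in zip(a, b)]

def mutate(x, rate=0.2, sigma=0.5):
    # Gaussian mutation: randomly perturb some genes
    return [v + random.gauss(0, sigma) if random.random() < rate else v for v in x]

pop = [random_chromosome() for _ in range(POP)]
for _ in range(GENS):
    fits = [sphere(c) for c in pop]
    elite = min(pop, key=sphere)          # elitism: keep the best chromosome
    children = [elite]
    while len(children) < POP:
        child = mutate(crossover(tournament(pop, fits), tournament(pop, fits)))
        children.append([min(max(v, LO), HI) for v in child])  # clamp to bounds
    pop = children

best = min(pop, key=sphere)
print("best fitness:", sphere(best))
```

Selection, crossover, mutation and replacement map directly to the steps of the genetic cycle above; elitism ensures the best fitness never worsens between generations.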
Multi-objective Optimization for Complex Trajectory Tracking
4 Results and Analysis

The current study uses a Simulink model of a 6-DOF JACO robotic arm manipulator from KINOVA Robotics [16]. From publicly available information in the research literature and official technical documents, every effort is made to create a faithful simulation model that closely mimics the dynamics of the actual JACO robotic arm. On top of the basic Simulink framework provided by MATLAB, this design uses the Peter Corke Toolbox [17] for robot modelling. The DH parameter matrix for the robot model is provided in Table 1, the limits on joint angles and torques are listed in Table 2, and the inertia parameters are collected from [18].

Table 1. DH parameter matrix

Sl. no.   αi      ai        di        θi    Offset
1         π/2     0          0.2755   θ1    0
2         π       0.4100     0        θ2    −π/2
3         π/2     0         −0.0098   θ3    π/2
4         π/3     0         −0.2502   θ4    0
5         π/3     0         −0.0858   θ5    −π
6         π       0         −0.2116   θ6    π/1.8
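Each row of a DH table maps to one homogeneous transform, and chaining the rows gives the forward kinematics. The sketch below does this for the Table 1 parameters; the standard (distal) DH convention is assumed here, and the exact convention used by the Peter Corke Toolbox [17] should be checked before relying on the resulting numbers:

```python
import numpy as np

# Standard DH transform assumed: T = Rz(theta) * Tz(d) * Tx(a) * Rx(alpha).
def dh_transform(alpha, a, d, theta):
    ct, st = np.cos(theta), np.sin(theta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([
        [ct, -st * ca,  st * sa, a * ct],
        [st,  ct * ca, -ct * sa, a * st],
        [0,        sa,       ca,      d],
        [0,         0,        0,      1],
    ])

# Table 1 rows as (alpha_i, a_i, d_i, offset_i); theta_i are the joint variables.
dh = [
    (np.pi/2, 0,      0.2755,  0),
    (np.pi,   0.4100, 0,      -np.pi/2),
    (np.pi/2, 0,     -0.0098,  np.pi/2),
    (np.pi/3, 0,     -0.2502,  0),
    (np.pi/3, 0,     -0.0858, -np.pi),
    (np.pi,   0,     -0.2116,  np.pi/1.8),
]

def forward_kinematics(q):
    # chain the six link transforms, adding each joint's offset to theta
    T = np.eye(4)
    for (alpha, a, d, off), theta in zip(dh, q):
        T = T @ dh_transform(alpha, a, d, theta + off)
    return T

T = forward_kinematics(np.zeros(6))
print("end-effector position at zero joint angles:", T[:3, 3])
```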
Table 2. Joint and torque limits

Joint number   Qmin (rad)   Qmax (rad)   Tmax (N-m)
1              −3.14         3.14        30.5
2               0.8722       5.4064      30.5
3               0.33136      5.9470      30.5
4              −3.14         3.14         6.8
5              −3.14         3.14         6.8
6              −3.14         3.14         6.8
For the entire duration of this study, two different trajectories are chosen within the manipulator's workspace; they are illustrated in Fig. 2. Each trajectory is divided into four segments by two randomly selected start and end points and three randomly selected waypoints in between. To reduce jerk, each segment's start and end points are assumed to have zero velocity and zero acceleration, and each segment is approximated in joint space with 500 points over a duration of 5 s (2000 points over 20 s in total) using a smooth quintic polynomial trajectory [19]. The optimization is carried out using the standard single- and multi-objective Genetic Algorithm (GA) solvers available in the Global Optimization Toolbox [20] in MATLAB. The first part of the experiment deals with PID-CTC tuning for Trajectory 1 using four different objective function strategies. In the MATLAB-based single- and multi-objective GA implementations, the population size is set to 50, the maximum number of iterations to 30, and all other hyperparameters are set to default values. The search ranges of the PID gains for this PID-CTC controller are heuristically selected as follows: KP (30–500), KI (30–500) and KD (20–230).
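The per-segment quintic interpolation can be sketched as follows: a minimal example, assuming arbitrary illustrative start and end joint angles, that solves for the fifth-order polynomial with zero boundary velocity and acceleration and samples it at 500 points:

```python
import numpy as np

# One quintic segment q(t) = a0 + a1*t + ... + a5*t^5 with boundary
# conditions q(0)=q0, q(T)=qf and zero velocity/acceleration at both ends.
def quintic_segment(q0, qf, T, n=500):
    A = np.array([
        [1, 0,  0,     0,       0,        0],        # q(0)
        [0, 1,  0,     0,       0,        0],        # q'(0)
        [0, 0,  2,     0,       0,        0],        # q''(0)
        [1, T,  T**2,  T**3,    T**4,     T**5],     # q(T)
        [0, 1,  2*T,   3*T**2,  4*T**3,   5*T**4],   # q'(T)
        [0, 0,  2,     6*T,     12*T**2,  20*T**3],  # q''(T)
    ])
    b = np.array([q0, 0, 0, qf, 0, 0])
    a = np.linalg.solve(A, b)
    t = np.linspace(0, T, n)
    q = sum(a[k] * t**k for k in range(6))
    return t, q

# illustrative segment: 0 rad to 1.2 rad over 5 s, 500 samples
t, q = quintic_segment(0.0, 1.2, 5.0, 500)
print(q[0], q[-1])   # starts at q0 and ends at qf
```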
Fig. 2. (a) Traj1 in 3D cartesian space. (b) Traj2 in 3D cartesian space
The details of the objective function strategies are given below in Eqs. 6–9.

Strategy 1: Single objective; the weighted sum of the ISEs of all joints, expressed as

J1(KP, KI, KD) = Σ_{i=1}^{6} w_i ∫_0^T |e_i(t)|^2 dt    (6)
where w1 to w5 are all kept as 1 and w6 is heuristically selected to be 1.2.

Strategy 2: Two objectives; the weighted sum of ISEs and the squared sum of changes in control efforts of all joints. The objective functions are formulated as

J1(KP, KI, KD) = Σ_{i=1}^{6} w_i ∫_0^T |e_i(t)|^2 dt
J2(KP, KI, KD) = Σ_{i=1}^{6} Σ_{j=1}^{2000} |τ_i(j) − τ_i(j−1)|^2    (7)

where τ_i(0) = 0 for all i.
Strategy 3: Three objectives; the weighted sum of ISEs, the squared sum of changes in control efforts, and the sum of absolute maximum errors (peak overshoot/undershoot) of all joints. The objective functions are characterized by

J1(KP, KI, KD) = Σ_{i=1}^{6} w_i ∫_0^T |e_i(t)|^2 dt
J2(KP, KI, KD) = Σ_{i=1}^{6} Σ_{j=1}^{2000} |τ_i(j) − τ_i(j−1)|^2
J3(KP, KI, KD) = Σ_{i=1}^{6} max_{1 ≤ j ≤ 2000} |e_i(j)|    (8)
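On sampled data, the objectives in Eqs. 6–8 reduce to simple array operations. The sketch below assumes synthetic placeholder error and torque arrays; in the study these come from simulating the tuned PID-CTC controller over the 2000 trajectory points:

```python
import numpy as np

rng = np.random.default_rng(1)
dt = 0.01                                  # 2000 samples over 20 s
e = rng.normal(0, 0.01, size=(6, 2000))    # placeholder joint errors e_i(j)
tau = rng.normal(0, 1.0, size=(6, 2000))   # placeholder joint torques tau_i(j)
w = np.array([1, 1, 1, 1, 1, 1.2])         # weights w_1..w_6

# J1: weighted sum of ISEs (Eq. 6), with the integral approximated by a sum
J1 = float(np.sum(w * np.sum(e**2, axis=1) * dt))

# J2: squared sum of torque changes, with tau_i(0) = 0 (Eq. 7)
dtau = np.diff(tau, axis=1, prepend=0.0)
J2 = float(np.sum(dtau**2))

# J3: sum of the absolute maximum error of each joint (Eq. 8)
J3 = float(np.sum(np.max(np.abs(e), axis=1)))

print(J1, J2, J3)
```

J1 penalises accumulated tracking error, J2 penalises rapid changes in control effort (actuator chatter), and J3 penalises peak overshoot/undershoot, so the three objectives pull the gain search in different directions.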